1
Thawornwattana Y, Seixas F, Yang Z, Mallet J. Major patterns in the introgression history of Heliconius butterflies. eLife 2023; 12:RP90656. [PMID: 38108819 PMCID: PMC10727504 DOI: 10.7554/elife.90656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023] Open
Abstract
Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the 'melpomene-silvaniform' group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that 'silvaniform' species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.
Affiliation(s)
- Fernando Seixas
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
2
Flouri T, Jiao X, Huang J, Rannala B, Yang Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc Natl Acad Sci U S A 2023; 120:e2310708120. [PMID: 37871206 PMCID: PMC10622872 DOI: 10.1073/pnas.2310708120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/25/2023] [Accepted: 08/15/2023] [Indexed: 10/25/2023] Open
Abstract
Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
- Xiyun Jiao
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen 518055, China
- Jun Huang
  - Department of Intelligent Medical Engineering, School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
3
Tiley GP, Flouri T, Jiao X, Poelstra JW, Xu B, Zhu T, Rannala B, Yoder AD, Yang Z. Estimation of species divergence times in presence of cross-species gene flow. Syst Biol 2023; 72:820-836. [PMID: 36961245 PMCID: PMC10405360 DOI: 10.1093/sysbio/syad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Accepted: 03/22/2023] [Indexed: 03/25/2023] Open
Abstract
Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target-enrichment loci from baobabs (Adansonia), and another of 1000 transcriptome loci from 14 species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. [divergence time; gene flow; hybridization; introgression; MSci model; multispecies coalescent].
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London, UK
- Xiyun Jiao
  - Department of Genetics, Evolution and Environment, University College London, London, UK
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong, China
- Bo Xu
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Tianqi Zhu
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, Davis, CA, USA
- Anne D Yoder
  - Department of Biology, Duke University, Durham, NC, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, UK
4
Adams R, DeGiorgio M. Likelihood-Based Tests of Species Tree Hypotheses. Mol Biol Evol 2023; 40:msad159. [PMID: 37440530 PMCID: PMC10368450 DOI: 10.1093/molbev/msad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 06/20/2023] [Accepted: 07/06/2023] [Indexed: 07/15/2023] Open
Abstract
Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.
Affiliation(s)
- Richard Adams
  - Agricultural Statistics Laboratory, University of Arkansas, Fayetteville, AR
  - Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
5
Xu Y, Hu J, Shi Z, Chen W, Zhou J, Zhang B, Yong F, Khanal L, Jiang X, Chen Z. Integrative systematics and evolutionary history of Berylmys bowersi (Mammalia, Rodentia, Muridae). Ecol Evol 2023; 13:e10234. [PMID: 37408634 PMCID: PMC10318578 DOI: 10.1002/ece3.10234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 06/05/2023] [Accepted: 06/09/2023] [Indexed: 07/07/2023] Open
Abstract
Bower's Berylmys (Berylmys bowersi) is one of the largest rodent species, with a wide distribution range in southern China and the Indochinese Peninsula. The taxonomy and evolutionary history of B. bowersi remain controversial and confusing. In this study, we used two mitochondrial (Cyt b and COI) and three nuclear (GHR, IRBP, and RAG1) genes to estimate the phylogeny, divergence times, and biogeographic history of B. bowersi. We also explored morphological variation among specimens collected across China. Our phylogenetic analyses indicated that the traditional B. bowersi contains at least two species: B. bowersi and B. latouchei. Berylmys latouchei, distributed in eastern China, was previously considered a junior synonym of B. bowersi, but it is confirmed here to be distinguishable at the species level by its larger size, relatively larger and whiter hind feet, and several cranial traits. The split between B. bowersi and B. latouchei was estimated at the early Pleistocene (ca. 2.00 Mya), which might be the outcome of the combined effects of climate change in the early Pleistocene and isolation by the Minjiang River. Our results highlight the Wuyi Mountains in northern Fujian, China, as a glacial refugium during the Pleistocene and call for more intensive surveys and systematic revisions of small mammals in eastern China.
Affiliation(s)
- Yifan Xu
  - Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co‐founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu, China
  - State Key Laboratory of Genetic Resources and Evolution & Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Jiangxiao Hu
  - Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co‐founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu, China
- Zifan Shi
  - Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co‐founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu, China
- Wenwen Chen
  - School of Resources and Environmental Engineering, Anhui University, Hefei, China
- Jiajun Zhou
  - Zhejiang Forest Resources Monitoring Center, Hangzhou, China
- Baowei Zhang
  - School of Life Sciences, Anhui University, Hefei, China
- Fan Yong
  - Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing, China
- Laxman Khanal
  - Central Department of Zoology, Institute of Science and Technology, Tribhuvan University, Kathmandu, Nepal
- Xuelong Jiang
  - State Key Laboratory of Genetic Resources and Evolution & Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Zhongzheng Chen
  - Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co‐founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu, China
  - State Key Laboratory of Genetic Resources and Evolution & Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
6
Ji J, Jackson DJ, Leaché AD, Yang Z. Power of Bayesian and Heuristic Tests to Detect Cross-Species Introgression with Reference to Gene Flow in the Tamias quadrivittatus Group of North American Chipmunks. Syst Biol 2023; 72:446-465. [PMID: 36504374 PMCID: PMC10275556 DOI: 10.1093/sysbio/syac077] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2021] [Revised: 11/15/2022] [Accepted: 12/01/2022] [Indexed: 10/25/2023] Open
Abstract
In the past two decades, genomic data have been widely used to detect historical gene flow between species in a variety of plants and animals. The Tamias quadrivittatus group of North American chipmunks, which originated through a series of rapid speciation events, is known to undergo massive amounts of mitochondrial introgression. Yet in a recent analysis of targeted nuclear loci from the group, no evidence for cross-species introgression was detected, indicating widespread cytonuclear discordance. The study used the heuristic method HYDE to detect gene flow, which may suffer from low power. Here we use the Bayesian method implemented in the program BPP to re-analyze these data. We develop a Bayesian test of introgression, calculating the Bayes factor via the Savage-Dickey density ratio using the Markov chain Monte Carlo (MCMC) sample under the model of introgression. We take a stepwise approach to constructing an introgression model by adding introgression events onto a well-supported binary species tree. The analysis detected robust evidence for multiple ancient introgression events affecting the nuclear genome, with introgression probabilities reaching 63%. We estimate population parameters and highlight the fact that species divergence times may be seriously underestimated if ancient cross-species gene flow is ignored in the analysis. We examine the assumptions and performance of HYDE and demonstrate that it lacks power if gene flow occurs between sister lineages or if the mode of gene flow does not match the assumed hybrid-speciation model with symmetrical population sizes. Our analyses highlight the power of likelihood-based inference of cross-species gene flow using genomic sequence data. [Bayesian test; BPP; chipmunks; introgression; MSci; multispecies coalescent; Savage-Dickey density ratio.]
Affiliation(s)
- Jiayi Ji
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Donavan J Jackson
  - Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Adam D Leaché
  - Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
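The Savage-Dickey density ratio named in the abstract of entry 6 can be illustrated with a minimal sketch. It assumes a uniform U(0, 1) prior on the introgression probability phi and approximates the point null phi = 0 by a small null interval phi < eps, so that B10 is roughly P(phi < eps) / P(phi < eps | data), with the posterior mass estimated from an MCMC sample. The function name, the default eps, and the toy sample are hypothetical illustrations and are not taken from BPP.

```python
def savage_dickey_bf10(phi_samples, eps=0.01, prior_null_mass=None):
    """Approximate the Bayes factor B10 in favour of introgression.

    phi_samples     : posterior MCMC draws of the introgression probability
    eps             : width of the null interval [0, eps) standing in for phi = 0
    prior_null_mass : prior P(phi < eps); defaults to eps, which is the value
                      under an assumed U(0, 1) prior (an assumption of this sketch)
    """
    if not phi_samples:
        raise ValueError("need at least one posterior sample")
    if prior_null_mass is None:
        prior_null_mass = eps  # P(phi < eps) under U(0, 1)

    # Posterior mass of the null interval, estimated as a sample proportion.
    n_null = sum(1 for phi in phi_samples if phi < eps)
    posterior_null_mass = n_null / len(phi_samples)

    if posterior_null_mass == 0.0:
        # No draw fell in the null interval: the sample only gives a lower bound.
        return float("inf")
    return prior_null_mass / posterior_null_mass


# Toy sample: half of the draws sit inside the null interval.
toy_samples = [0.005] * 50 + [0.5] * 50
bf10 = savage_dickey_bf10(toy_samples, eps=0.01)
```

A value of B10 well above 1 would favour the introgression model, and a value well below 1 would favour the null. When no draws fall in the null interval, the ratio cannot be estimated from the sample and the sketch returns infinity as a placeholder.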
7
Rivas-González I, Rousselle M, Li F, Zhou L, Dutheil JY, Munch K, Shao Y, Wu D, Schierup MH, Zhang G. Pervasive incomplete lineage sorting illuminates speciation and selection in primates. Science 2023; 380:eabn4409. [PMID: 37262154 DOI: 10.1126/science.abn4409] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 11/28/2021] [Accepted: 01/19/2023] [Indexed: 06/03/2023]
Abstract
Incomplete lineage sorting (ILS) causes the phylogeny of some parts of the genome to differ from the species tree. In this work, we investigate the frequencies and determinants of ILS in 29 major ancestral nodes across the entire primate phylogeny. We find up to 64% of the genome affected by ILS at individual nodes. We exploit ILS to reconstruct speciation times and ancestral population sizes. Estimated speciation times are much more recent than genomic divergence times and are in good agreement with the fossil record. We show extensive variation of ILS along the genome, mainly driven by recombination but also by the distance to genes, highlighting a major impact of selection on variation along the genome. In many nodes, ILS is reduced more on the X chromosome compared with autosomes than expected under neutrality, which suggests higher impacts of natural selection on the X chromosome. Finally, we show an excess of ILS in genes with immune functions and a deficit of ILS in housekeeping genes. The extensive ILS in primates discovered in this study provides insights into the speciation times, ancestral population sizes, and patterns of natural selection that shape primate evolution.
Affiliation(s)
- Iker Rivas-González
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Fang Li
  - BGI-Research, BGI-Wuhan, Wuhan 430074, China
  - Institute of Animal Sex and Development, Zhejiang Wanli University, Ningbo 315104, China
  - BGI-Research, BGI-Shenzhen, Shenzhen 518083, China
- Long Zhou
  - Evolutionary & Organismal Biology Research Center, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Women's Hospital, School of Medicine, Zhejiang University, Shangcheng District, Hangzhou 310006, China
- Julien Y Dutheil
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Institute of Evolution Sciences of Montpellier (ISEM), CNRS, University of Montpellier, IRD, EPHE, 34095 Montpellier, France
- Kasper Munch
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Yong Shao
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Dongdong Wu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
  - National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
  - Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Mikkel H Schierup
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Guojie Zhang
  - Evolutionary & Organismal Biology Research Center, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Women's Hospital, School of Medicine, Zhejiang University, Shangcheng District, Hangzhou 310006, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
  - Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou 311121, China
  - Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
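As a point of reference for the ILS proportions quoted in the abstract of entry 7, the textbook three-species multispecies coalescent result gives the chance that a gene tree disagrees with the species tree ((A,B),C) as (2/3)·exp(−T), where T is the internal branch length in coalescent units (2N generations for a diploid population of effective size N). The sketch below simply evaluates that formula. It is not the method used in the study, and the function name is a hypothetical choice for illustration.

```python
import math


def ils_topology_probs(t_coal):
    """Gene-tree topology probabilities for a three-species tree ((A,B),C).

    t_coal : internal branch length in coalescent units (must be >= 0).
    Returns the probability of the concordant topology and of each of the
    two discordant topologies under the standard multispecies coalescent.
    """
    if t_coal < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-T) the A and B lineages fail to coalesce on the
    # internal branch; the three topologies are then equally likely.
    no_coalescence = math.exp(-t_coal)
    discordant_each = no_coalescence / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_AC": discordant_each,
        "discordant_BC": discordant_each,
    }


probs_short = ils_topology_probs(0.0)   # vanishing internal branch
probs_long = ils_topology_probs(10.0)   # long internal branch
```

Under this simple model the total discordant fraction cannot exceed 2/3, a ceiling reached only as the internal branch length approaches zero.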
8
Huang J, Thawornwattana Y, Flouri T, Mallet J, Yang Z. Inference of Gene Flow between Species under Misspecified Models. Mol Biol Evol 2022; 39:6783212. [PMID: 36317198 PMCID: PMC9729068 DOI: 10.1093/molbev/msac237] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/05/2022] Open
Abstract
Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-specific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
9
Multi-locus phylogeny and species delimitations of the striped-back shrew group (Eulipotyphla: Soricidae): Implications for cryptic diversity, taxonomy and multiple speciation patterns. Mol Phylogenet Evol 2022; 177:107619. [PMID: 36007821 DOI: 10.1016/j.ympev.2022.107619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 08/01/2022] [Accepted: 08/17/2022] [Indexed: 11/23/2022]
Abstract
The striped-back shrew group demonstrates remarkable variation in skull and body size, tail length, and brightness of the dorsal stripe; and karyotypic and DNA variation has been reported in recent years. In this study, we investigated the phylogenetic structure of the group, as well as speciation patterns and demographic history in the Mountains of Southwestern China and adjacent mountains, including the southern Himalayas, Mts. Bashan, Wushan, and Qinling. We sampled a total of 462 specimens from 126 localities in the known range of the group, which were sequenced and analyzed based on 6.2 kb of sequence data from two mitochondrial, six nuclear, and two Y chromosome markers. Phylogenetic analyses of the concatenated mtDNA data revealed 14 sympatric and independently evolving lineages within the striped-back shrew group, including Sorex bedfordiae, S. cylindricauda, S. excelsus, S. sinalis and several cryptic species. All concatenated data (ten genes) showed a consistent genetic structure compared to the mtDNA lineages for the group, whereas the nuclear and the Y chromosome data showed a discordant genetic structure compared to the mtDNA lineages for the striped-back shrew group. Species delimitation analyses and deep genetic distance clearly support the species status of the 14 evolving lineages. The divergence time estimation suggested that the striped-back shrew group began to diversify from the middle Pleistocene (2.34 Ma), then flourished at approximately 2.14 Ma, followed by a series of rapid diversifications through the Pleistocene. Our results also revealed multiple mechanisms of speciation in the Mountains of Southwestern China and adjacent mountains, with their complex landscapes and climate. The uplifting of the Qinghai-Tibetan Plateau, Quaternary climate oscillations, riverine barriers, ecological elevation gradients, topographical diversity, and their own low dispersal capacity may have driven the speciation, genetic structure, and phylogeographic patterns of the striped-back shrew group.
10
Flouri T, Huang J, Jiao X, Kapli P, Rannala B, Yang Z. Bayesian phylogenetic inference using relaxed-clocks and the multispecies coalescent. Mol Biol Evol 2022; 39:6652437. [PMID: 35907248 PMCID: PMC9366188 DOI: 10.1093/molbev/msac161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Jun Huang
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
  - School of Biomedical Engineering, Capital Medical University, Beijing, 100069, China
- Xiyun Jiao
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Paschalia Kapli
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
11
Vujovic F, Hunter N, Farahani RM. Cellular self-organization: An overdrive in Cambrian diversity? Bioessays 2022; 44:e2200033. [PMID: 35900058 DOI: 10.1002/bies.202200033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Revised: 07/09/2022] [Accepted: 07/11/2022] [Indexed: 11/10/2022]
Abstract
During the early Cambrian period metazoan life forms diverged at an accelerated rate to occupy multiple ecological niches on earth. A variety of explanations have been proposed to address this major evolutionary phenomenon termed the "Cambrian explosion." While most hypotheses address environmental, developmental, and ecological factors that facilitated evolutionary innovations, the biological basis for accelerated emergence of species diversity in the Cambrian period remains largely conjectural. Herein, we posit that morphogenesis by self-organization enables the uncoupling of genomic mutational landscape from phenotypic diversification. Evidence is provided for a two-tiered interpretation of genomic changes in metazoan animals wherein mutations not only impact upon function of individual cells, but also alter the self-organization outcome during developmental morphogenesis. We provide evidence that the morphological impacts of mutations on self-organization could remain repressed if associated with an unmet negative energetic cost. We posit that accelerated morphological diversification in transition to the Cambrian period has occurred by emergence of dormant (i.e., reserved) morphological novelties whose molecular underpinnings were seeded in the Precambrian period.
Affiliation(s)
- Filip Vujovic
- IDR/Westmead Institute for Medical Research, Sydney, New South Wales, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Neil Hunter
- IDR/Westmead Institute for Medical Research, Sydney, New South Wales, Australia
- Ramin M Farahani
- IDR/Westmead Institute for Medical Research, Sydney, New South Wales, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
12
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
13
Jiao X, Flouri T, Yang Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl Sci Rev 2022; 8:nwab127. [PMID: 34987842 PMCID: PMC8692950 DOI: 10.1093/nsr/nwab127] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/02/2021] [Revised: 07/10/2021] [Accepted: 07/11/2021] [Indexed: 02/06/2023] Open
Abstract
Multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergences and the population genetic process of coalescent, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation and summarize recent methodological advances in inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
Affiliation(s)
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
14
Soni V, Eyre-Walker A. OUP accepted manuscript. Genome Biol Evol 2022; 14:6528851. [PMID: 35166775 PMCID: PMC8882387 DOI: 10.1093/gbe/evac028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/09/2022] [Indexed: 12/05/2022] Open
Abstract
The rate of amino acid substitution has been shown to be correlated to a number of factors including the rate of recombination, the age of the gene, the length of the protein, mean expression level, and gene function. However, the extent to which these correlations are due to adaptive and nonadaptive evolution has not been studied in detail, at least not in hominids. We find that the rate of adaptive evolution is significantly positively correlated to the rate of recombination, protein length and gene expression level, and negatively correlated to gene age. These correlations remain significant when each factor is controlled for in turn, except when controlling for expression in an analysis of protein length; and they also generally remain significant when biased gene conversion is taken into account. However, the positive correlations could be an artifact of population size contraction. We also find that the rate of nonadaptive evolution is negatively correlated to each factor, and all these correlations survive controlling for each other and biased gene conversion. Finally, we examine the effect of gene function on rates of adaptive and nonadaptive evolution; we confirm that virus-interacting proteins (VIPs) have higher rates of adaptive and lower rates of nonadaptive evolution, but we also demonstrate that there is significant variation in the rate of adaptive and nonadaptive evolution between GO categories when removing VIPs. We estimate that the VIP/non-VIP axis explains about 5–8 fold more of the variance in evolutionary rate than GO categories.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Corresponding author: E-mail:
15
Abstract
It is known that methods to estimate the rate of adaptive evolution, which are based on the McDonald–Kreitman test, can be biased by changes in effective population size. Here, we demonstrate theoretically that changes in population size can also generate an artifactual correlation between the rate of adaptive evolution and any factor that is correlated to the strength of selection acting against deleterious mutations. In this context, we have investigated whether several site-level factors influence the rate of adaptive evolution in the divergence of humans and chimpanzees, two species that have been inferred to have undergone population size contraction since they diverged. We find that the rate of adaptive evolution, relative to the rate of mutation, is higher for more exposed amino acids, lower for amino acid pairs that are more dissimilar in terms of their polarity, volume, and lower for amino acid pairs that are subject to stronger purifying selection, as measured by the ratio of the numbers of nonsynonymous to synonymous polymorphisms (pN/pS). All of these correlations are opposite to the artifactual correlations expected under contracting population size. We therefore conclude that these correlations are genuine.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Ana Filipa Moutinho
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Corresponding author: E-mail:
16
Shauli T, Brandes N, Linial M. Evolutionary and functional lessons from human-specific amino acid substitution matrices. NAR Genom Bioinform 2021; 3:lqab079. [PMID: 34541526 PMCID: PMC8445205 DOI: 10.1093/nargab/lqab079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 08/02/2021] [Accepted: 09/14/2021] [Indexed: 12/26/2022] Open
Abstract
Human genetic variation in coding regions is fundamental to the study of protein structure and function. Most methods for interpreting missense variants consider substitution measures derived from homologous proteins across different species. In this study, we introduce human-specific amino acid (AA) substitution matrices that are based on genetic variations in the modern human population. We analyzed the frequencies of >4.8M single nucleotide variants (SNVs) at codon and AA resolution and compiled human-centric substitution matrices that are fundamentally different from classic cross-species matrices (e.g. BLOSUM, PAM). Our matrices are asymmetric, with some AA replacements showing significant directional preference. Moreover, these AA matrices are only partly predicted by nucleotide substitution rates. We further test the utility of our matrices in exposing functional signals of experimentally-validated protein annotations. A significant reduction in AA transition frequencies was observed across nine post-translational modification (PTM) types and four ion-binding sites. Our results propose a purifying selection signal in the human proteome across a diverse set of functional protein annotations and provide an empirical baseline for interpreting human genetic variation in coding regions.
Affiliation(s)
- Tair Shauli
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
17
Campbell CR, Tiley GP, Poelstra JW, Hunnicutt KE, Larsen PA, Lee HJ, Thorne JL, Dos Reis M, Yoder AD. Pedigree-based and phylogenetic methods support surprising patterns of mutation rate and spectrum in the gray mouse lemur. Heredity (Edinb) 2021; 127:233-244. [PMID: 34272504 PMCID: PMC8322134 DOI: 10.1038/s41437-021-00446-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/20/2020] [Revised: 05/25/2021] [Accepted: 05/26/2021] [Indexed: 02/06/2023] Open
Abstract
Mutations are the raw material on which evolution acts, and knowledge of their frequency and genomic distribution is crucial for understanding how evolution operates at both long and short timescales. At present, the rate and spectrum of de novo mutations have been directly characterized in relatively few lineages. Our study provides the first direct mutation-rate estimate for a strepsirrhine (i.e., the lemurs and lorises), which comprises nearly half of the primate clade. Using high-coverage linked-read sequencing for a focal quartet of gray mouse lemurs (Microcebus murinus), we estimated the mutation rate to be among the highest calculated for a mammal at 1.52 × 10⁻⁸ (95% credible interval: 1.28 × 10⁻⁸ to 1.78 × 10⁻⁸) mutations/site/generation. Further, we found an unexpectedly low count of paternal mutations, and only a modest overrepresentation of mutations at CpG sites. Despite the surprising nature of these results, we found both the rate and spectrum to be robust to the manipulation of a wide range of computational filtering criteria. We also sequenced a technical replicate to estimate a false-negative and false-positive rate for our data and show that any point estimate of a de novo mutation rate should be considered with a large degree of uncertainty. For validation, we conducted an independent analysis of context-dependent substitution types for gray mouse lemur and five additional primate species for which de novo mutation rates have also been estimated. These comparisons revealed general consistency of the mutation spectrum between the pedigree-based and the substitution-rate analyses for all species compared.
Affiliation(s)
- C Ryan Campbell
- Department of Biology, Duke University, Durham, NC, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Kelsie E Hunnicutt
- Department of Biology, Duke University, Durham, NC, USA
- Department of Biological Sciences, University of Denver, Denver, CO, USA
- Peter A Larsen
- Department of Biology, Duke University, Durham, NC, USA
- Department of Veterinary and Biomedical Sciences, University of Minnesota, St. Paul, MN, USA
- Hui-Jie Lee
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jeffrey L Thorne
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Mario Dos Reis
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Anne D Yoder
- Department of Biology, Duke University, Durham, NC, USA
18
Vázquez-Miranda H, Barker FK. Autosomal, sex-linked and mitochondrial loci resolve evolutionary relationships among wrens in the genus Campylorhynchus. Mol Phylogenet Evol 2021; 163:107242. [PMID: 34224849 DOI: 10.1016/j.ympev.2021.107242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2020] [Revised: 06/14/2021] [Accepted: 06/29/2021] [Indexed: 01/18/2023]
Abstract
Although there is general consensus that sampling of multiple genetic loci is critical in accurate reconstruction of species trees, the exact numbers and the best types of molecular markers remain an open question. In particular, the phylogenetic utility of sex-linked loci is underexplored. Here, we sample all species and 70% of the named diversity of the New World wren genus Campylorhynchus using sequences from 23 loci, to evaluate the effects of linkage on efficiency in recovering a well-supported tree for the group. At a tree-wide level, we found that most loci supported fewer than half the possible clades and that sex-linked loci produced similar resolution to slower-coalescing autosomal markers, controlling for locus length. By contrast, we did find evidence that linkage affected the efficiency of recovery of individual relationships; as few as two sex-linked loci were necessary to resolve a selection of clades with long to medium subtending branches, whereas 4-6 autosomal loci were necessary to achieve comparable results. These results support an expanded role for sampling of the avian Z chromosome in phylogenetic studies, including target enrichment approaches. Our concatenated and species tree analyses represent significant improvements in our understanding of diversification in Campylorhynchus, and suggest a relatively complex scenario for its radiation across the Miocene/Pliocene boundary, with multiple invasions of South America.
Affiliation(s)
- Hernán Vázquez-Miranda
- Departamento de Zoología, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México C.P. 04510, Mexico
- F Keith Barker
- Department of Ecology, Evolution and Behavior, Bell Museum of Natural History, University of Minnesota, 40 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
19
Huang J, Bennett J, Flouri T, Leaché AD, Yang Z. Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model. Syst Biol 2021; 71:334-352. [PMID: 34143216 PMCID: PMC8977997 DOI: 10.1093/sysbio/syab047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2020] [Revised: 06/03/2021] [Accepted: 06/21/2021] [Indexed: 01/01/2023] Open
Abstract
Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences were much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. In case the analytical integration algorithm is computationally unfeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK; Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, P.R. China
- Jeremy Bennett
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK; Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269-3043, USA
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195-1800, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
20
Fontanella FM, Miles E, Strott P. Integrated analysis of the ringneck snake Diadophis punctatus complex (Colubridae: Dipsadidae) in a biodiversity hotspot provides the foundation for conservation reassessment. Biol J Linn Soc Lond 2021. [DOI: 10.1093/biolinnean/blab028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
Species classification may not reflect the underlying/cryptic genetic diversity and focusing on groups that do not represent historically independent units can misdirect conservation efforts. The identification of evolutionarily significant units (ESUs) allows cryptic genetic diversity to be accounted for when designating conservation priorities. We used multi-locus coalescent-based species delimitation methods and multivariate analyses of morphological data to examine whether the subspecies merit conservation recognition and infer the ESUs in ringneck snakes (Diadophis punctatus) throughout the California Floristic Province. Species delimitation methods failed to recover groups consistent with designated subspecies and instead inferred three well supported, mostly geographically isolated lineages. Divergence time estimates suggest that the divergences were driven by historical isolation associated with Pleistocene climate shifts. We found a correlation between increased morphological differentiation and time since divergence, and greater niche similarity between the more recently diverged eastern California and western California groups. Based on these results, we propose that the morphological similarities are due to a combination of morphological conservatism and evolutionary stasis. Our study provides the foundation necessary to re-assess the biodiversity and conservation status of ringneck snakes and offers an important step in unveiling the diversity within the western portion of the genus’ range.
Affiliation(s)
- Frank M Fontanella
- Department of Biology, University of West Georgia, Carrollton, GA 30118, USA
- Emily Miles
- Department of Biology, University of West Georgia, Carrollton, GA 30118, USA
- Polly Strott
- Department of Biology, University of West Georgia, Carrollton, GA 30118, USA
21
Flouri T, Jiao X, Rannala B, Yang Z. A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis. Mol Biol Evol 2021; 37:1211-1223. [PMID: 31825513 PMCID: PMC7086182 DOI: 10.1093/molbev/msz296] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Indexed: 12/19/2022] Open
Abstract
Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, Davis, CA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
22
Zhu T, Yang Z. Complexity of the simplest species tree problem. Mol Biol Evol 2021; 38:3993-4009. [PMID: 33492385 PMCID: PMC8382899 DOI: 10.1093/molbev/msab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/18/2020] [Revised: 01/04/2021] [Accepted: 01/13/2021] [Indexed: 02/06/2023] Open
Abstract
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Ziheng Yang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
23
Koch H, DeGiorgio M. Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol 2020; 12:3977-3995. [PMID: 32022857 PMCID: PMC7061232 DOI: 10.1093/gbe/evaa022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 01/23/2020] [Indexed: 11/12/2022] Open
Abstract
Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI's performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
Affiliation(s)
- Hillary Koch
- Department of Statistics, Pennsylvania State University
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
24
Zhen Y, Huber CD, Davies RW, Lohmueller KE. Greater strength of selection and higher proportion of beneficial amino acid changing mutations in humans compared with mice and Drosophila melanogaster. Genome Res 2020; 31:110-120. [PMID: 33208456 PMCID: PMC7849390 DOI: 10.1101/gr.256636.119] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/31/2019] [Accepted: 11/10/2020] [Indexed: 12/19/2022]
Abstract
Quantifying and comparing the amount of adaptive evolution among different species is key to understanding how evolution works. Previous studies have shown differences in adaptive evolution across species; however, their specific causes remain elusive. Here, we use improved modeling of weakly deleterious mutations and the demographic history of the outgroup species and ancestral population and estimate that at least 20% of nonsynonymous substitutions between humans and an outgroup species were fixed by positive selection. This estimate is much higher than previous estimates, which did not correct for the sizes of the outgroup species and ancestral population. Next, we jointly estimate the proportion and selection coefficient (p+ and s+, respectively) of newly arising beneficial nonsynonymous mutations in humans, mice, and Drosophila melanogaster by examining patterns of polymorphism and divergence. We develop a novel composite likelihood framework to test whether these parameters differ across species. Overall, we reject a model with the same p+ and s+ of beneficial mutations across species and estimate that humans have a higher p+s+ compared with that of D. melanogaster and mice. We show that this result cannot be caused by biased gene conversion or hypermutable CpG sites. We discuss possible biological explanations that could generate the observed differences in the amount of adaptive evolution across species.
Affiliation(s)
- Ying Zhen
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA; Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China; Institute of Biology, Westlake Institute for Advanced Study, Hangzhou, Zhejiang, 310024, China
- Christian D Huber
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA; School of Biological Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
- Robert W Davies
- Program in Genetics and Genome Biology and The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, M5G 0A4, Canada; Department of Statistics, University of Oxford, Oxford, OX1 3LB, United Kingdom
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
25
Hánová A, Konečný A, Nicolas V, Denys C, Granjon L, Lavrenchenko LA, Šumbera R, Mikula O, Bryja J. Multilocus phylogeny of African striped grass mice (Lemniscomys): Stripe pattern only partly reflects evolutionary relationships. Mol Phylogenet Evol 2020; 155:107007. [PMID: 33160039 DOI: 10.1016/j.ympev.2020.107007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/21/2020] [Revised: 10/20/2020] [Accepted: 10/29/2020] [Indexed: 12/18/2022]
Abstract
Murine rodents are one of the most evolutionarily successful groups of extant mammals. They are also important for humans as vectors and reservoirs of zoonoses and as agricultural pests. Unfortunately, their fast and relatively recent diversification impedes our understanding of phylogenetic relationships and species limits of many murine taxa, including those with a very conspicuous phenotype that has been frequently used for taxonomic purposes. One such group is the striped grass mice (genus Lemniscomys), distributed across sub-Saharan Africa in 11 currently recognized species. These are traditionally classified into three morphological groups according to different pelage colouration on the back: (a) L. barbarus group (three species) with several continuous pale longitudinal stripes; (b) L. striatus group (four species) with pale stripes diffused into short lines or dots; and (c) L. griselda group (four species) with a single mid-dorsal black stripe. Here we reconstructed the most comprehensive molecular phylogeny of the genus Lemniscomys to date, using the largest currently available multi-locus genetic dataset of all but two species. The results show four main lineages (=species complexes) with the distribution corresponding to the major biogeographical regions of Africa. Surprisingly, the four phylogenetic lineages are only in partial agreement with the morphological classification, suggesting that the single-stripe and/or multi-striped phenotypes evolved independently in multiple lineages. Divergence dating showed the split of the Lemniscomys and Arvicanthis genera at the beginning of the Pleistocene; most subsequent speciation processes within Lemniscomys were affected by Pleistocene climate oscillations, with predominantly allopatric diversification in a fragmented savanna biome. We propose taxonomic suggestions and directions for future research of this striking group of African rodents.
Affiliation(s)
- Alexandra Hánová
- Institute of Vertebrate Biology of the Czech Academy of Sciences, Květná 8, 603 65 Brno, Czech Republic; Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic.
- Adam Konečný
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic.
- Violaine Nicolas
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, CP51, 75005 Paris, France.
- Christiane Denys
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, CP51, 75005 Paris, France.
- Laurent Granjon
- CBGP, IRD, CIRAD, INRAE, Institut Agro, Univ Montpellier, 755 avenue du Campus Agropolis, CS 30016, 34988 Montferrier-sur-Lez cedex, France.
- Leonid A Lavrenchenko
- Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Leninskii pr. 33, Moscow 119071, Russia.
- Radim Šumbera
- Department of Zoology, Faculty of Science, University of South Bohemia, Branišovská 1760, 370 05 České Budějovice, Czech Republic.
- Ondřej Mikula
- Institute of Vertebrate Biology of the Czech Academy of Sciences, Květná 8, 603 65 Brno, Czech Republic.
- Josef Bryja
- Institute of Vertebrate Biology of the Czech Academy of Sciences, Květná 8, 603 65 Brno, Czech Republic; Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic.
26
Košuthová A, Bergsten J, Westberg M, Wedin M. Species delimitation in the cyanolichen genus Rostania. BMC Evol Biol 2020; 20:115. [PMID: 32912146 PMCID: PMC7488055 DOI: 10.1186/s12862-020-01681-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/30/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
Background: In this study, we investigate species limits in the cyanobacterial lichen genus Rostania (Collemataceae, Peltigerales, Lecanoromycetes). Four molecular markers (mtSSU rDNA, β-tubulin, MCM7, RPB2) were sequenced and analysed with two coalescent-based species delimitation methods: the Generalized Mixed Yule Coalescent model (GMYC) and a Bayesian species delimitation method (BPP) using a multispecies coalescence model (MSC), the latter with or without an a priori defined guide tree. Results: Species delimitation analyses indicate the presence of eight strongly supported candidate species. Conclusive correlation between morphological/ecological characters and genetic delimitation could be found for six of these. Of the two additional candidate species, one is represented by a single sterile specimen and the other currently lacks morphological or ecological supporting evidence. Conclusions: We conclude that Rostania includes a minimum of six species: R. ceranisca, R. multipunctata, R. occultata 1, R. occultata 2, R. occultata 3, and R. occultata 4,5,6. Three distinct Nostoc morphotypes occur in Rostania, and there is substantial correlation between these morphotypes and Rostania thallus morphology.
Affiliation(s)
- Alica Košuthová
- Department of Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden.
- Johannes Bergsten
- Department of Zoology, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden
- Martin Westberg
- Museum of Evolution, Uppsala University, Norbyvägen 16, SE-752 36, Uppsala, Sweden
- Mats Wedin
- Department of Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden
27
Molecular Clocks without Rocks: New Solutions for Old Problems. Trends Genet 2020; 36:845-856. [PMID: 32709458 DOI: 10.1016/j.tig.2020.06.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/14/2020] [Revised: 06/02/2020] [Accepted: 06/11/2020] [Indexed: 02/07/2023]
Abstract
Molecular data have been used to date species divergences ever since they were described as documents of evolutionary history in the 1960s. Yet, an inadequate fossil record and discordance between gene trees and species trees are persistently problematic. We examine how, by accommodating gene tree discordance and by scaling branch lengths to absolute time using mutation rate and generation time, multispecies coalescent (MSC) methods can potentially overcome these challenges. We find that time estimates can differ - in some cases, substantially - depending on whether MSC methods or traditional phylogenetic methods that apply concatenation are used, and whether the tree is calibrated with pedigree-based mutation rates or with fossils. We discuss the advantages and shortcomings of both approaches and provide practical guidance for data analysis when using these methods.
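The scaling step this abstract describes, converting branch lengths to absolute time with a mutation rate and a generation time, can be illustrated with a minimal sketch. It assumes the usual MSC parameterization, in which a divergence time tau is measured in expected substitutions per site and theta = 4*N*mu. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
def tau_to_years(tau, mu_per_gen, gen_time_years):
    """Convert a divergence time in substitutions per site to years.

    tau            : divergence time (expected substitutions per site)
    mu_per_gen     : mutation rate per site per generation (assumed known,
                     e.g. from a pedigree-based estimate)
    gen_time_years : generation time in years (assumed known)
    """
    generations = tau / mu_per_gen        # substitutions/site -> generations
    return generations * gen_time_years   # generations -> years


def theta_to_ne(theta, mu_per_gen):
    """Convert theta = 4 * N * mu to an effective population size N."""
    return theta / (4.0 * mu_per_gen)


# Illustrative values only; they do not come from any study listed here.
years = tau_to_years(tau=0.005, mu_per_gen=1e-8, gen_time_years=5.0)
ne = theta_to_ne(theta=0.004, mu_per_gen=1e-8)
print(f"divergence ~ {years:.0f} years, Ne ~ {ne:.0f}")
```

Because both conversions divide by the mutation rate, any error in the assumed rate propagates proportionally into the absolute time and population size estimates.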
28
Huang J, Flouri T, Yang Z. A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model. Mol Biol Evol 2020; 37:3211-3224. [DOI: 10.1093/molbev/msaa166] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/05/2023] Open
Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Department of Mathematics, Beijing Jiaotong University, Beijing, P.R. China
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
29
Jiao X, Yang Z. Defining Species When There is Gene Flow. Syst Biol 2020; 70:108-119. [PMID: 32617579 DOI: 10.1093/sysbio/syaa052] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/27/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 12/20/2022] Open
Abstract
Whatever one's definition of species, it is generally expected that individuals of the same species should be genetically more similar to each other than they are to individuals of another species. Here, we show that in the presence of cross-species gene flow, this expectation may be incorrect. We use the multispecies coalescent model with continuous-time migration or episodic introgression to study the impact of gene flow on genetic differences within and between species and highlight a surprising but plausible scenario in which different population sizes and asymmetrical migration rates cause a genetic sequence to be on average more closely related to a sequence from another species than to a sequence from the same species. Our results highlight the extraordinary impact that even a small amount of gene flow may have on the genetic history of the species. We suggest that contrasting long-term migration rate and short-term hybridization rate, both of which can be estimated using genetic data, may be a powerful approach to detecting the presence of reproductive barriers and defining species boundaries. [Gene flow; introgression; migration; multispecies coalescent; species concept; species delimitation]
Affiliation(s)
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
30
Liu J, Liu Q, Yang Q. mstree: A Multispecies Coalescent Approach for Estimating Ancestral Population Size and Divergence Time during Speciation with Gene Flow. Genome Biol Evol 2020; 12:715-719. [PMID: 32365209 PMCID: PMC7259675 DOI: 10.1093/gbe/evaa087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/27/2020] [Indexed: 11/28/2022] Open
Abstract
Gene flow between species may cause variation in the branch lengths and topology of gene trees beyond the variation expected from ancestral processes. This additional variation makes it difficult to estimate parameters during speciation with gene flow, as its pattern differs with the relationship between isolation and migration. As far as we know, most methods rely on an assumption about the relationship between isolation and migration imposed by a given model, such as the isolation-with-migration model, when estimating parameters during speciation with gene flow. In this article, we develop a multispecies coalescent approach, called mstree, which does not rely on any assumption about the relationship between isolation and migration when estimating parameters. mstree is available at https://github.com/liujunfengtop/MStree/ and uses mathematical inequalities among several factors, including the species divergence time, the ancestral population size, and the number of gene trees, to estimate parameters during speciation with gene flow. Using simulations, we show that the estimated ancestral population sizes and species divergence times are close to the true values for simulated data sets generated under the isolation-with-initial-migration model, the secondary contact model, and the isolation-with-migration model. Therefore, our method is able to estimate ancestral population sizes and speciation times in the presence of different modes of gene flow and may be helpful for testing different theories of speciation.
Affiliation(s)
- Junfeng Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Qiao Liu
- Department of Automation, Tsinghua University, Beijing, China
- Qingzhu Yang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; Department of Automation, Tsinghua University, Beijing, China
31
Smith ML, Carstens BC. Process-based species delimitation leads to identification of more biologically relevant species. Evolution 2019; 74:216-229. [PMID: 31705650 DOI: 10.1111/evo.13878] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 03/28/2019] [Revised: 10/08/2019] [Accepted: 10/12/2019] [Indexed: 12/23/2022]
Abstract
Most approaches to species delimitation to date have considered divergence-only models. Although these models are appropriate for allopatric speciation, their failure to incorporate many of the population-level processes that drive speciation, such as gene flow (e.g., in sympatric speciation), places an unnecessary limit on our collective understanding of the processes that produce biodiversity. To consider these processes while inferring species boundaries, we introduce the R-package delimitR and apply it to identify species boundaries in the reticulate taildropper slug (Prophysaon andersoni). Results suggest that secondary contact is an important mechanism driving speciation in this system. By considering process, we both avoid erroneous inferences that can be made when population-level processes such as secondary contact drive speciation but only divergence is considered, and gain insight into the process of speciation in terrestrial slugs. Further, we apply delimitR to three published empirical datasets and find results corroborating previous findings. Finally, we evaluate the performance of delimitR using simulation studies, and find that error rates are near zero when comparing models that include lineage divergence and gene flow for three populations with a modest number of Single Nucleotide Polymorphisms (SNPs; 1500) and moderate divergence times (<100,000 generations). When we apply delimitR to a complex model set (i.e., including divergence, gene flow, and population size changes), error rates are moderate (∼0.15; 10,000 SNPs), and, when present, misclassifications occur among highly similar models.
Affiliation(s)
- Megan L Smith
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio, 43210
- Bryan C Carstens
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio, 43210
32
Nevado B, Wong ELY, Osborne OG, Filatov DA. Adaptive Evolution Is Common in Rapid Evolutionary Radiations. Curr Biol 2019; 29:3081-3086.e5. [PMID: 31495580 DOI: 10.1016/j.cub.2019.07.059] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 06/03/2019] [Revised: 07/08/2019] [Accepted: 07/19/2019] [Indexed: 10/26/2022]
Abstract
One of the most long-standing and important mysteries in evolutionary biology is why biological diversity is so unevenly distributed across space and taxonomic lineages. Nowhere is this disparity more evident than in the multitude of rapid evolutionary radiations found on oceanic islands and mountain ranges across the globe [1-5]. The evolutionary processes driving these rapid diversification events remain unclear [6-8]. Recent genome-wide studies suggest that natural selection may be frequent during rapid evolutionary radiations, as inferred from work in cichlid fish [9], white-eye birds [10], new world lupins [11], and wild tomatoes [12]. However, whether frequent adaptive evolution is a general feature of rapid evolutionary radiations remains untested. Here we show that adaptive evolution is significantly more frequent in rapid evolutionary radiations compared to background levels in more slowly diversifying lineages. This result is consistent across a wide range of angiosperm lineages analyzed: 12 evolutionary radiations, which together comprise 1,377 described species, originating from some of the most biologically diverse systems on Earth. In addition, we find a significant negative correlation between population size and frequency of adaptive evolution in rapid evolutionary radiations. A possible explanation for this pattern is that more frequent adaptive evolution is at least partly driven by positive selection for advantageous mutations that compensate for the fixation of slightly deleterious mutations in smaller populations.
Affiliation(s)
- Bruno Nevado
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
- Edgar L Y Wong
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Owen G Osborne
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Dmitry A Filatov
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
33
Flouri T, Jiao X, Rannala B, Yang Z. Species Tree Inference with BPP Using Genomic Sequences and the Multispecies Coalescent. Mol Biol Evol 2019; 35:2585-2593. [PMID: 30053098 PMCID: PMC6188564 DOI: 10.1093/molbev/msy147] [Citation(s) in RCA: 189] [Impact Index Per Article: 37.8] [Indexed: 01/20/2023] Open
Abstract
The multispecies coalescent provides a natural framework for accommodating ancestral genetic polymorphism and coalescent processes that can cause different genomic regions to have different genealogical histories. The Bayesian program BPP includes a full-likelihood implementation of the multispecies coalescent, using transmodel Markov chain Monte Carlo to calculate the posterior probabilities of different species trees. BPP is suitable for analyzing multilocus sequence data sets and it accommodates the heterogeneity of gene trees (both the topology and branch lengths) among loci and gene tree uncertainties due to limited phylogenetic information at each locus. Here, we provide a practical guide to the use of BPP in species tree estimation. BPP is a command-line program that runs on Linux, macOS, and Windows. This protocol shows how to use both BPP 3.4 (http://abacus.gene.ucl.ac.uk/software/) and BPP 4.0 (https://github.com/bpp/).
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Bruce Rannala
- Department of Ecology and Evolution, University of California, Davis, CA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
34
Thawornwattana Y, Dalquen D, Yang Z. Coalescent Analysis of Phylogenomic Data Confidently Resolves the Species Relationships in the Anopheles gambiae Species Complex. Mol Biol Evol 2019; 35:2512-2527. [PMID: 30102363 PMCID: PMC6188554 DOI: 10.1093/molbev/msy158] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Indexed: 12/15/2022] Open
Abstract
Deep coalescence and introgression make it challenging to infer phylogenetic relationships among closely related species that arose through radiative speciation events. Despite numerous phylogenetic analyses and the availability of whole genomes, the phylogeny in the Anopheles gambiae species complex has not been confidently resolved. Here we extract over 80,000 coding and noncoding short segments (called loci) from the genomes of six members of the species complex and use a Bayesian method under the multispecies coalescent model to infer the species tree, which takes into account genealogical heterogeneity across the genome and uncertainty in the gene trees. We obtained a robust estimate of the species tree from the distal region of the X chromosome: (A. merus, ((A. melas, (A. arabiensis, A. quadriannulatus)), (A. gambiae, A. coluzzii))), with A. merus being the earliest-branching species. This species tree agrees with the chromosome inversion phylogeny and provides a parsimonious interpretation of inversion and introgression events. Simulations informed by the real data suggest that the coalescent approach is reliable while the sliding-window analysis used in a previous phylogenomic study generates artifactual species trees. Likelihood ratio tests of gene flow revealed strong evidence of autosomal introgression from A. arabiensis into A. gambiae (at the average rate of ∼0.2 migrants per generation), but not in the opposite direction, and introgression of the 3L chromosomal region from A. merus into A. quadriannulatus. Our results highlight the importance of accommodating incomplete lineage sorting and introgression in phylogenomic analyses of species that arose through recent radiative speciation events.
Affiliation(s)
- Yuttapong Thawornwattana
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand
- Daniel Dalquen
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; Radcliffe Institute for Advanced Studies, Harvard University, Cambridge, MA
35
Shi CM, Yang Z. Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons. Mol Biol Evol 2019; 35:159-179. [PMID: 29087487 PMCID: PMC5850733 DOI: 10.1093/molbev/msx277] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 12/20/2022] Open
Abstract
The phylogenetic relationships among extant gibbon species remain unresolved despite numerous efforts using morphological, behavioral, and genetic data and the sequencing of whole genomes. A major challenge in reconstructing the gibbon phylogeny is the radiative speciation process, which resulted in extremely short internal branches in the species phylogeny and extensive incomplete lineage sorting with extensive gene-tree heterogeneity across the genome. Here, we analyze two genomic-scale data sets, with ∼10,000 putative noncoding and exonic loci, respectively, to estimate the species tree for the major groups of gibbons. We used the Bayesian full-likelihood method bpp under the multispecies coalescent model, which naturally accommodates incomplete lineage sorting and uncertainties in the gene trees. For comparison, we included three heuristic coalescent-based methods (mp-est, SVDQuartets, and astral) as well as concatenation. From both data sets, we infer the phylogeny for the four extant gibbon genera to be (Hylobates, (Nomascus, (Hoolock, Symphalangus))). We used simulation guided by the real data to evaluate the accuracy of the methods used. Astral, while not as efficient as bpp, performed well in estimation of the species tree even in the presence of excessive incomplete lineage sorting. Concatenation, mp-est and SVDQuartets were unreliable when the species tree contains very short internal branches. A likelihood ratio test of gene flow suggests a small amount of migration from Hylobates moloch to H. pileatus, while cross-genera migration is absent or rare. Our results highlight the utility of coalescent-based methods in addressing challenging species tree problems characterized by short internal branches and rampant gene tree-species tree discordance.
Affiliation(s)
- Cheng-Min Shi
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; Radcliffe Institute for Advanced Studies, Harvard University, Cambridge, MA 02138, USA
36
Dutheil JY, Hobolth A. Ancestral Population Genomics. Methods Mol Biol 2019; 1910:555-589. [PMID: 31278677 DOI: 10.1007/978-1-4939-9074-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Borrowing both from population genetics and phylogenetics, the field of population genomics emerged as full genomes of several closely related species became available. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. With the advent of resequencing technologies, genome-wide patterns of diversity in extant populations have now come to complement this picture, offering increasing power to study more recent genetic history. We discuss the basic models of genomes in populations, including speciation models for closely related species. A major point in our discussion is that only a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, and therefore a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey modeling strategies for understanding genetic variation in ancestral populations and species. The underlying models build on the coalescent with recombination process and introduce further assumptions to scale the analyses to genomic data sets.
Affiliation(s)
- Julien Y Dutheil
- Department of Evolutionary Genetics, Max Planck Institute of Evolutionary Biology, Plön, Germany.
- Asger Hobolth
- Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
37
Leaché AD, Zhu T, Rannala B, Yang Z. The Spectre of Too Many Species. Syst Biol 2019; 68:168-181. [PMID: 29982825 PMCID: PMC6292489 DOI: 10.1093/sysbio/syy051] [Citation(s) in RCA: 142] [Impact Index Per Article: 28.4] [Received: 11/07/2017] [Revised: 06/29/2018] [Accepted: 06/29/2018] [Indexed: 11/21/2022] Open
Abstract
Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the bpp program have suggested that bpp may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in bpp tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in bpp provide much more reliable inference under the gdi than the approximate method phrapl. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.
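The genealogical divergence index (gdi) named in this abstract can be illustrated with a short sketch. The abstract itself gives neither a formula nor cutoffs. The sketch assumes the definition commonly used in the species-delimitation literature, gdi = 1 - exp(-2*tau/theta), where tau is the divergence time and theta the population size parameter of the population being assessed, together with the commonly cited rule-of-thumb thresholds of 0.2 and 0.7. The function names and labels are hypothetical.

```python
import math


def gdi(tau, theta):
    """Genealogical divergence index under the assumed definition.

    2 * tau / theta is the population divergence time in coalescent units,
    so the result is the probability that two sequences sampled from the
    population coalesce before reaching the divergence time.
    """
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify_gdi(value, low=0.2, high=0.7):
    """Apply rule-of-thumb cutoffs (assumed here, not taken from the abstract)."""
    if value < low:
        return "likely a single species"
    if value > high:
        return "likely distinct species"
    return "ambiguous"


# Illustrative (tau, theta) pairs; in practice these would be posterior
# estimates from a full-likelihood analysis.
for tau, theta in [(0.001, 0.01), (0.003, 0.01), (0.01, 0.01)]:
    value = gdi(tau, theta)
    print(f"tau={tau}, theta={theta}: gdi={value:.3f} ({classify_gdi(value)})")
```

Computing the index on each posterior sample, rather than on point estimates, would also give a credible interval for the gdi.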
Affiliation(s)
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, USA
- Tianqi Zhu
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Bruce Rannala
- Department of Evolution and Ecology, University of California Davis, One Shields Avenue, Davis, USA
- Ziheng Yang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Department of Genetics, University College London, London, UK
- Radcliffe Institute for Advanced Studies, Harvard University, Cambridge, USA
38
Toussaint EFA, Turlin B, Balke M. Biogeographical, molecular and morphological evidence unveils cryptic diversity in the Oriental black rajah Charaxes solon (Fabricius, 1793) (Lepidoptera: Nymphalidae: Charaxinae). Biol J Linn Soc Lond 2018. [DOI: 10.1093/biolinnean/bly169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Affiliation(s)
- Bernard Turlin
- SNSB-Bavarian State Collection of Zoology, Münchhausenstraße, Munich, Germany
39
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annu Rev Ecol Evol Syst 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Emilia Huerta-Sanchez
- Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
- Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
40
Luo A, Ling C, Ho SYW, Zhu CD. Comparison of Methods for Molecular Species Delimitation Across a Range of Speciation Scenarios. Syst Biol 2018; 67:830-846. [PMID: 29462495 PMCID: PMC6101526 DOI: 10.1093/sysbio/syy011] [Citation(s) in RCA: 191] [Impact Index Per Article: 31.8] [Received: 06/25/2017] [Accepted: 02/10/2018] [Indexed: 11/14/2022] Open
Abstract
Species are fundamental units in biological research and can be defined on the basis of various operational criteria. There has been growing use of molecular approaches for species delimitation. Among the most widely used methods, the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) were designed for the analysis of single-locus data but are often applied to concatenations of multilocus data. In contrast, the Bayesian multispecies coalescent approach in the software Bayesian Phylogenetics and Phylogeography (BPP) explicitly models the evolution of multilocus data. In this study, we compare the performance of GMYC, PTP, and BPP using synthetic data generated by simulation under various speciation scenarios. We show that in the absence of gene flow, the main factor influencing the performance of these methods is the ratio of population size to divergence time, while number of loci and sample size per species have smaller effects. Given appropriate priors and correct guide trees, BPP shows lower rates of species overestimation and underestimation, and is generally robust to various potential confounding factors except high levels of gene flow. The single-threshold GMYC and the best strategy that we identified in PTP generally perform well for scenarios involving more than a single putative species when gene flow is absent, but PTP outperforms GMYC when fewer species are involved. Both methods are more sensitive than BPP to the effects of gene flow and potential confounding factors. Case studies of bears and bees further validate some of the findings from our simulation study, and reveal the importance of using an informed starting point for molecular species delimitation. Our results highlight the key factors affecting the performance of molecular species delimitation, with potential benefits for using these methods within an integrative taxonomic framework.
Affiliation(s)
- Arong Luo
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
- Cheng Ling
  - Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
- Chao-Dong Zhu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
41
|
Degnan JH. Modeling Hybridization Under the Network Multispecies Coalescent. Syst Biol 2018; 67:786-799. [PMID: 29846734 PMCID: PMC6101600 DOI: 10.1093/sysbio/syy040] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 09/16/2017] [Revised: 05/13/2018] [Accepted: 05/16/2018] [Indexed: 11/13/2022] Open
Abstract
Simultaneously modeling hybridization and the multispecies coalescent is becoming increasingly common, and inference of species networks in this context is now implemented in several software packages. This article addresses some of the conceptual issues and decisions to be made in this modeling, including whether or not to use branch lengths and issues with model identifiability. This article is based on a talk given at a Spotlight Session at the Evolution 2017 meeting in Portland, Oregon. This session included several talks about modeling hybridization and gene flow in the presence of incomplete lineage sorting. Other talks given at this meeting are also included in this special issue of Systematic Biology.
Affiliation(s)
- James H Degnan
  - Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
|
42
|
Reis MD, Gunnell GF, Barba-Montoya J, Wilkins A, Yang Z, Yoder AD. Using Phylogenomic Data to Explore the Effects of Relaxed Clocks and Calibration Strategies on Divergence Time Estimation: Primates as a Test Case. Syst Biol 2018; 67:594-615. [PMID: 29342307 PMCID: PMC6005039 DOI: 10.1093/sysbio/syy001] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 03/25/2017] [Revised: 12/26/2017] [Accepted: 01/05/2018] [Indexed: 11/13/2022] Open
Abstract
Primates have long been a test case for the development of phylogenetic methods for divergence time estimation. Despite a large number of studies, however, the timing of origination of crown Primates relative to the Cretaceous-Paleogene (K-Pg) boundary and the timing of diversification of the main crown groups remain controversial. Here, we analysed a data set of 372 taxa (367 Primates and 5 outgroups, 3.4 million aligned base pairs) that includes nine primate genomes. We systematically explore the effect of different interpretations of fossil calibrations and molecular clock models on primate divergence time estimates. We find that even small differences in the construction of fossil calibrations can have a noticeable impact on estimated divergence times, especially for the oldest nodes in the tree. Notably, choice of molecular rate model (autocorrelated or independently distributed rates) has an especially strong effect on estimated times, with the independent rates model producing considerably more ancient age estimates for the deeper nodes in the phylogeny. We implement thermodynamic integration, combined with Gaussian quadrature, in the program MCMCTree, and use it to calculate Bayes factors for clock models. Bayesian model selection indicates that the autocorrelated rates model fits the primate data substantially better, and we conclude that time estimates under this model should be preferred. We show that for eight core nodes in the phylogeny, uncertainty in time estimates is close to the theoretical limit imposed by fossil uncertainties. Thus, these estimates are unlikely to be improved by collecting additional molecular sequence data. All analyses place the origin of Primates close to the K-Pg boundary, either in the Cretaceous or straddling the boundary into the Palaeogene.
Affiliation(s)
- Mario Dos Reis
  - School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Gregg F Gunnell
  - Division of Fossil Primates, Duke University Lemur Center, Durham, 1013 Broad Street, NC 27705, USA
- Jose Barba-Montoya
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Alex Wilkins
  - Division of Fossil Primates, Duke University Lemur Center, Durham, 1013 Broad Street, NC 27705, USA
  - Department of Anthropology, The Ohio State University, Columbus, OH 43210, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Anne D Yoder
  - Department of Biology, Duke University, Durham, NC 27708, USA
|
43
|
Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans. PLoS Genet 2018; 14:e1007254. [PMID: 29590096 PMCID: PMC5891062 DOI: 10.1371/journal.pgen.1007254] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/07/2017] [Revised: 04/09/2018] [Accepted: 02/13/2018] [Indexed: 01/17/2023] Open
Abstract
It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differ between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered.
Affiliation(s)
- Peter F. Arndt
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
|
44
|
Rannala B, Yang Z. Efficient Bayesian Species Tree Inference under the Multispecies Coalescent. Syst Biol 2018; 66:823-842. [PMID: 28053140 PMCID: PMC8562347 DOI: 10.1093/sysbio/syw119] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 12/14/2015] [Accepted: 12/10/2016] [Indexed: 11/12/2022] Open
Abstract
We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. 
[Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR.].
Affiliation(s)
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
|
45
|
Dalquen DA, Zhu T, Yang Z. Maximum Likelihood Implementation of an Isolation-with-Migration Model for Three Species. Syst Biol 2018; 66:379-398. [PMID: 27486180 DOI: 10.1093/sysbio/syw063] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/21/2015] [Accepted: 07/08/2016] [Indexed: 01/03/2023] Open
Abstract
We develop a maximum likelihood (ML) method for estimating migration rates between species using genomic sequence data. A species tree is used to accommodate the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species is used as an out-group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate over the coalescent times on each genealogical tree numerically. This is an extension of our early implementation of the symmetrical isolation-with-migration model for three species to accommodate arbitrary loci with two or three sequences per locus and to allow asymmetrical migration rates. Our implementation can accommodate tens of thousands of loci, making it feasible to analyze genome-scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species due to gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in-group species and of the ML estimates of model parameters such as the migration rate. Inclusion of data from a third out-group species is found to increase dramatically the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from the Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at the rate of ~0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration.
Affiliation(s)
- Daniel A Dalquen
  - Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
- Tianqi Zhu
  - Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
  - Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
46
|
Schrago CG, Mello B, Pereira AG, Furtado C, Seuánez HN. Impact of long-term chromosomal shuffling on the multispecies coalescent analysis of two anthropoid primate lineages. Ecol Evol 2017; 8:1206-1216. [PMID: 29375791 PMCID: PMC5773316 DOI: 10.1002/ece3.3736] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/08/2017] [Revised: 11/21/2017] [Accepted: 11/27/2017] [Indexed: 01/03/2023] Open
Abstract
Multispecies coalescent (MSC) theory assumes that gene trees inferred from individual loci are independent trials of the MSC process. As genes might be physically close in syntenic associations spanning along chromosome regions, these assumptions might be flawed in evolutionary lineages with substantial karyotypic shuffling. Neotropical primates (NP) represent an ideal case for assessing the performance of MSC methods in such scenarios because chromosome diploid number varies significantly in this lineage. To this end, we investigated the effect of sequence length on the theoretical expectations of the MSC model, as well as the results of coalescent-based tree inference methods. This was carried out by comparing NP with hominids, a lineage in which chromosome macrostructure has been stable for at least 15 million years. We found that departure from the MSC model in Neotropical primates decreased with smaller sequence fragments, where sites sharing the same evolutionary history were more frequently found than in longer fragments. This scenario probably resulted from extensive karyotypic rearrangement occurring during the radiation of NP, contrary to the comparatively stable chromosome evolution in hominids.
Affiliation(s)
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Beatriz Mello
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Anieli G Pereira
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Carolina Furtado
  - Division of Genetics, National Cancer Institute, Rio de Janeiro, Brazil
- Hector N Seuánez
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
  - Division of Genetics, National Cancer Institute, Rio de Janeiro, Brazil
|
47
|
Jennings WB. On the independent gene trees assumption in phylogenomic studies. Mol Ecol 2017; 26:4862-4871. [PMID: 28752599 DOI: 10.1111/mec.14274] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/01/2016] [Revised: 07/13/2017] [Accepted: 07/24/2017] [Indexed: 11/28/2022]
Abstract
Multilocus coalescent methods for inferring species trees or historical demographic parameters typically require the assumption that gene trees for sampled SNPs or DNA sequence loci are conditionally independent given their species tree. In practice, researchers have used different criteria to delimit "independent loci." One criterion identifies sampled loci as being independent of each other if they undergo Mendelian independent assortment (IA criterion). O'Neill et al. (2013, Molecular Ecology, 22, 111-129) used this approach in their phylogeographic study of the North American tiger salamander species complex. In two other studies, researchers developed a pair of related methods that employ an independent genealogies criterion (IG criterion), which considers the effects of population-level recombination on correlations between the gene trees of intrachromosomal loci. Here, I explain these three methods, illustrate their use with example data, and evaluate their efficacies. I show that the IA approach is more conservative, is simpler to use, and requires fewer assumptions than the IG approaches. However, IG approaches can identify much larger numbers of independent loci than the IA method, which, in turn, allows researchers to obtain more precise and accurate estimates of species trees and historical demographic parameters. A disadvantage of the IG methods is that they require an estimate of the population recombination rate. Despite their drawbacks, IA and IG approaches provide molecular ecologists with promising a priori methods for selecting SNPs or DNA sequence loci that likely meet the independence assumption in coalescent-based phylogenomic studies.
Affiliation(s)
- W Bryan Jennings
  - Departamento de Vertebrados, Museu Nacional, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
48
|
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model. Genetics 2017; 204:1353-1368. [PMID: 27927902 DOI: 10.1534/genetics.116.190173] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 04/06/2016] [Accepted: 09/25/2016] [Indexed: 11/18/2022] Open
Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
|
49
|
Marchetti M, Liuzzi A, Fermi B, Corsini R, Folli C, Speranzini V, Gandolfi F, Bettati S, Ronda L, Cendron L, Berni R, Zanotti G, Percudani R. Catalysis and Structure of Zebrafish Urate Oxidase Provide Insights into the Origin of Hyperuricemia in Hominoids. Sci Rep 2016; 6:38302. [PMID: 27922051 PMCID: PMC5138847 DOI: 10.1038/srep38302] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/13/2016] [Accepted: 11/03/2016] [Indexed: 01/24/2023] Open
Abstract
Urate oxidase (Uox) catalyses the first reaction of oxidative uricolysis, a three-step enzymatic pathway that allows some animals to eliminate purine nitrogen through a water-soluble compound. Inactivation of the pathway in hominoids leads to elevated levels of sparingly soluble urate and puts humans at risk of hyperuricemia and gout. The uricolytic activities lost during evolution can be replaced by enzyme therapy. Here we report on the functional and structural characterization of Uox from zebrafish and the effects on the enzyme of the missense mutation (F216S) that preceded Uox pseudogenization in hominoids. Using a kinetic assay based on the enzymatic suppression of the spectroscopic interference of the Uox reaction product, we found that the F216S mutant has the same turnover number as the wild-type enzyme but a much-reduced affinity for the urate substrate and xanthine inhibitor. Our results indicate that the last functioning Uox in hominoid evolution had an increased Michaelis constant, possibly near the upper end of the normal range of urate in the human serum (~300 μM). Changes in the renal handling of urate during primate evolution can explain the genetic modification of uricolytic activities in the hominoid lineage without the need to assume fixation of deleterious mutations.
Affiliation(s)
- Anastasia Liuzzi
  - Department of Life Sciences, University of Parma, 43124, Parma, Italy
- Beatrice Fermi
  - Department of Life Sciences, University of Parma, 43124, Parma, Italy
- Romina Corsini
  - Department of Life Sciences, University of Parma, 43124, Parma, Italy
- Claudia Folli
  - Department of Food Science, University of Parma, 43124, Parma, Italy
- Stefano Bettati
  - Department of Neurosciences, University of Parma, 43124, Parma, Italy
- Luca Ronda
  - Department of Neurosciences, University of Parma, 43124, Parma, Italy
- Laura Cendron
  - Department of Biology, University of Padova, 35121, Padova, Italy
- Rodolfo Berni
  - Department of Life Sciences, University of Parma, 43124, Parma, Italy
- Giuseppe Zanotti
  - Department of Biology, University of Padova, 35121, Padova, Italy
|
50
|
Distribution of coalescent histories under the coalescent model with gene flow. Mol Phylogenet Evol 2016; 105:177-192. [DOI: 10.1016/j.ympev.2016.08.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/13/2016] [Revised: 08/16/2016] [Accepted: 08/31/2016] [Indexed: 12/19/2022]
|