1
|
On the effects of selection and mutation on species tree inference. Mol Phylogenet Evol 2023; 179:107650. [PMID: 36441104 DOI: 10.1016/j.ympev.2022.107650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 11/24/2022]
Abstract
The effect of selection acting on regions of the genome on the accuracy of species-level phylogenetic inference using methods that do not explicitly model selection is an open question that is relevant to most, if not all, phylogenomic studies. To address this, we derive a mathematical approximation to the Wright-Fisher model with mutation and selection in the limit as the population size becomes large. In contrast to previous approximations based on diffusion processes, our approximation can be used to study the distribution of coalescent times for an arbitrary number of lineages, allowing calculation of the probability distribution of gene genealogies under the coalescent model. We use these calculations to show that direct selection at strengths typically encountered in practice has only a small effect on the distribution of coalescent times, and hence on the distribution of gene trees. This implies that many coalescent-based methods for estimating the species tree topology will be robust to the presence of selection in a subset of the underlying genes. Selection will, however, bias the estimation of speciation times, causing them to underestimate the true speciation times. Our model captures the effects of selection on the genealogies that generate the observed sequence data, but does not model selective pressures that act only on the subsequent sequences or that negatively impact gene tree estimation.
|
2
|
Weyna A, Bourouina L, Galtier N, Romiguier J. Detection of F1 hybrids from single-genome data reveals frequent hybridization in Hymenoptera and particularly ants. Mol Biol Evol 2022; 39:6562163. [PMID: 35363317 PMCID: PMC9021736 DOI: 10.1093/molbev/msac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Hybridization occupies a central role in many fundamental evolutionary processes, such as speciation or adaptation. Yet, despite its pivotal importance in evolution, little is known about the actual prevalence and distribution of current hybridization across the tree of life. Here we develop and implement a new statistical method enabling the detection of F1 hybrids from single-individual genome sequencing data. Using simulations and sequencing data from known hybrid systems, we first demonstrate the specificity of the method, and identify its statistical limits. Next, we showcase the method by applying it to available sequencing data from more than 1,500 species of Arthropods, including Hymenoptera, Hemiptera, Coleoptera, Diptera, and Arachnida. Among these taxa, we find Hymenoptera, and especially ants, to display the highest number of candidate F1 hybrids, suggesting higher rates of recent hybridization between previously isolated gene pools in these groups. The prevalence of F1 hybrids was heterogeneously distributed across ants, with taxa including many candidates tending to harbor specific ecological and life-history traits. This work shows how large-scale genomic comparative studies of recent hybridization can be implemented, uncovering the determinants of first-generation hybridization across whole taxa.
Affiliation(s)
- Arthur Weyna
- Institut des Sciences de l'Evolution (UMR 5554), University of Montpellier, CNRS
| | - Lucille Bourouina
- Institut des Sciences de l'Evolution (UMR 5554), University of Montpellier, CNRS
| | - Nicolas Galtier
- Institut des Sciences de l'Evolution (UMR 5554), University of Montpellier, CNRS
| | - Jonathan Romiguier
- Institut des Sciences de l'Evolution (UMR 5554), University of Montpellier, CNRS
| |
|
3
|
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| |
|
4
|
Silliman K, Indorf JL, Knowlton N, Browne WE, Hurt C. Base-substitution mutation rate across the nuclear genome of Alpheus snapping shrimp and the timing of isolation by the Isthmus of Panama. BMC Ecol Evol 2021; 21:104. [PMID: 34049492 PMCID: PMC8164322 DOI: 10.1186/s12862-021-01836-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2021] [Accepted: 04/06/2021] [Indexed: 11/17/2022] Open
Abstract
Background: The formation of the Isthmus of Panama and final closure of the Central American Seaway (CAS) provides an independent calibration point for examining the rate of DNA substitutions. This vicariant event has been widely used to estimate the substitution rate across mitochondrial genomes and to date evolutionary events in other taxonomic groups. Nuclear sequence data is increasingly being used to complement mitochondrial datasets for phylogenetic and evolutionary investigations; these studies would benefit from information regarding the rate and pattern of DNA substitutions derived from the nuclear genome.
Results: To estimate the genome-wide neutral mutation rate (µ), genotype-by-sequencing (GBS) datasets were generated for three transisthmian species pairs in Alpheus snapping shrimp. A range of bioinformatic filtering parameters were evaluated in order to minimize potential bias in mutation rate estimates that may result from SNP filtering. Using a Bayesian coalescent approach (G-PhoCS) applied to 44,960 GBS loci, we estimated µ to be 2.64E−9 substitutions/site/year, when calibrated with the closure of the CAS at 3 Ma. Post-divergence gene flow was detected in one species pair. Failure to account for this post-split migration inflates our substitution rate estimates, emphasizing the importance of demographic methods that can accommodate gene flow.
Conclusions: Results from our study, both parameter estimates and bioinformatic explorations, have broad-ranging implications for phylogeographic studies in other non-model taxa using reduced representation datasets. Our best estimate of µ that accounts for coalescent and demographic processes is remarkably similar to experimentally derived mutation rates in model arthropod systems. These results contradicted recent suggestions that the closure of the Isthmus was completed much earlier (around 10 Ma), as mutation rates based on an early calibration resulted in uncharacteristically low genomic mutation rates. Also, stricter filtering parameters resulted in biased datasets that generated lower mutation rate estimates and influenced demographic parameters, serving as a cautionary tale for the adherence to conservative bioinformatic strategies when generating reduced-representation datasets at the species level. To our knowledge this is the first use of transisthmian species pairs to calibrate the rate of molecular evolution from GBS data.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12862-021-01836-3.
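The dependence of the rate estimate on the calibration date can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not taken from the paper: the divergence in substitutions per site is held fixed, so the per-year rate scales inversely with the assumed age of the split. The function name `rescale_rate` is hypothetical, and the actual G-PhoCS analysis also estimates demographic parameters that this ignores.

```python
def rescale_rate(rate_per_year, calib_ma, new_calib_ma):
    """Rescale a per-year substitution rate to a different calibration age.

    Assumption (illustrative only): the divergence in substitutions/site
    is fixed, so rate * age is constant across calibrations.
    """
    return rate_per_year * calib_ma / new_calib_ma


# Value reported in the abstract for a 3 Ma closure of the CAS.
mu_3ma = 2.64e-9

# Hypothetical rescaling to the earlier (around 10 Ma) closure date.
mu_10ma = rescale_rate(mu_3ma, 3.0, 10.0)
print(f"{mu_10ma:.2e}")  # 7.92e-10
```

Under this assumption the earlier calibration lowers the rate by a factor of 10/3, which is the direction of the effect the abstract describes.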
Affiliation(s)
- Katherine Silliman
- School of Fisheries, Aquaculture, and Aquatic Sciences, Auburn University, Auburn, AL, 36849, USA; Committee on Evolutionary Biology, University of Chicago, Chicago, IL, 60637, USA
| | - Jane L Indorf
- Department of Biology, University of Miami, Coral Gables, FL, 33146, USA
| | - Nancy Knowlton
- National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - William E Browne
- Department of Biology, University of Miami, Coral Gables, FL, 33146, USA
| | - Carla Hurt
- Department of Biology, University of Miami, Coral Gables, FL, 33146, USA; Department of Biology, Tennessee Tech University, Cookeville, TN, 38505, USA
| |
|
5
|
Gopalan S, Atkinson EG, Buck LT, Weaver TD, Henn BM. Inferring archaic introgression from hominin genetic data. Evol Anthropol 2021; 30:199-220. [PMID: 33951239 PMCID: PMC8360192 DOI: 10.1002/evan.21895] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2019] [Revised: 08/03/2020] [Accepted: 03/29/2021] [Indexed: 01/05/2023]
Abstract
Questions surrounding the timing, extent, and evolutionary consequences of archaic admixture into human populations have a long history in evolutionary anthropology. More recently, advances in human genetics, particularly in the field of ancient DNA, have shed new light on the question of whether or not Homo sapiens interbred with other hominin groups. By the late 1990s, published genetic work had largely concluded that archaic groups made no lasting genetic contribution to modern humans; less than a decade later, this conclusion was reversed following the successful DNA sequencing of an ancient Neanderthal. This reversal of consensus is noteworthy, but the reasoning behind it is not widely understood across all academic communities. There remains a communication gap between population geneticists and paleoanthropologists. In this review, we endeavor to bridge this gap by outlining how technological advancements, new statistical methods, and notable controversies ultimately led to the current consensus.
Affiliation(s)
- Shyamalika Gopalan
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, USA
| | - Elizabeth G Atkinson
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital and Stanley Center for Psychiatric Research, Broad Institute, Boston, Massachusetts, USA
| | - Laura T Buck
- Research Centre in Evolutionary Anthropology and Palaeoecology, Liverpool John Moores University, Liverpool, UK
| | - Timothy D Weaver
- Department of Anthropology, University of California, Davis, California, USA
| | - Brenna M Henn
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Department of Anthropology, University of California, Davis, California, USA; UC Davis Genome Center, University of California, Davis, California, USA
| |
|
6
|
Wang Y, Ogilvie HA, Nakhleh L. Practical Speedup of Bayesian Inference of Species Phylogenies by Restricting the Space of Gene Trees. Mol Biol Evol 2021; 37:1809-1818. [PMID: 32077947 PMCID: PMC7253205 DOI: 10.1093/molbev/msaa045] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022] Open
Abstract
Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
Affiliation(s)
- Yaxuan Wang
- Computer Science Department, Rice University, Houston, TX
| | - Huw A Ogilvie
- Computer Science Department, Rice University, Houston, TX
| | - Luay Nakhleh
- Computer Science Department, Rice University, Houston, TX
| |
|
7
|
Inference of gene flow in the process of speciation: Efficient maximum-likelihood implementation of a generalised isolation-with-migration model. Theor Popul Biol 2021; 140:1-15. [PMID: 33736959 DOI: 10.1016/j.tpb.2021.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 11/21/2022]
Abstract
The 'isolation with migration' (IM) model has been extensively used in the literature to detect gene flow during the process of speciation. In this model, an ancestral population split into two or more descendant populations which subsequently exchanged migrants at a constant rate until the present. Of course, the assumption of constant gene flow until the present is often over-simplistic in the context of speciation. In this paper, we consider a 'generalised IM' (GIM) model: a two-population IM model in which migration rates and population sizes are allowed to change at some point in the past. By developing a maximum-likelihood implementation of this model, we enable inference on both historical and contemporary rates of gene flow between two closely related populations or species. The GIM model encompasses both the standard two-population IM model and the 'isolation with initial migration' (IIM) model as special cases, as well as a model of secondary contact. We examine for simulated data how our method can be used, by means of likelihood ratio tests or AIC scores, to distinguish between the following scenarios of population divergence: (a) divergence in complete isolation; (b) divergence with a period of gene flow followed by isolation; (c) divergence with a period of isolation followed by secondary contact; (d) divergence with ongoing gene flow. Our method is based on the coalescent and is suitable for data sets consisting of the number of nucleotide differences between one pair of DNA sequences at each of a large number of independent loci. As our method relies on an explicit expression for the likelihood, it is computationally very fast.
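Model comparison by AIC, as mentioned in the abstract above, can be sketched as follows. The log-likelihoods and parameter counts are invented placeholders for the four scenarios (a) to (d), not values from the paper, and the scenario labels are hypothetical names chosen for this illustration; only the formula AIC = 2k − 2 ln L is standard.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (maximised log-likelihood, number of free parameters)
# for the four divergence scenarios; all numbers are illustrative.
fits = {
    "a_isolation": (-1050.0, 3),
    "b_migration_then_isolation": (-1040.0, 6),
    "c_secondary_contact": (-1046.0, 6),
    "d_ongoing_gene_flow": (-1039.5, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("preferred:", best)
```

With these made-up numbers the scenario with the highest likelihood is not preferred, because its two extra parameters cost more than the small gain in fit.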
|
8
|
Zhu T, Yang Z. Complexity of the simplest species tree problem. Mol Biol Evol 2021; 38:3993-4009. [PMID: 33492385 PMCID: PMC8382899 DOI: 10.1093/molbev/msab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/18/2020] [Revised: 01/04/2021] [Accepted: 01/13/2021] [Indexed: 02/06/2023] Open
Abstract
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Ziheng Yang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
| |
|
9
|
Hassan S, Surakka I, Taskinen MR, Salomaa V, Palotie A, Wessman M, Tukiainen T, Pirinen M, Palta P, Ripatti S. High-resolution population-specific recombination rates and their effect on phasing and genotype imputation. Eur J Hum Genet 2020; 29:615-624. [PMID: 33249422 PMCID: PMC8114909 DOI: 10.1038/s41431-020-00768-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/27/2020] [Revised: 10/01/2020] [Accepted: 10/20/2020] [Indexed: 11/24/2022] Open
Abstract
Previous research has shown that using population-specific reference panels has a significant effect on downstream population genomic analyses like haplotype phasing, genotype imputation, and association, especially in the context of population isolates. Here, we developed a high-resolution recombination rate mapping at 10 and 50 kb scale using high-coverage (20–30×) whole-genome sequenced data of 55 family trios from Finland and compared it to recombination rates of non-Finnish Europeans (NFE). We tested the downstream effects of the population-specific recombination rates in statistical phasing and genotype imputation in Finns as compared to the same analyses performed by using the NFE-based recombination rates. We found that Finnish recombination rates have a moderately high correlation (Spearman’s ρ = 0.67–0.79) with NFE, although on average (across all autosomal chromosomes), Finnish rates (2.268 ± 0.4209 cM/Mb) are 12–14% lower than NFE (2.641 ± 0.5032 cM/Mb). The Finnish recombination map was found to have no significant effect on haplotype phasing accuracy (switch error rates ~2%) or on average imputation concordance rates (97–98% for common, 92–96% for low frequency and 78–90% for rare variants). Our results suggest that haplotype phasing and genotype imputation mostly depend on population-specific contexts like appropriate reference panels and their sample size, but not on population-specific recombination maps. Even though recombination rate estimates had some differences between the Finnish and NFE populations, haplotyping and imputation were not noticeably affected by the recombination map used. Therefore, the currently available HapMap recombination maps seem robust for population-specific phasing and imputation pipelines, even in the context of relatively isolated populations like Finland.
Affiliation(s)
- Shabbeer Hassan
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - Ida Surakka
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - Marja-Riitta Taskinen
- Clinical and molecular metabolism, Research program unit, University of Helsinki, Helsinki, Finland
| | - Veikko Salomaa
- Finnish Institute for Health and Welfare, Helsinki, Finland
| | - Aarno Palotie
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Massachusetts General Hospital & Harvard Medical School, Boston, MA, USA; Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA
| | - Maija Wessman
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - Taru Tukiainen
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - Matti Pirinen
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Department of Public Health, Faculty of Medicine, Clinicum, University of Helsinki, Helsinki, Finland; Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Priit Palta
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Samuli Ripatti
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland; Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA; Department of Public Health, Faculty of Medicine, Clinicum, University of Helsinki, Helsinki, Finland
| |
|
10
|
Cheng X, DeGiorgio M. Flexible Mixture Model Approaches That Accommodate Footprint Size Variability for Robust Detection of Balancing Selection. Mol Biol Evol 2020; 37:3267-3291. [PMID: 32462188 PMCID: PMC7820363 DOI: 10.1093/molbev/msaa134] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/16/2022] Open
Abstract
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.
Affiliation(s)
- Xiaoheng Cheng
- Huck Institutes of Life Sciences, Pennsylvania State University, University Park, PA
- Department of Biology, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
| |
|
11
|
Košuthová A, Bergsten J, Westberg M, Wedin M. Species delimitation in the cyanolichen genus Rostania. BMC Evol Biol 2020; 20:115. [PMID: 32912146 PMCID: PMC7488055 DOI: 10.1186/s12862-020-01681-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/30/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
Background: In this study, we investigate species limits in the cyanobacterial lichen genus Rostania (Collemataceae, Peltigerales, Lecanoromycetes). Four molecular markers (mtSSU rDNA, β-tubulin, MCM7, RPB2) were sequenced and analysed with two coalescent-based species delimitation methods: the Generalized Mixed Yule Coalescent model (GMYC) and a Bayesian species delimitation method (BPP) using a multispecies coalescence model (MSC), the latter with or without an a priori defined guide tree.
Results: Species delimitation analyses indicate the presence of eight strongly supported candidate species. Conclusive correlation between morphological/ecological characters and genetic delimitation could be found for six of these. Of the two additional candidate species, one is represented by a single sterile specimen and the other currently lacks morphological or ecological supporting evidence.
Conclusions: We conclude that Rostania includes a minimum of six species: R. ceranisca, R. multipunctata, R. occultata 1, R. occultata 2, R. occultata 3, and R. occultata 4,5,6. Three distinct Nostoc morphotypes occur in Rostania, and there is substantial correlation between these morphotypes and Rostania thallus morphology.
Affiliation(s)
- Alica Košuthová
- Department of Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden.
| | - Johannes Bergsten
- Department of Zoology, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden
| | - Martin Westberg
- Museum of Evolution, Uppsala University, Norbyvägen 16, SE-752 36, Uppsala, Sweden
| | - Mats Wedin
- Department of Botany, Swedish Museum of Natural History, P.O. Box 50007, SE-104 05, Stockholm, Sweden
| |
|
12
|
Harris AM, DeGiorgio M. Identifying and Classifying Shared Selective Sweeps from Multilocus Data. Genetics 2020; 215:143-171. [PMID: 32152048 PMCID: PMC7198270 DOI: 10.1534/genetics.120.303137] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/22/2019] [Accepted: 02/29/2020] [Indexed: 11/18/2022] Open
Abstract
Positive selection causes beneficial alleles to rise to high frequency, resulting in a selective sweep of the diversity surrounding the selected sites. Accordingly, the signature of a selective sweep in an ancestral population may still remain in its descendants. Identifying signatures of selection in the ancestor that are shared among its descendants is important to contextualize the timing of a sweep, but few methods exist for this purpose. We introduce the statistic SS-H12, which can identify genomic regions under shared positive selection across populations and is based on the theory of the expected haplotype homozygosity statistic H12, which detects recent hard and soft sweeps from the presence of high-frequency haplotypes. SS-H12 is distinct from comparable statistics because it requires a minimum of only two populations, and properly identifies and differentiates between independent convergent sweeps and true ancestral sweeps, with high power and robustness to a variety of demographic models. Furthermore, we can apply SS-H12 in conjunction with the ratio of statistics we term [Formula: see text] and [Formula: see text] to further classify identified shared sweeps as hard or soft. Finally, we identified both previously reported and novel shared sweep candidates from human whole-genome sequences. Previously reported candidates include the well-characterized ancestral sweeps at LCT and SLC24A5 in Indo-Europeans, as well as GPHN worldwide. Novel candidates include an ancestral sweep at RGS18 in sub-Saharan Africans involved in regulating the platelet response and implicated in sudden cardiac death, and a convergent sweep at C2CD5 between European and East Asian populations that may explain their different insulin responses.
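The single-population statistic H12 that SS-H12 builds on can be sketched in a few lines. This uses the commonly cited definition, in which the two most frequent haplotypes are pooled before computing expected haplotype homozygosity: H12 = (p1 + p2)² + Σ p_i² over the remaining haplotypes. It does not implement SS-H12 itself, and the function names and toy sample are assumptions made for illustration.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return haplotype frequencies sorted from most to least common."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def h1(haplotypes):
    """Expected haplotype homozygosity: sum of squared frequencies."""
    return sum(p * p for p in haplotype_frequencies(haplotypes))


def h12(haplotypes):
    """H12: homozygosity after pooling the two most frequent haplotypes."""
    freqs = haplotype_frequencies(haplotypes)
    pooled = sum(freqs[:2])
    return pooled ** 2 + sum(p * p for p in freqs[2:])


# Toy window with three distinct haplotypes at frequencies 0.5, 0.3, 0.2.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
print(round(h1(sample), 2), round(h12(sample), 2))  # 0.38 0.68
```

Pooling the top two haplotypes is what lets the statistic stay elevated when two haplotypes rise together (a soft sweep) as well as when a single haplotype dominates (a hard sweep).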
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802
- Molecular, Cellular, and Integrative Biosciences at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida 33431
| |
|
13
|
Guo Y, Peng Z, Liu J, Yuan N, Wang Z, Du J. Systematic Comparisons of Positively Selected Genes between Gossypium arboreum and Gossypium raimondii Genomes. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190227151013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background:
Studies of Positively Selected Genes (PSGs) in microorganisms and mammals have provided insights into the dynamics of genome evolution and the genetic basis of differences between species by using whole genome-wide scans. Systematic investigations and comparisons of PSGs in plants, however, are still limited.
Objective:
A systematic comparison of PSGs between the genomes of two cotton species, Gossypium arboreum (G. arboreum) and G. raimondii, can help reveal molecular evolutionary differences in plants.
Methods:
Genome sequences of G. arboreum and G. raimondii were compared, including Whole Genome Duplication (WGD) events and genomic features such as gene number, gene length, codon bias index, evolutionary rate, number of expressed genes, and retention of duplicated copies.
Results:
Compared with G. raimondii, G. arboreum had more PSGs, and these had smaller gene sizes and fewer expressed genes. In addition, the PSGs evolved at a higher rate of synonymous substitutions, but were subjected to lower selection pressure. The PSGs in G. arboreum were also retained with a lower number of duplicate gene copies than in G. raimondii after a single WGD event involving Gossypium.
Conclusion:
These data indicate that PSGs in G. arboreum and G. raimondii differ not only in Ka/Ks, but also in their evolutionary, structural, and expression properties, indicating that divergence of G. arboreum and G. raimondii was associated with differences in PSGs in terms of evolutionary rates, gene length, expression patterns, and WGD retention in Gossypium.
Affiliation(s)
- Yue Guo
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Zhen Peng
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Jing Liu
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Na Yuan
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Zhen Wang
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Jianchang Du
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| |
Collapse
|
14
|
Pie MR, Bornschein MR, Ribeiro LF, Faircloth BC, McCormack JE. Phylogenomic species delimitation in microendemic frogs of the Brazilian Atlantic Forest. Mol Phylogenet Evol 2019; 141:106627. [PMID: 31539606 DOI: 10.1016/j.ympev.2019.106627] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2019] [Revised: 08/17/2019] [Accepted: 09/17/2019] [Indexed: 10/26/2022]
Abstract
The advent of next-generation sequencing allows researchers to use large-scale datasets for species delimitation analyses, yet one can envision an inflection point where the added accuracy of including more loci does not offset the increased computational burden. One alternative to including all loci could be to prioritize the analysis of loci for which there is an expectation of high informativeness. Here, we explore the issue of species delimitation and locus selection with montane species from two anuran genera that have been isolated in sky islands across the southern Brazilian Atlantic Forest: Melanophryniscus (Bufonidae) and Brachycephalus (Brachycephalidae). To delimit species, we obtained genetic data using target enrichment of ultraconserved elements from 32 populations (13 for Melanophryniscus and 19 for Brachycephalus), and we were able to create datasets that included over 800 loci with no missing data. We ranked loci according to their number of parsimony-informative sites, and we performed species delimitation analyses using BPP with the most informative 10, 20, 40, 80, 160, 320, and 640 loci. We identified three types of phylogenetic node: nodes with either consistently high or low support regardless of the number of loci or their informativeness and nodes that were initially poorly supported where support became stronger as we included more data. When viewed across all sensitivity analyses, our results suggest that the current species richness in both genera is likely underestimated. In addition, our results show the effects of different sampling strategies on species delimitation using phylogenomic datasets.
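The locus-ranking step described in this abstract can be illustrated with a minimal sketch. It assumes each locus is a list of equal-length aligned sequences and uses the common definition of a parsimony-informative site (at least two nucleotide states, each present in at least two sequences). The function names and the toy loci are hypothetical and are not taken from the study.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Count alignment columns with >= 2 nucleotide states that each occur in >= 2 sequences."""
    count = 0
    for column in zip(*alignment):
        # Ignore gaps and ambiguity codes; only A, C, G, T are counted as states.
        states = Counter(c.upper() for c in column if c.upper() in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def top_loci(loci, k):
    """Return the names of the k loci with the most parsimony-informative sites."""
    ranked = sorted(
        loci,
        key=lambda name: parsimony_informative_sites(loci[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: three loci, four aligned sequences each.
loci = {
    "locus-1": ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTAC"],
    "locus-2": ["AAGTAC", "AAGTAC", "CCGTAC", "CCGTAC"],
    "locus-3": ["ACGTAC", "ACGTAC", "TCGTAC", "TCGTAC"],
}
print(top_loci(loci, 2))
```

Subsets of increasing size (10, 20, 40, and so on) would then be obtained by calling `top_loci` with different values of `k` before passing the selected loci to the delimitation program.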
Affiliation(s)
- Marcio R Pie
- Departamento de Zoologia, Universidade Federal do Paraná, CEP 81531-980 Curitiba, Paraná, Brazil; Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil.
- Marcos R Bornschein
- Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil; Instituto de Biociências, Universidade Estadual Paulista, Praça Infante Dom Henrique s/no, Parque Bitaru, CEP 11330-900 São Vicente, São Paulo, Brazil
- Luiz F Ribeiro
- Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil; Escola de Ciências da Vida, Pontifícia Universidade Católica do Paraná, CEP 80215-901 Curitiba, Paraná, Brazil
- Brant C Faircloth
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- John E McCormack
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Road, Los Angeles, CA 90041, USA
|
15
|
Williams AC, Hill LJ. Nicotinamide as Independent Variable for Intelligence, Fertility, and Health: Origin of Human Creative Explosions? Int J Tryptophan Res 2019; 12:1178646919855944. [PMID: 31258332 PMCID: PMC6585247 DOI: 10.1177/1178646919855944] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/25/2019] [Accepted: 05/03/2019] [Indexed: 12/28/2022] Open
Abstract
Meat and nicotinamide acquisition was a defining force during the 2-million-year evolution of the big brains necessary for anatomically modern Homo sapiens to survive. Our next move was down the food chain during the Mesolithic 'broad spectrum', then horticultural, followed by the Neolithic agricultural revolutions and progressively lower average 'doses' of nicotinamide. We speculate that a fertility crisis and population bottleneck around 40 000 years ago, at the time of the Last Glacial Maximum, was overcome by Homo (but not the Neanderthals) by concerted dietary change plus profertility genes and intense sexual selection culminating in behaviourally modern Homo sapiens. Increased reliance on the 'de novo' synthesis of nicotinamide from tryptophan conditioned the immune system to welcome symbionts, such as TB (that excrete nicotinamide), and to increase tolerance of the foetus and thereby fertility. The trade-offs during the warmer Holocene were physical and mental stunting, more infectious diseases, and population booms and busts. Higher nicotinamide exposure could be responsible for recent demographic and epidemiological transitions to lower fertility and higher longevity, but with more degenerative and auto-immune disease.
Affiliation(s)
- Adrian C Williams
- Department of Neurology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Lisa J Hill
- School of Biomedical Sciences, Institute of Clinical Sciences, University of Birmingham, Birmingham, UK
|
16
|
Flouri T, Jiao X, Rannala B, Yang Z. Species Tree Inference with BPP Using Genomic Sequences and the Multispecies Coalescent. Mol Biol Evol 2019; 35:2585-2593. [PMID: 30053098 PMCID: PMC6188564 DOI: 10.1093/molbev/msy147] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Indexed: 01/20/2023] Open
Abstract
The multispecies coalescent provides a natural framework for accommodating ancestral genetic polymorphism and coalescent processes that can cause different genomic regions to have different genealogical histories. The Bayesian program BPP includes a full-likelihood implementation of the multispecies coalescent, using transmodel Markov chain Monte Carlo to calculate the posterior probabilities of different species trees. BPP is suitable for analyzing multilocus sequence data sets and it accommodates the heterogeneity of gene trees (both the topology and branch lengths) among loci and gene tree uncertainties due to limited phylogenetic information at each locus. Here, we provide a practical guide to the use of BPP in species tree estimation. BPP is a command-line program that runs on linux, macosx, and windows. This protocol shows how to use both BPP 3.4 (http://abacus.gene.ucl.ac.uk/software/) and BPP 4.0 (https://github.com/bpp/).
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Bruce Rannala
- Department of Ecology and Evolution, University of California, Davis, CA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
|
17
|
Buse E, Markert UR. The immunology of the macaque placenta: A detailed analysis and critical comparison with the human placenta. Crit Rev Clin Lab Sci 2019; 56:118-145. [PMID: 30632863 DOI: 10.1080/10408363.2018.1538200] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 10/27/2022]
Abstract
The cynomolgus monkey is increasingly considered in toxicological research as the most appropriate model for humans due to the species' close physiological contiguity, including reproductive physiology. Here, literature on the cynomolgus monkey placenta is reviewed with regard to its similarity to the human placenta and particularly for its immunological role, which is not entirely mirrored in humans. Pertinent original data are included in this article. The cynomolgus monkey placenta is evaluated based on three aspects: first, morphological development; second, the spatial and temporal appearance of maternal and fetal immune cells and certain immune cell products of the innate and adaptive immune systems; and third, the expression of relevant immune tolerance-related molecules including the homologs of anti-human leucocyte antigen, indoleamine 2,3-dioxygenase, FAS/FAS-L, annexin II, and progesterone. Parameters relevant to the immunological role of the placenta are evaluated from the immunologically immature stage of gestational day (GD) 50 until more mature stages close to birth. Selected comparisons are drawn with human and other laboratory animal placentas. In conclusion, the cynomolgus monkey placenta has a high degree of morphological and physiological similarity to the human placenta. However, there are differences in the topographical distribution of cell types and immune tolerance-related molecules. Three basic features are recognized: (1) the immunological capacity of the placenta changes throughout the lifetime of the organ; (2) these immunological changes include multiple parameters such as morphological adaptations, cell type involvement, and changes in immune-relevant molecule expression; and (3) the immune systems of two genetically disparate individuals (mother and child) are functionally intertwined at the maternal-fetal interface.
Affiliation(s)
- Udo R Markert
- Placenta Lab, Department of Obstetrics, University Hospital Jena, Jena, Germany
|
18
|
Leaché AD, Zhu T, Rannala B, Yang Z. The Spectre of Too Many Species. Syst Biol 2019; 68:168-181. [PMID: 29982825 PMCID: PMC6292489 DOI: 10.1093/sysbio/syy051] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 11/07/2017] [Revised: 06/29/2018] [Accepted: 06/29/2018] [Indexed: 11/21/2022] Open
Abstract
Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the bpp program have suggested that bpp may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in bpp tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in bpp provide much more reliable inference under the gdi than the approximate method phrapl. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.
Affiliation(s)
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, USA
- Tianqi Zhu
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Bruce Rannala
- Department of Evolution and Ecology, University of California Davis, One Shields Avenue, Davis, USA
- Ziheng Yang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Department of Genetics, University College London, London, UK
- Radcliffe Institute for Advanced Studies, Harvard University, Cambridge, USA
|
19
|
Camacho GP, Pie MR, Feitosa RM, Barbeitos MS. Exploring gene tree incongruence at the origin of ants and bees (Hymenoptera). Zool Scr 2018. [DOI: 10.1111/zsc.12332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Affiliation(s)
- Gabriela P. Camacho
- Programa de Pós‐Graduação em Entomologia, Departamento de Zoologia Universidade Federal do Paraná Curitiba Brazil
- Department of Entomology, National Museum of Natural History Smithsonian Institution Washington District of Columbia
- Marcio R. Pie
- Programa de Pós‐Graduação em Entomologia, Departamento de Zoologia Universidade Federal do Paraná Curitiba Brazil
- Programa de Pós‐Graduação em Zoologia, Departamento de Zoologia Universidade Federal do Paraná Curitiba Brazil
- Rodrigo M. Feitosa
- Programa de Pós‐Graduação em Entomologia, Departamento de Zoologia Universidade Federal do Paraná Curitiba Brazil
- Marcos S. Barbeitos
- Programa de Pós‐Graduação em Zoologia, Departamento de Zoologia Universidade Federal do Paraná Curitiba Brazil
|
20
|
Suárez-Villota EY, Quercia CA, Díaz LM, Vera-Sovier V, Nuñez JJ. Speciation in a biodiversity hotspot: Phylogenetic relationships, species delimitation, and divergence times of Patagonian ground frogs from the Eupsophus roseus group (Alsodidae). PLoS One 2018; 13:e0204968. [PMID: 30543633 PMCID: PMC6292574 DOI: 10.1371/journal.pone.0204968] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/14/2018] [Accepted: 11/27/2018] [Indexed: 11/19/2022] Open
Abstract
The alsodid ground frogs of the Eupsophus genus are divided into two groups, the roseus (2n = 30) and vertebralis (2n = 28), which are distributed throughout the temperate Nothofagus forests of South America. Currently, the roseus group is composed of four species, while the vertebralis group consists of two. Phylogenetic relationships and species delimitation within each group are controversial. In fact, previous analyses considered that the roseus group was composed of between four and nine species. In this work, we evaluated phylogenetic relationships, diversification times, and species delimitation within the roseus group using a multi-locus dataset. For this purpose, mitochondrial (D-loop, Cyt b, and COI) and nuclear (POMC and CRYBA1) partial sequences from 164 individuals were amplified, representing all species. Maximum Likelihood (ML) and Bayesian approaches were used to reconstruct phylogenetic relationships. The species tree was estimated using BEAST and singular value decomposition scores for species quartets (SVDquartets). Species limits were evaluated with six coalescent approaches. Diversification times were estimated using mitochondrial and nuclear rates with a LogNormal relaxed clock in BEAST. Nine well-supported monophyletic lineages were recovered in Bayesian, ML, and SVDquartets analyses, including eight named species and a lineage composed of specimens from the Villarrica population (Bootstrap: >70, PP: >0.99). Single-locus species delimitation analyses overestimated the species number in the E. migueli, E. calcaratus, and E. roseus lineages, while multi-locus analyses recovered as species the nine lineages observed in phylogenetic analyses (Ctax = 0.69). It is hypothesized that Eupsophus diversification occurred during the Mid-Pleistocene (0.42-0.14 Mya), with most species having originated after the Last Southern Patagonian Glaciation (0.18 Mya). Our results revitalize the hypothesis that the E. roseus group is composed of eight species and support the Villarrica lineage as a new putative species.
Affiliation(s)
- Camila A. Quercia
- Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile
- Leila M. Díaz
- Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile
- Victoria Vera-Sovier
- Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile
- José J. Nuñez
- Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile
|
21
|
Appropriate Assignment of Fossil Calibration Information Minimizes the Difference between Phylogenetic and Pedigree Mutation Rates in Humans. Life (Basel) 2018; 8:life8040049. [PMID: 30360410 PMCID: PMC6316143 DOI: 10.3390/life8040049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/13/2018] [Revised: 10/18/2018] [Accepted: 10/18/2018] [Indexed: 12/24/2022] Open
Abstract
Studies that measured mutation rates in human populations using pedigrees have reported values that differ significantly from rates estimated from the phylogenetic comparison of humans and chimpanzees. Consequently, exchanges between mutation rate values across different timescales lead to conflicting divergence time estimates. It has been argued that this variation of mutation rate estimates across hominoid evolution is in part caused by incorrect assignment of calibration information to the mean coalescent time among loci, instead of the true genetic isolation (speciation) time between humans and chimpanzees. In this study, we investigated the feasibility of estimating the human pedigree mutation rate using phylogenetic data from the genomes of great apes. We found that, when calibration information was correctly assigned to the human–chimpanzee speciation time (and not to the coalescent time), estimates of phylogenetic mutation rates were statistically equivalent to the estimates previously reported using studies of human pedigrees. We conclude that, within the range of biologically realistic ancestral generation times, part of the difference between whole-genome phylogenetic and pedigree mutation rates is due to inappropriate assignment of fossil calibration information to the mean coalescent time instead of the speciation time. Although our results focus on the human–chimpanzee divergence, our findings are general, and relevant to the inference of the timescale of the tree of life.
|
22
|
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022] Open
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
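As an illustration of the statistics named in this abstract, the sketch below computes G12, G123, and G2/G1 for a single genomic window. It assumes the usual form of the haplotype statistics (H1 as the sum of squared frequencies, H12 pooling the two most frequent classes, H2 as H1 without the most frequent class) applied to multilocus genotype frequencies. The function name, the string encoding of genotypes, and the toy sample are hypothetical.

```python
from collections import Counter


def genotype_identity_stats(genotypes):
    """Compute G1, G12, G123 and G2/G1 from the multilocus genotypes observed in one window.

    `genotypes` holds one hashable multilocus genotype per sampled individual,
    e.g. a string of per-site genotype codes.
    """
    n = len(genotypes)
    # Genotype frequencies, sorted from most to least frequent.
    freqs = sorted((c / n for c in Counter(genotypes).values()), reverse=True)
    freqs += [0.0, 0.0, 0.0]  # pad so that indexing the top three is always safe

    g1 = sum(q * q for q in freqs)  # expected genotype homozygosity
    g2 = g1 - freqs[0] ** 2         # same, excluding the most frequent genotype
    g12 = (freqs[0] + freqs[1]) ** 2 + sum(q * q for q in freqs[2:])
    g123 = (freqs[0] + freqs[1] + freqs[2]) ** 2 + sum(q * q for q in freqs[3:])
    return {"G1": g1, "G12": g12, "G123": g123, "G2/G1": g2 / g1}


# Hypothetical window: 10 individuals carrying three distinct multilocus genotypes.
sample = ["0012"] * 5 + ["0102"] * 3 + ["2210"] * 2
stats = genotype_identity_stats(sample)
print(stats)
```

In a genome scan, such a function would be applied to sliding windows, with G2/G1 examined only in windows where G12 or G123 is high, as the abstract describes.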
|
23
|
Pei J, Chu C, Li X, Lu B, Wu Y. CLADES: A classification-based machine learning method for species delimitation from population genetic data. Mol Ecol Resour 2018; 18:1144-1156. [PMID: 29667323 DOI: 10.1111/1755-0998.12887] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/15/2017] [Revised: 03/30/2018] [Accepted: 04/03/2018] [Indexed: 11/30/2022]
Abstract
Species are considered to be the basic unit of ecological and evolutionary studies. As multilocus genomic data are increasingly available, there has been considerable interest in the use of DNA sequence data to delimit species. In this study, we show that machine learning can be used for species delimitation. Our method treats the species delimitation problem as a classification problem for identifying the category of a new observation on the basis of training data. Extensive simulation is first conducted over a broad range of evolutionary parameters for training purposes. Each pair of known populations is combined to form training samples with a label of "same species" or "different species". We use a support vector machine (SVM) to train a classifier using a set of summary statistics computed from training samples as features. The trained classifier can classify a test sample into one of two outcomes: "same species" or "different species". Given multilocus genomic data of multiple related organisms or populations, our method (called CLADES) performs species delimitation by first classifying pairs of populations. CLADES then delimits species by maximizing the likelihood of species assignment for multiple populations. CLADES is evaluated through extensive simulation and also tested on real genetic data. We show that CLADES is both accurate and efficient for species delimitation when compared with existing methods. CLADES can be useful especially when existing methods have difficulty in delimitation, for example with short species divergence time and gene flow.
Affiliation(s)
- Jingwen Pei
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
- Chong Chu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Xin Li
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
- Bin Lu
- Chengdu Institute of Biology, Chinese Academy of Science, Chengdu, China
- Yufeng Wu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
|
24
|
Abstract
Human obesity has a large genetic component, yet has many serious negative consequences. How this state of affairs has evolved has generated wide debate. The thrifty gene hypothesis was the first attempt to explain obesity as a consequence of adaptive responses to an ancient environment that in modern society become disadvantageous. The idea is that genes (or more precisely, alleles) predisposing to obesity may have been selected for by repeated exposure to famines. However, this idea has many flaws: for instance, selection of the supposed magnitude over the duration of human evolution would fix any thrifty alleles (famines kill the old and young, not the obese) and there is no evidence that hunter-gatherer populations become obese between famines. An alternative idea (called thrifty late) is that selection in famines has only happened since the agricultural revolution. However, this is inconsistent with the absence of strong signatures of selection at single nucleotide polymorphisms linked to obesity. In parallel to discussions about the origin of obesity, there has been much debate regarding the regulation of body weight. There are three basic models: the set-point, settling point and dual-intervention point models. Selection might act against low and high levels of adiposity because food unpredictability and the risk of starvation selects against low adiposity whereas the risk of predation selects against high adiposity. Although evidence for the latter is quite strong, evidence for the former is relatively weak. The release from predation ∼2-million years ago is suggested to have led to the upper intervention point drifting in evolutionary time, leading to the modern distribution of obesity: the drifty gene hypothesis. Recent critiques of the dual-intervention point/drifty gene idea are flawed and inconsistent with known aspects of energy balance physiology. Here, I present a new formulation of the dual-intervention point model. 
This model includes the novel suggestion that food unpredictability and starvation are insignificant factors driving fat storage, and that the main force driving up fat storage is the risk of disease and the need to survive periods of pathogen-induced anorexia. This model shows why two independent intervention points are more likely to evolve than a single set point. The molecular basis of the lower intervention point is likely based around the leptin pathway signalling. Determining the molecular basis of the upper intervention point is a crucial key target for future obesity research. A potential definitive test to separate the different models is also described.
Affiliation(s)
- John R Speakman
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China; Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, Scotland, UK
|
25
|
Durden C, Sullivant S. Identifiability of Phylogenetic Parameters from k-mer Data Under the Coalescent. Bull Math Biol 2018; 81:431-451. [PMID: 29392644 DOI: 10.1007/s11538-018-0399-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2017] [Accepted: 01/19/2018] [Indexed: 11/30/2022]
Abstract
Distances between sequences based on their k-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of k-mer methods depends on (1) model-based corrections to distances based on k-mers and (2) breaking long sequences into blocks to obtain repeated trials from the sequence-generating process. Good performance of such methods is based on having many high-quality blocks with many homologous sites, which can be problematic to guarantee a priori. Nature provides natural blocks of sequences in homologous regions, namely the genes. However, directly using past work in this setting is problematic because of possible discordance between different gene trees and the underlying species tree. Using the multispecies coalescent model as a basis, we derive model-based moment formulas that involve the species divergence times and the coalescent parameters. From this setting, we prove identifiability results for the tree and branch length parameters under the Jukes-Cantor model of sequence mutations.
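For readers unfamiliar with alignment-free distances, the following minimal sketch shows the kind of raw quantity involved: a squared Euclidean distance between k-mer count vectors, averaged over gene blocks. It is an uncorrected distance chosen for illustration only; the model-based corrections and moment formulas derived in the paper are not reproduced here, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(seq_a, seq_b, k):
    """Squared Euclidean distance between the k-mer count vectors of two sequences."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from one of the sequences.
    return sum((a[w] - b[w]) ** 2 for w in set(a) | set(b))


def mean_gene_distance(genes_a, genes_b, k):
    """Average the k-mer distance over paired gene blocks from two taxa."""
    distances = [kmer_distance(x, y, k) for x, y in zip(genes_a, genes_b)]
    return sum(distances) / len(distances)


# Hypothetical toy data: two gene blocks sampled from each of two taxa.
taxon_a = ["ACGT", "GGCA"]
taxon_b = ["ACGA", "GGCA"]
print(mean_gene_distance(taxon_a, taxon_b, 2))
```

Treating each gene as one block mirrors the abstract's suggestion that genes supply the repeated trials, although under the coalescent each gene may follow a different gene tree.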
Affiliation(s)
- Chris Durden
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Seth Sullivant
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA.
|
26
|
Dalquen DA, Zhu T, Yang Z. Maximum Likelihood Implementation of an Isolation-with-Migration Model for Three Species. Syst Biol 2018; 66:379-398. [PMID: 27486180 DOI: 10.1093/sysbio/syw063] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/21/2015] [Accepted: 07/08/2016] [Indexed: 01/03/2023] Open
Abstract
We develop a maximum likelihood (ML) method for estimating migration rates between species using genomic sequence data. A species tree is used to accommodate the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species is used as an out-group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate over the coalescent times on each genealogical tree numerically. This is an extension of our early implementation of the symmetrical isolation-with-migration model for three species to accommodate arbitrary loci with two or three sequences per locus and to allow asymmetrical migration rates. Our implementation can accommodate tens of thousands of loci, making it feasible to analyze genome-scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species due to gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in-group species and of the ML estimates of model parameters such as the migration rate. Inclusion of data from a third out-group species is found to increase dramatically the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from the Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at the rate of ~0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration.
Affiliation(s)
- Daniel A Dalquen
- Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Tianqi Zhu
- Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.,Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
27
|
Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories. G3-GENES GENOMES GENETICS 2017; 7:3605-3620. [PMID: 28893846 PMCID: PMC5677151 DOI: 10.1534/g3.117.300259] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a small number of genomes, routine in genomic studies of nonmodel organisms, should be interpreted cautiously, as these models cannot recapitulate other summaries of the data.
|
28
|
Tatsumoto S, Go Y, Fukuta K, Noguchi H, Hayakawa T, Tomonaga M, Hirai H, Matsuzawa T, Agata K, Fujiyama A. Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Sci Rep 2017; 7:13561. [PMID: 29093469 PMCID: PMC5666008 DOI: 10.1038/s41598-017-13919-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2016] [Accepted: 10/04/2017] [Indexed: 12/30/2022] Open
Abstract
Mutations generate genetic variation and are a major driving force of evolution. Therefore, examining mutation rates and modes is essential for understanding the genetic basis of the physiology and evolution of organisms. Here, we aim to identify germline de novo mutations through a whole-genome survey of Mendelian inheritance error sites (MIEs), those not inherited in a Mendelian manner from either of the parents, using ultra-deep whole genome sequences (>150-fold) from a chimpanzee parent-offspring trio. We identified 889 such MIEs and classified them into four categories based on the pattern of inheritance and the sequence read depth: [i] de novo single nucleotide variants (SNVs), [ii] copy number neutral inherited variants, [iii] hemizygous deletion inherited variants, and [iv] de novo copy number variants (CNVs). From de novo SNV candidates, we estimated a germline de novo SNV mutation rate as 1.48 × 10⁻⁸ per site per generation or 0.62 × 10⁻⁹ per site per year. In summary, this study demonstrates the significance of ultra-deep whole genome sequencing not only for the direct estimation of mutation rates but also for discerning various mutation modes including de novo allelic conversion and de novo CNVs by identifying MIEs through the transmission of genomes from parents to offspring.
Affiliation(s)
- Shoji Tatsumoto
- Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi, 444-8585, Japan
| | - Yasuhiro Go
- Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Okazaki, Aichi, 444-8585, Japan.,Department of System Neuroscience, National Institute for Physiological Sciences, Okazaki, Aichi, 444-8585, Japan.,Department of Physiological Sciences, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi, 484-8585, Japan
| | - Kentaro Fukuta
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan.,Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Hideki Noguchi
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan.,Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Takashi Hayakawa
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan.,Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan
| | - Masaki Tomonaga
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan.,Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan.,Language and Intelligence Section, Department of Cognitive Sciences, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
| | - Hirohisa Hirai
- Molecular Biology Section, Department of Cellular and Molecular Biology, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
| | - Tetsuro Matsuzawa
- Department of Wildlife Science (Nagoya Railroad Co., Ltd.), Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan.,Japan Monkey Centre, Inuyama, Aichi, 484-0081, Japan.,Language and Intelligence Section, Department of Cognitive Sciences, Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan.,Institute of Advanced Study, Kyoto University, Kyoto, 606-8501, Japan
| | - Kiyokazu Agata
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, 606-8502, Japan.,Laboratory for Molecular Developmental Biology, Graduate School of Science, Kyoto University, Kyoto, 606-8502, Japan.,Graduate Course in Life Science, Gakushuin University, Tokyo, 171-8585, Japan
| | - Asao Fujiyama
- Center for Genome Informatics, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, 411-8540, Japan.,Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan.,Department of Genetics, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Mishima, Shizuoka, 411-8540, Japan
|
29
|
Reiner WB, Masao F, Sholts SB, Songita AV, Stanistreet I, Stollhofen H, Taylor RE, Hlusko LJ. OH 83: A new early modern human fossil cranium from the Ndutu beds of Olduvai Gorge, Tanzania. AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 2017; 164:533-545. [PMID: 28786473 DOI: 10.1002/ajpa.23292] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2016] [Revised: 05/02/2017] [Accepted: 07/23/2017] [Indexed: 01/23/2023]
Abstract
OBJECTIVE Herein we introduce a newly recovered partial calvaria, OH 83, from the upper Ndutu Beds of Olduvai Gorge, Tanzania. We present the geological context of its discovery and a comparative analysis of its morphology, placing OH 83 within the context of our current understanding of the origins and evolution of Homo sapiens. MATERIALS AND METHODS We comparatively assessed the morphology of OH 83 using quantitative and qualitative data from penecontemporaneous fossils and the W.W. Howells modern human craniometric dataset. RESULTS OH 83 is geologically dated to ca. 60-32 ka. Its morphology is indicative of an early modern human, falling at the low end of the range of variation for post-orbital cranial breadth, the high end of the range for bifrontal breadth, and near average in frontal length. DISCUSSION There have been numerous attempts to use cranial anatomy to define the species Homo sapiens and identify it in the fossil record. These efforts have not met wide agreement by the scientific community due, in part, to the mosaic patterns of cranial variation represented by the fossils. The variable, mosaic pattern of trait expression in the crania of Middle and Late Pleistocene fossils implies that morphological modernity did not occur at once. However, OH 83 demonstrates that by ca. 60-32 ka modern humans in Africa included individuals that are at the fairly small and gracile range of modern human cranial variation.
Affiliation(s)
- Whitney B Reiner
- Department of Integrative Biology, University of California Berkeley, MC 3140, Berkeley, California, 94720
| | - Fidelis Masao
- University of Dar es Salaam, Dar es Salaam, TZ, 35091.,Conservation Olduvai Project, Dar es Salaam, TZ, 35091
| | - Sabrina B Sholts
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560
| | | | - Ian Stanistreet
- University of Liverpool, Liverpool, L69 3GP, UK.,The Stone Age Institute, Bloomington, Indiana, 47407
| | - Harald Stollhofen
- GeoZentrum Nordbayern, Universität Erlangen-Nürnberg, Erlangen, 91054, Germany
| | - R E Taylor
- University of California Riverside, Riverside, California, 92521
| | - Leslea J Hlusko
- Department of Integrative Biology, University of California Berkeley, MC 3140, Berkeley, California, 94720
|
30
|
Abstract
Major histocompatibility complex (MHC) class I genes are critically involved in the defense against intracellular pathogens. MHC diversity comparisons among samples of closely related taxa may reveal traces of past or ongoing selective processes. The bonobo and chimpanzee are the closest living evolutionary relatives of humans and last shared a common ancestor some 1 mya. However, little is known concerning MHC class I diversity in bonobos or in central chimpanzees, the most numerous and genetically diverse chimpanzee subspecies. Here, we used a long-read sequencing technology (PacBio) to sequence the classical MHC class I genes A, B, C, and A-like in 20 and 30 wild-born bonobos and chimpanzees, respectively, with a main focus on central chimpanzees to assess and compare diversity in those two species. We describe in total 21 and 42 novel coding region sequences for the two species, respectively. In addition, we found evidence for a reduced MHC class I diversity in bonobos as compared to central chimpanzees as well as to western chimpanzees and humans. The reduced bonobo MHC class I diversity may be the result of a selective process in their evolutionary past since their split from chimpanzees.
Affiliation(s)
- Vincent Maibach
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany.
| | - Jörg B Hans
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
| | | | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Linda Vigilant
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
|
31
|
Dutta R, Mainsah J, Yatskiv Y, Chakrabortty S, Brennan P, Khuder B, Qiu S, Fedorova L, Fedorov A. Intricacies in arrangement of SNP haplotypes suggest "Great Admixture" that created modern humans. BMC Genomics 2017; 18:433. [PMID: 28583085 PMCID: PMC5741169 DOI: 10.1186/s12864-017-3776-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2016] [Accepted: 05/09/2017] [Indexed: 12/22/2022] Open
Abstract
Background Inferring history from genomic sequences is challenging and problematic because chromosomes are mosaics of thousands of small identical-by-descent (IBD) fragments, each of them having their own unique story. However, the main events in recent evolution might be deciphered from comparative analysis of numerous loci. A paradox of why humans, whose effective population size is only 10⁴, have nearly three million frequent SNPs is formulated and examined. Results We studied 5398 loci evenly covering all human autosomes. Common haplotypes built from frequent SNPs that are present in people from various populations have been examined. We demonstrated highly non-random arrangement of alleles in common haplotypes. Abundance of mutually exclusive pairs of common haplotypes that have different alleles at every polymorphic position (so-called Yin/Yang haplotypes) was found in 56% of loci. A novel widely spread category of common haplotypes named Mosaic has been described. Mosaic consists of numerous pieces of Yin/Yang haplotypes and represents an ancestral stage of one of them. Scenarios of possible appearance of large number of frequent human SNPs and their habitual arrangement in Yin/Yang common haplotypes have been evaluated with an advanced genomic simulation algorithm. Conclusions Computer modeling demonstrated that the observed arrangement of 2.9 million frequent SNPs could not originate from a sole stand-alone population. A “Great Admixture” event has been proposed that can explain peculiarities with frequent SNP distributions. This Great Admixture presumably occurred 100–300 thousand years ago between two ancestral populations that had been separated from each other about a million years ago. Our programs and algorithms can be applied to other species to perform evolutionary and comparative genomics.
Affiliation(s)
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA.,Department of Medicine, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Joseph Mainsah
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Yuriy Yatskiv
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Sharmistha Chakrabortty
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Patrick Brennan
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Basil Khuder
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | - Shuhao Qiu
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
| | | | - Alexei Fedorov
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA.,Department of Medicine, University of Toledo, Health Science Campus, Toledo, 43614, OH, USA
|
32
|
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model. Genetics 2017; 204:1353-1368. [PMID: 27927902 DOI: 10.1534/genetics.116.190173] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2016] [Accepted: 09/25/2016] [Indexed: 11/18/2022] Open
Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. 
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
|
33
|
Yang M, He Z, Shi S, Wu CI. Can genomic data alone tell us whether speciation happened with gene flow? Mol Ecol 2017; 26:2845-2849. [PMID: 28345182 DOI: 10.1111/mec.14117] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2016] [Revised: 03/08/2017] [Accepted: 03/20/2017] [Indexed: 01/02/2023]
Abstract
The allopatric model, which requires a period of geographical isolation for speciation to complete, has been the standard model in the modern era. Recently, "speciation with gene flow" has been widely discussed in relation to the model of "strict allopatry" and the level of DNA divergence across genomic regions. We wish to caution that genomic data by themselves may only permit the rejection of the simplest form of allopatry. Even a slightly more complex and realistic model that starts with subdivided populations would be impossible to reject by the genomic data alone. To resolve this central issue of speciation, other forms of observations such as the sequencing of reproductive isolation genes or the identification of geographical barrier(s) will be necessary.
Affiliation(s)
- Ming Yang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Ziwen He
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Suhua Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Chung-I Wu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China.,Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
|
34
|
Yang Z, Rannala B. Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses. Mol Ecol 2017; 26:3028-3036. [DOI: 10.1111/mec.14093] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Revised: 02/15/2017] [Accepted: 02/16/2017] [Indexed: 11/29/2022]
Affiliation(s)
- Ziheng Yang
- Department of Genetics, Evolution and Environment; University College London; Gower Street London WC1E 6BT UK
- College of Life Sciences; Beijing Normal University; Beijing 100875 China
| | - Bruce Rannala
- College of Life Sciences; Beijing Normal University; Beijing 100875 China
- Department of Evolution and Ecology; University of California at Davis; One Shields Avenue Davis CA 95616 USA
|
35
|
Choi SC. Methods for delimiting species via population genetics and phylogenetics using genotype data. Genes Genomics 2016. [DOI: 10.1007/s13258-016-0458-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
36
|
Lenz TL, Spirin V, Jordan DM, Sunyaev SR. Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol 2016; 33:2555-64. [PMID: 27436009 PMCID: PMC5026253 DOI: 10.1093/molbev/msw127] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Deleterious mutations are expected to evolve under negative selection and are usually purged from the population. However, deleterious alleles segregate in the human population and some disease-associated variants are maintained at considerable frequencies. Here, we test the hypothesis that balancing selection may counteract purifying selection in neighboring regions and thus maintain deleterious variants at higher frequency than expected from their detrimental fitness effect. We first show in realistic simulations that balancing selection reduces the density of polymorphic sites surrounding a locus under balancing selection, but at the same time markedly increases the population frequency of the remaining variants, including even substantially deleterious alleles. To test the predictions of our simulations empirically, we then use whole-exome sequencing data from 6,500 human individuals and focus on the most established example for balancing selection in the human genome, the major histocompatibility complex (MHC). Our analysis shows an elevated frequency of putatively deleterious coding variants in non-HLA (human leukocyte antigen) genes localized in the MHC region. The mean frequency of these variants declined with physical distance from the classical HLA genes, indicating dependency on genetic linkage. These results reveal an indirect cost of the genetic diversity maintained by balancing selection, which has hitherto been perceived as mostly advantageous, and have implications both for the evolution of recombination and also for the epidemiology of various MHC-associated diseases.
Affiliation(s)
- Tobias L Lenz
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School.,Evolutionary Immunogenomics, Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Victor Spirin
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School
| | - Daniel M Jordan
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School
| | - Shamil R Sunyaev
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School.,Program in Medical and Population Genetics, The Broad Institute, Cambridge, MA
|
37
|
Kuchta SR, Brown AD, Converse PE, Highton R. Multilocus Phylogeography and Species Delimitation in the Cumberland Plateau Salamander, Plethodon kentucki: Incongruence among Data Sets and Methods. PLoS One 2016; 11:e0150022. [PMID: 26974148 PMCID: PMC4790894 DOI: 10.1371/journal.pone.0150022] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2015] [Accepted: 02/08/2016] [Indexed: 11/29/2022] Open
Abstract
Species are a fundamental unit of biodiversity, yet can be challenging to delimit objectively. This is particularly true of species complexes characterized by high levels of population genetic structure, hybridization between genetic groups, isolation by distance, and limited phenotypic variation. Previous work on the Cumberland Plateau Salamander, Plethodon kentucki, suggested that it might constitute a species complex despite occupying a relatively small geographic range. To examine this hypothesis, we sampled 135 individuals from 43 populations, and used four mitochondrial loci and five nuclear loci (5693 base pairs) to quantify phylogeographic structure and probe for cryptic species diversity. Rates of evolution for each locus were inferred using the multidistribute package, and time calibrated gene trees and species trees were inferred using BEAST 2 and *BEAST 2, respectively. Because the parameter space relevant for species delimitation is large and complex, and all methods make simplifying assumptions that may lead them to fail, we conducted an array of analyses. Our assumption was that strongly supported species would be congruent across methods. Putative species were first delimited using a Bayesian implementation of the GMYC model (bGMYC), Geneland, and Brownie. We then validated these species using the genealogical sorting index and BPP. We found substantial phylogeographic diversity using mtDNA, including four divergent clades and an inferred common ancestor at 14.9 myr (95% HPD: 10.8–19.7 myr). By contrast, this diversity was not corroborated by nuclear sequence data, which exhibited low levels of variation and weak phylogeographic structure. Species trees estimated a far younger root than did the mtDNA data, closer to 1.0 myr old. Mutually exclusive putative species were identified by the different approaches. 
Possible causes of data set discordance, and the problem of species delimitation in complexes with high levels of population structure and introgressive hybridization, are discussed.
Affiliation(s)
- Shawn R. Kuchta
- Department of Biological Sciences, Ohio Center for Ecology and Evolutionary Studies, Ohio University, Athens, Ohio, United States of America
| | - Ashley D. Brown
- Department of Biological Sciences, Ohio Center for Ecology and Evolutionary Studies, Ohio University, Athens, Ohio, United States of America
| | - Paul E. Converse
- Department of Biological Sciences, Ohio Center for Ecology and Evolutionary Studies, Ohio University, Athens, Ohio, United States of America
| | - Richard Highton
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
|
38
|
Genomic variation in a widespread Neotropical bird (Xenops minutus) reveals divergence, population expansion, and gene flow. Mol Phylogenet Evol 2014; 83:305-16. [PMID: 25450096 DOI: 10.1016/j.ympev.2014.10.023] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2014] [Revised: 10/29/2014] [Accepted: 10/30/2014] [Indexed: 11/20/2022]
Abstract
The demographic and phylogeographic histories of species provide insight into the processes responsible for generating biological diversity, and genomic datasets are now permitting the estimation of species histories with unprecedented accuracy. We used a genomic single nucleotide polymorphism (SNP) dataset generated using a RAD-Seq method to investigate the historical demography and phylogeography of a widespread lowland Neotropical bird (Xenops minutus). As expected, we found that prominent landscape features that act as dispersal barriers, such as Amazonian rivers and the Andes Mountains, are associated with the deepest phylogeographic breaks, and also that isolation by distance is limited in areas between these barriers. In addition, we inferred positive population growth for most populations and detected evidence of historical gene flow between populations that are now physically isolated. Although we were able to reconstruct the history of Xenops minutus with unprecedented resolution, we had difficulty conclusively relating this history to the landscape events implicated in many Neotropical diversification hypotheses. We suggest that even if many traditional diversification hypotheses remain untestable, investigations using genomic datasets will provide greater resolution of species histories in the Neotropics and elsewhere.
|
39
|
Abstract
Recombination allows different parts of the genome to have different genealogical histories. When a species splits in two, allelic lineages sort into the two descendant species, and this lineage sorting varies along the genome. If speciation events are close in time, the lineage sorting process may be incomplete at the second speciation event and lead to gene genealogies that do not match the species phylogeny. We review different recent approaches to model lineage sorting along the genome and show how it is possible to learn about population sizes, natural selection, and recombination rates in ancestral species from application of these models to genome alignments of great ape species.
Affiliation(s)
- Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
|
40
|
The limiting distribution of the effective population size of the ancestor of humans and chimpanzees. J Theor Biol 2014; 357:55-61. [DOI: 10.1016/j.jtbi.2014.05.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2013] [Revised: 04/25/2014] [Accepted: 05/05/2014] [Indexed: 11/24/2022]
|
41
|
DeGiorgio M, Lohmueller KE, Nielsen R. A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet 2014; 10:e1004561. [PMID: 25144706 PMCID: PMC4140648 DOI: 10.1371/journal.pgen.1004561] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2013] [Accepted: 06/26/2014] [Indexed: 01/19/2023] Open
Abstract
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.
In the past, balancing selection was a topic of great theoretical interest that received much attention. However, there has been little focus toward developing methods to identify regions of the genome that are under balancing selection. In this article, we present the first set of likelihood-based methods that explicitly model the spatial distribution of polymorphism expected near a site under long-term balancing selection. Simulation results show that our methods outperform commonly-used summary statistics for identifying regions under balancing selection. Finally, we performed a scan for balancing selection in Africans and Europeans using our new methods and identified a gene called FANK1 as our top candidate outside the HLA region. We hypothesize that the maintenance of polymorphism at FANK1 is the result of segregation distortion.
Affiliation(s)
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
42
|
Yasukochi Y, Satta Y. A human-specific allelic group of the MHC DRB1 gene in primates. J Physiol Anthropol 2014; 33:14. [PMID: 24928070 PMCID: PMC4072476 DOI: 10.1186/1880-6805-33-14] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/24/2013] [Accepted: 05/27/2014] [Indexed: 11/22/2022] Open
Abstract
Background: Diversity among human leukocyte antigen (HLA) molecules has been maintained by host-pathogen coevolution over a long period of time. Reflecting this diversity, the HLA loci are the most polymorphic in the human genome. One characteristic of HLA diversity is long-term persistence of allelic lineages, which causes trans-species polymorphisms to be shared among closely related species. Modern humans have disseminated across the world after their exodus from Africa, while chimpanzees have remained in Africa since the speciation event between humans and chimpanzees. It is thought that modern humans have recently acquired resistance to novel pathogens outside Africa. In the present study, we investigated HLA alleles that could contribute to this local adaptation in humans and also studied the contribution of natural selection to human evolution by using molecular data.
Results: Phylogenetic analysis of HLA-DRB1 genes identified two major groups, HLA Groups A and B. Group A formed a monophyletic clade distinct from DRB1 alleles in other Catarrhini, suggesting that Group A is a human-specific allelic group. Our estimates of divergence time suggested that seven HLA-DRB1 Group A allelic lineages in humans have been maintained since before the speciation event between humans and chimpanzees, while chimpanzees possess only one DRB1 allelic lineage (Patr-DRB1*03), which is a sister group to Group A. Experimental data showed that some Group A alleles bound to peptides derived from human-specific pathogens. Of the Group A alleles, three exist at high frequencies in several local populations outside Africa.
Conclusions: HLA Group A alleles are likely to have been retained in human lineages for a long period of time and have not expanded since the divergence of humans and chimpanzees. On the other hand, most orthologs of HLA Group A alleles may have been lost in the chimpanzee due to differences in selective pressures. The presence of alleles with high frequency outside of Africa suggests these HLA molecules result from the local adaptations of humans. Our study helps elucidate the mechanism by which the human adaptive immune system has coevolved with pathogens over a long period of time.
Affiliation(s)
- Yoshiki Yasukochi
- Molecular and Genetic Epidemiology, Faculty of Medicine, University of Tsukuba, 305-8575 Tsukuba, Ibaraki, Japan.
|
43
|
Elhassan N, Gebremeskel EI, Elnour MA, Isabirye D, Okello J, Hussien A, Kwiatksowski D, Hirbo J, Tishkoff S, Ibrahim ME. The episode of genetic drift defining the migration of humans out of Africa is derived from a large east African population size. PLoS One 2014; 9:e97674. [PMID: 24845801 PMCID: PMC4028218 DOI: 10.1371/journal.pone.0097674] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/24/2013] [Accepted: 04/23/2014] [Indexed: 01/01/2023] Open
Abstract
Human genetic variation, particularly in Africa, is still poorly understood, despite a consensus that the African effective population size is large compared with that of populations from other continents. Based on sequencing of the mitochondrial Cytochrome C Oxidase subunit II (MT-CO2) and genome-wide microsatellite data, we observe evidence suggesting that the effective size (Ne) of humans is larger than current estimates, with a focus of increased genetic diversity in east Africa and an east African population size at least 2-6 fold larger than that of other populations. Both phylogenetic and network analyses indicate that east Africans possess more ancestral lineages than various continental populations, placing them at the root of the human evolutionary tree. Our results also affirm east Africa as the likely spot from which migration towards Asia took place. The study reflects the spectacular level of sequence variation within east Africans in comparison to the global sample, and appeals for further studies that may contribute towards filling the existing gaps in the database. The implications of these data for current genomic research, and the need for defined studies of human genetic variation that include more African populations, particularly east Africans, are paramount.
Affiliation(s)
- Nuha Elhassan
- Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
- Eyoab Iyasu Gebremeskel
- Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
- Department of Biology, Eritrea Institute of Technology, Mai-Nefhi, Eritrea
- Mohamed Ali Elnour
- Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
- Dan Isabirye
- Department of Biochemistry, Makerere University, Kampala, Uganda
- John Okello
- Department of Biochemistry, Makerere University, Kampala, Uganda
- Ayman Hussien
- Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
- Dominic Kwiatksowski
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Jibril Hirbo
- Department of Genetics and Biology, School of Medicine and School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Sara Tishkoff
- Department of Genetics and Biology, School of Medicine and School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Muntaser E. Ibrahim
- Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
|
44
|
Amei A, Smith BT. Robust estimates of divergence times and selection with a Poisson random field model: a case study of comparative phylogeographic data. Genetics 2014; 196:225-33. [PMID: 24142896 PMCID: PMC3872187 DOI: 10.1534/genetics.113.157776] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/20/2013] [Accepted: 10/11/2013] [Indexed: 11/18/2022] Open
Abstract
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection and can be used to estimate population genetic parameters. Although the original PRF model has been extended to more general biological settings to make statistical inference about selection and divergence among model organisms, it has not been incorporated into phylogeographic studies that focus on estimating population genetic parameters for nonmodel organisms. Here, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times and larger effective population sizes than those that inhabit drier habitats, and divergence times estimated from the PRF model were similar to estimates from a coalescent species-tree approach. Selection coefficients were higher in sister pairs that inhabited drier habitats than in those in humid habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies.
Affiliation(s)
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Brian Tilston Smith
- Museum of Natural Science, Louisiana State University, Baton Rouge, Louisiana 70803
|
45
|
|
46
|
Schrago CG. The effective population sizes of the anthropoid ancestors of the human-chimpanzee lineage provide insights on the historical biogeography of the great apes. Mol Biol Evol 2013; 31:37-47. [PMID: 24124206 DOI: 10.1093/molbev/mst191] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022] Open
Abstract
The recent development of methods that apply coalescent theory to phylogenetic problems has enabled the study of the population-level phenomena that drove the diversification of anthropoid primates. Effective population size, Ne, is one of the main parameters that constitute the theoretical underpinning of these new analytical approaches. For this reason, the ancestral Ne of selected primate lineages has been thoroughly investigated. However, for some of these lineages, the estimates of ancestral Ne reported in several studies present significant variation. This is the case for the common ancestor of humans and chimpanzees. Moreover, several ancestral anthropoid lineages have been ignored in the studies conducted so far. Because Ne is fundamental to understanding historic species demography, it is a crucial component of a complete description of the historical scenario of primate evolution. It also provides information that is helpful for differentiating between competing biogeographical hypotheses. In this study, the effective population sizes of the anthropoid ancestors of the human-chimp lineage are inferred using data sets of coding and noncoding sequences. A general pattern of a serial decline of population sizes is found between the ancestral lineage of Anthropoidea and that of Homo and Pan. When the theoretical distribution of gene trees was derived from the parametric estimates obtained, it closely corresponded to the empirical frequency of inferred gene trees along the genome. The most abrupt decrease of Ne was found between the ancestors of all great apes and those of the African great apes alone. This suggests the occurrence of a genetic bottleneck during the evolution of Homininae, which corroborates the origin of African apes from a Eurasian ancestor.
Affiliation(s)
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
|
47
|
Loire E, Higuet D, Netter P, Achaz G. Evolution of coding microsatellites in primate genomes. Genome Biol Evol 2013; 5:283-95. [PMID: 23315383 PMCID: PMC3590770 DOI: 10.1093/gbe/evt003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023] Open
Abstract
Microsatellites (SSRs) are highly susceptible to expansions and contractions. When located in a coding sequence, the insertion or the deletion of a single unit for a mono-, di-, tetra-, or penta(nucleotide)-SSR creates a frameshift. As a consequence, one would expect to find only very few of these SSRs in coding sequences because of their strong deleterious potential. Unexpectedly, genomes contain many coding SSRs of all types. Here, we report on a study of their evolution in a phylogenetic context using the genomes of four primates: human, chimpanzee, orangutan, and macaque. In a set of 5,015 orthologous genes unambiguously aligned among the four species, we show that, except for tri- and hexa-SSRs, for which insertions and deletions are frequently observed, SSRs in coding regions evolve mainly by substitutions. We show that the rate of substitution in all types of coding SSRs is typically two times higher than in the rest of coding sequences. Additionally, we observe that although numerous coding SSRs are created and lost by substitutions in the lineages, their numbers remain constant. This last observation suggests that the coding SSRs have reached equilibrium. We hypothesize that this equilibrium involves a combination of mutation, drift, and selection. We thus estimated the fitness cost of mono-SSRs and show that it increases with the number of units. We finally show that the cost of coding mono-SSRs greatly varies from function to function, suggesting that the strength of the selection that acts against them can be correlated to gene functions.
Affiliation(s)
- Etienne Loire
- UMR 7138, Systématique, Adaptation, Evolution (UPMC, CNRS, MNHN, IRD), Paris, France
|
48
|
Zou XH, Yang Z, Doyle JJ, Ge S. Multilocus estimation of divergence times and ancestral effective population sizes of Oryza species and implications for the rapid diversification of the genus. The New Phytologist 2013; 198:1155-1164. [PMID: 23574344 DOI: 10.1111/nph.12230] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 01/02/2013] [Accepted: 02/08/2013] [Indexed: 06/02/2023]
Abstract
- Despite substantial investigations into Oryza phylogeny and evolution, reliable estimates of the divergence times and ancestral effective population sizes of major lineages in Oryza are challenging.
- We sampled sequences of 106 single-copy nuclear genes from all six diploid genomes of Oryza to investigate the divergence times through extensive relaxed molecular clock analyses and estimated the ancestral effective population sizes using maximum likelihood and Bayesian methods.
- We estimated that Oryza originated in the middle Miocene (c. 13-15 million years ago; Ma) and obtained an explicit time frame for two rapid diversifications in this genus. The first diversification involving the extant F-/G-genomes and possibly the extinct H-/J-/K-genomes occurred in the middle Miocene immediately after (within < 1 Myr) the origin of Oryza. The second giving rise to the A-/B-/C-genomes happened c. 5-6 Ma. We found that ancestral effective population sizes were much larger than those of extant species in Oryza.
- We suggest that the climate fluctuations during the period from the middle Miocene to Pliocene may have contributed to the two rapid diversifications of Oryza species. Such information helps better understand the evolutionary history of Oryza and provides further insights into the pattern and mechanism of diversification in plants in general.
Affiliation(s)
- Xin-Hui Zou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Ziheng Yang
- Center for Computational and Evolutionary Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Jeff J Doyle
- Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY, 14853, USA
- Song Ge
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
|
49
|
Pease JB, Hahn MW. More accurate phylogenies inferred from low-recombination regions in the presence of incomplete lineage sorting. Evolution 2013; 67:2376-84. [PMID: 23888858 DOI: 10.1111/evo.12118] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 12/05/2012] [Accepted: 03/20/2013] [Indexed: 12/17/2022]
Abstract
When speciation events occur in rapid succession, incomplete lineage sorting (ILS) can cause disagreement among individual gene trees. The probability that ILS affects a given locus is directly related to its effective population size (Ne), which in turn is proportional to the recombination rate if there is strong selection across the genome. Based on these expectations, we hypothesized that low-recombination regions of the genome, as well as sex chromosomes and nonrecombining chromosomes, should exhibit lower levels of ILS. We tested this hypothesis in phylogenomic datasets from primates, the Drosophila melanogaster clade, and the Drosophila simulans clade. In all three cases, regions of the genome with low or no recombination showed significantly stronger support for the putative species tree, although results from the X chromosome differed among clades. Our results suggest that recurrent selection is acting in these low-recombination regions, such that current levels of diversity also reflect past decreases in the effective population size at these same loci. The results also demonstrate how considering the genomic context of a gene tree can assist in more accurate determination of the true species phylogeny, especially in cases where a whole-genome phylogeny appears to be an unresolvable polytomy.
Affiliation(s)
- James B Pease
- Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.
|
50
|
Efficient computation in the IM model. J Math Biol 2013; 68:1423-51. [DOI: 10.1007/s00285-013-0671-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/27/2012] [Revised: 03/01/2013] [Indexed: 10/27/2022]
|