1
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024] Open
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
2
Eldon B, Stephan W. Sweepstakes reproduction facilitates rapid adaptation in highly fecund populations. Mol Ecol 2024; 33:e16903. [PMID: 36896794 DOI: 10.1111/mec.16903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 03/11/2023]
Abstract
Adaptation enables natural populations to survive in a changing environment. Understanding the mechanics of adaptation is therefore crucial for learning about the evolution and ecology of natural populations. We focus on the impact of random sweepstakes on selection in highly fecund haploid and diploid populations partitioned into two genetic types, with one type conferring selective advantage. For the diploid populations, we incorporate various dominance mechanisms. We assume that the populations may experience recurrent bottlenecks. In random sweepstakes, the distribution of individual recruitment success is highly skewed, resulting in a huge variance in the number of offspring contributed by the individuals present in any given generation. Using computer simulations, we investigate the joint effects of random sweepstakes, recurrent bottlenecks and dominance mechanisms on selection. In our framework, bottlenecks allow random sweepstakes to have an effect on the time to fixation, and in diploid populations, the effect of random sweepstakes depends on the dominance mechanism. We describe selective sweepstakes that are approximated by recurrent sweeps of strongly beneficial allelic types arising by mutation. We demonstrate that both types of sweepstakes reproduction may facilitate rapid adaptation (as defined based on the average time to fixation of a type conferring selective advantage conditioned on fixation of the type). However, whether random sweepstakes cause rapid adaptation depends also on their interactions with bottlenecks and dominance mechanisms. Finally, we review a case study in which a model of recurrent sweeps is shown to essentially explain population genomic data from Atlantic cod.
Affiliation(s)
- Bjarki Eldon
  - Institute of Evolution and Biodiversity Science, Natural History Museum Berlin, Berlin, Germany
3
Weber MD, Richards TM, Sutton TT, Carter JE, Eytan RI. Deep-pelagic fishes: Demographic instability in a stable environment. Ecol Evol 2024; 14:e11267. [PMID: 38638366 PMCID: PMC11024635 DOI: 10.1002/ece3.11267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/10/2024] [Accepted: 04/01/2024] [Indexed: 04/20/2024] Open
Abstract
Demographic histories are frequently a product of the environment, as populations expand or contract in response to major environmental changes, often driven by changes in climate. Meso- and bathy-pelagic fishes inhabit some of the most temporally and spatially stable habitats on the planet. The stability of the deep-pelagic could make deep-pelagic fishes resistant to the demographic instability commonly reported in fish species inhabiting other marine habitats; however, the demographic histories of deep-pelagic fishes are unknown. We reconstructed the historical demography of 11 species of deep-pelagic fishes using mitochondrial and nuclear DNA sequence data. We uncovered widespread evidence of population expansions in our study species, a counterintuitive result based on the nature of deep-pelagic ecosystems. Frequency-based methods detected potential demographic changes in nine species of fishes, while extended Bayesian skyline plots identified population expansions in four species. These results suggest that despite the relatively stable nature of the deep-pelagic environment, the fishes that reside here have likely been impacted by past changes in climate. Further investigation is necessary to better understand how deep-pelagic fishes, by far Earth's most abundant vertebrates, will respond to future climatic changes.
Affiliation(s)
- Max D. Weber
  - Texas A&M University at Galveston, Galveston, Texas, USA
- Ron I. Eytan
  - Texas A&M University at Galveston, Galveston, Texas, USA
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
4
Miró Pina V, Joly É, Siri-Jégousse A. Estimating the Lambda measure in multiple-merger coalescents. Theor Popul Biol 2023; 154:94-101. [PMID: 37742787 DOI: 10.1016/j.tpb.2023.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/13/2023] [Accepted: 09/15/2023] [Indexed: 09/26/2023]
Abstract
Multiple-merger coalescents, also known as Λ-coalescents, have been used to describe the genealogy of populations that have a skewed offspring distribution or that undergo strong selection. Inferring the characteristic measure Λ, which describes the rates of the multiple-merger events, is key to understand these processes. So far, most inference methods only work for some particular families of Λ-coalescents that are described by only one parameter, but not for more general models. This article is devoted to the construction of a non-parametric estimator of the density of Λ that is based on the observation at a single time of the so-called Site Frequency Spectrum (SFS), which describes the allelic frequencies in a present population sample. First, we produce estimates of the multiple-merger rates by solving a linear system, whose coefficients are obtained by appropriately subsampling the SFS. Then, we use a technique that aggregates the information extracted from the previous step through a kernel type of re-construction to give a non-parametric estimation of the measure Λ. We give a consistency result of this estimator under mild conditions on the behavior of Λ around 0. We also show some numerical examples of how our method performs.
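As a rough illustration of the summary statistic this abstract starts from, the sketch below computes an unfolded SFS from a small 0/1 haplotype matrix. The function name `site_frequency_spectrum` and the toy data are hypothetical and are not taken from the paper. The paper's subsampling, linear-system, and kernel steps are not reproduced here.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS of a sample of n sequences.

    `haplotypes` is a list of n equal-length rows of 0/1 values, where 1
    marks the derived allele. Entry i-1 of the result is the number of
    sites at which exactly i of the n sequences carry the derived allele,
    for i = 1, ..., n-1.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    num_sites = len(haplotypes[0])
    if any(len(row) != num_sites for row in haplotypes):
        raise ValueError("all sequences must have the same length")
    # Count derived alleles per site, then tally sites by that count.
    derived_counts = Counter(
        sum(row[site] for row in haplotypes) for site in range(num_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Hypothetical toy sample: 4 sequences, 5 sites (illustration only).
toy_sample = [
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
toy_sfs = site_frequency_spectrum(toy_sample)
```

Sites at which the derived allele is absent or fixed in the sample fall outside the range 1 to n-1 and are therefore not counted.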
Affiliation(s)
- Verónica Miró Pina
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico
- Émilien Joly
  - Centro de Investigación en Matemáticas, AC (CIMAT), Guanajuato, Mexico
- Arno Siri-Jégousse
  - Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico
5
Gerard D. Bayesian tests for random mating in polyploids. Mol Ecol Resour 2023; 23:1812-1822. [PMID: 37578636 DOI: 10.1111/1755-0998.13856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 07/24/2023] [Accepted: 08/03/2023] [Indexed: 08/15/2023]
Abstract
Hardy-Weinberg proportions (HWP) are often explored to evaluate the assumption of random mating. However, in autopolyploids, organisms with more than two sets of homologous chromosomes, HWP and random mating are different hypotheses that require different statistical testing approaches. Currently, the only available methods to test for random mating in autopolyploids (i) heavily rely on asymptotic approximations and (ii) assume genotypes are known, ignoring genotype uncertainty. Furthermore, these approaches are all frequentist, and so do not carry the benefits of Bayesian analysis, including ease of interpretability, incorporation of prior information, and consistency under the null. Here, we present Bayesian approaches to test for random mating, bringing the benefits of Bayesian analysis to this problem. Our Bayesian methods also (i) do not rely on asymptotic approximations, being appropriate for small sample sizes, and (ii) optionally account for genotype uncertainty via genotype likelihoods. We validate our methods in simulations and demonstrate on two real datasets how testing for random mating is more useful for detecting genotyping errors than testing for HWP (in a natural population) and testing for Mendelian segregation (in an experimental S1 population). Our methods are implemented in Version 2.0.2 of the hwep R package on the Comprehensive R Archive Network https://cran.r-project.org/package=hwep.
Affiliation(s)
- David Gerard
  - Department of Mathematics and Statistics, American University, Washington DC, USA
6
Miller L, Pitters HH. Large-scale behaviour and hydrodynamic limit of beta coalescents. Ann Appl Probab 2023. [DOI: 10.1214/22-aap1782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Affiliation(s)
- Luke Miller
  - Department of Statistics, University of Oxford
7
Árnason E, Koskela J, Halldórsdóttir K, Eldon B. Sweepstakes reproductive success via pervasive and recurrent selective sweeps. eLife 2023; 12:80781. [PMID: 36806325 PMCID: PMC9940914 DOI: 10.7554/elife.80781] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/03/2022] [Accepted: 12/28/2022] [Indexed: 02/22/2023] Open
Abstract
Highly fecund natural populations characterized by high early mortality abound, yet our knowledge about their recruitment dynamics is somewhat rudimentary. This knowledge gap has implications for our understanding of genetic variation, population connectivity, local adaptation, and the resilience of highly fecund populations. The concept of sweepstakes reproductive success, which posits a considerable variance and skew in individual reproductive output, is key to understanding the distribution of individual reproductive success. However, it still needs to be determined whether highly fecund organisms reproduce through sweepstakes and, if they do, the relative roles of neutral and selective sweepstakes. Here, we use coalescent-based statistical analysis of population genomic data to show that selective sweepstakes likely explain recruitment dynamics in the highly fecund Atlantic cod. We show that the Kingman coalescent (modelling no sweepstakes) and the Xi-Beta coalescent (modelling random sweepstakes), including complex demography and background selection, do not provide an adequate fit for the data. The Durrett-Schweinsberg coalescent, in which selective sweepstakes result from recurrent and pervasive selective sweeps of new mutations, offers greater explanatory power. Our results show that models of sweepstakes reproduction and multiple-merger coalescents are relevant and necessary for understanding genetic diversity in highly fecund natural populations. These findings have fundamental implications for understanding the recruitment variation of fish stocks and general evolutionary genomics of high-fecundity organisms.
Affiliation(s)
- Einar Árnason
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland
  - Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, United States
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Katrín Halldórsdóttir
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany
8
Jain K, Kaushik S. Joint effect of changing selection and demography on the site frequency spectrum. Theor Popul Biol 2022; 146:46-60. [PMID: 35809866 DOI: 10.1016/j.tpb.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/14/2022] [Accepted: 07/03/2022] [Indexed: 10/17/2022]
Abstract
The site frequency spectrum (SFS) is an important statistic that summarizes the molecular variation in a population, and is used to estimate population-genetic parameters and detect natural selection. Here, we study the SFS in a randomly mating, diploid population in which both the population size and selection coefficient vary periodically with time using a diffusion theory approach, and derive simple analytical expressions for the time-averaged SFS in slowly and rapidly changing environments. We show that for strong selection and in slowly changing environments where the population experiences both positive and negative cycles of the selection coefficient, the time-averaged SFS differs significantly from the equilibrium SFS in a constant environment. The deviation is found to depend on the time spent by the population in the deleterious part of the selection cycle and the phase difference between the selection coefficient and population size, and can be captured by an effective population size.
Affiliation(s)
- Kavita Jain
  - Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Sachin Kaushik
  - Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
9
Abstract
We discuss the genetic, demographic, and selective forces that are likely to be at play in restricting observed levels of DNA sequence variation in natural populations to a much smaller range of values than would be expected from the distribution of census population sizes alone (Lewontin's Paradox). While several processes that have previously been strongly emphasized must be involved, including the effects of direct selection and genetic hitchhiking, it seems unlikely that they are sufficient to explain this observation without contributions from other factors. We highlight a potentially important role for the less-appreciated contribution of population size change; specifically, the likelihood that many species and populations may be quite far from reaching the relatively high equilibrium diversity values that would be expected given their current census sizes.
Affiliation(s)
- Brian Charlesworth
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
10
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschumar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2022; 220:iyab229. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022] Open
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Affiliation(s)
- Franz Baumdicker
  - Cluster of Excellence “Controlling Microbes to Fight Infections”, Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
- Gertjan Bisschop
  - Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Daniel Goldstein
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Graham Gower
  - Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin–Madison, Madison, WI 53706, USA
- Georgia Tsambos
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
- Sha Zhu
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin 10115, Germany
- Jared G Galloway
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Ariella L Gladstein
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7264, USA
  - Embark Veterinary, Inc., Boston, MA 02111, USA
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
- Bing Guo
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Ben Jeffery
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Warren W Kretzschumar
  - Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
- Konrad Lohse
  - Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Dominic Nelson
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Nathaniel S Pope
  - Department of Entomology, Pennsylvania State University, State College, PA 16802, USA
- Consuelo D Quinto-Cortés
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Murillo F Rodrigues
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Thibaut Sellinger
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
- Kevin Thornton
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Anthony W Wohns
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Simon Gravel
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Andrew D Kern
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Peter L Ralph
  - Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
  - Department of Mathematics, University of Oregon, Eugene, OR 97403-5289, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
11
Disanto F, Wiehe T. Measuring the external branches of a Kingman tree: A discrete approach. Theor Popul Biol 2020; 134:92-105. [PMID: 32485202 DOI: 10.1016/j.tpb.2020.05.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/16/2019] [Revised: 04/02/2020] [Accepted: 05/21/2020] [Indexed: 10/24/2022]
Abstract
The Kingman coalescent process is a classical model of gene genealogies in population genetics. It generates Yule-distributed, binary ranked tree topologies - also called histories - with a finite number of n leaves, together with n-1 exponentially distributed time lengths: one for each layer of the history. Using a discrete approach, we study the lengths of the external branches of Yule distributed histories, where the length of an external branch is defined as the rank of its parent node. We study the multiplicity of external branches of given length in a random history of n leaves. A correspondence between the external branches of the ordered histories of size n and the non-peak entries of the permutations of size n-1 provides easy access to the length distributions of the first and second longest external branches in a random Yule history and coalescent tree of size n. The length of the longest external branch is also studied in dependence of root balance of a random tree. As a practical application, we compare the observed and expected number of mutations on the longest external branches in samples from natural populations.
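The "n-1 exponentially distributed time lengths" mentioned in this abstract can be sketched as follows, using the standard Kingman rates: while k lineages remain, the next merger occurs at rate k(k-1)/2 in coalescent time units. The function names are hypothetical and the code is not taken from the paper. It covers only the layer durations, not the ranked tree topology or the external branch lengths studied there.

```python
import random


def kingman_layer_times(n, rng=random):
    """Draw the n-1 layer durations of a Kingman coalescent with n leaves.

    While k lineages remain, the waiting time to the next pairwise merger
    is exponential with rate k*(k-1)/2 (coalescent time units). Durations
    are returned in the order k = n, n-1, ..., 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tree_height(n):
    """Expected time to the most recent common ancestor.

    This is the sum over k = 2..n of the mean layer duration 2/(k*(k-1)),
    which telescopes to 2*(1 - 1/n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


# Example draw with a seeded generator so that runs are repeatable.
example_times = kingman_layer_times(10, random.Random(1))
```

Passing a seeded `random.Random` instance keeps a draw repeatable. The default uses the module-level generator.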
Affiliation(s)
- Thomas Wiehe
  - Institut für Genetik, Universität zu Köln, Germany
12
Morales-Arce AY, Harris RB, Stone AC, Jensen JD. Evaluating the contributions of purifying selection and progeny-skew in dictating within-host Mycobacterium tuberculosis evolution. Evolution 2020; 74:992-1001. [PMID: 32233086 DOI: 10.1111/evo.13954] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/03/2019] [Accepted: 03/08/2020] [Indexed: 12/28/2022]
Abstract
The within-host evolutionary dynamics of tuberculosis (TB) remain unclear, and underlying biological characteristics render standard population genetic approaches based upon the Wright-Fisher model largely inappropriate. In addition, the compact genome combined with an absence of recombination is expected to result in strong purifying selection effects. Thus, it is imperative to establish a biologically relevant evolutionary framework incorporating these factors in order to enable an accurate study of this important human pathogen. Further, such a model is critical for inferring fundamental evolutionary parameters related to patient treatment, including mutation rates and the severity of infection bottlenecks. We here implement such a model and infer the underlying evolutionary parameters governing within-patient evolutionary dynamics. Results demonstrate that the progeny skew associated with the clonal nature of TB severely reduces genetic diversity and that the neglect of this parameter in previous studies has led to significant mis-inference of mutation rates. As such, our results suggest an underlying de novo mutation rate that is considerably faster than previously inferred, and a progeny distribution differing significantly from Wright-Fisher assumptions. This inference represents a more appropriate evolutionary null model, against which the periodic effects of positive selection, associated with drug-resistance for example, may be better assessed.
Affiliation(s)
- Ana Y Morales-Arce
  - Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
- Rebecca B Harris
  - Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
- Anne C Stone
  - Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
  - School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
- Jeffrey D Jensen
  - Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
  - School of Life Sciences, Arizona State University, Tempe, Arizona, USA
13
Diehl CS, Kersting G. Tree lengths for general $\Lambda$-coalescents and the asymptotic site frequency spectrum around the Bolthausen–Sznitman coalescent. Ann Appl Probab 2019. [DOI: 10.1214/19-aap1462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
14
Abstract
A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The 'skyline plot' and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalized skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Λ-coalescents (specifically, Beta(2 - α, α)-coalescents) to develop the 'multifurcating skyline plot', which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2 - α, α)-coalescent process can be estimated accurately. Furthermore, we apply the multifurcating skyline plot to simulated trees generated by tracking transmissions in an individual-based model of epidemic superspreading. We find that high levels of superspreading are consistent with the high-variance assumptions underlying Λ-coalescents and that the estimated parameters of the Λ-coalescent model contain information about the degree of superspreading.
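To make the Beta(2 - α, α)-coalescent named in this abstract concrete, the sketch below evaluates its merger rates. It assumes the standard Λ-coalescent rate formula λ(b, k) = ∫ x^(k-2) (1-x)^(b-k) Λ(dx), which for this Beta measure reduces to B(k - α, b - k + α) / B(2 - α, α). The formula is stated here as background rather than quoted from the paper, and the function names are hypothetical. The skyline estimator itself is not reproduced.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), for a > 0 and b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one particular set of k out of b lineages merges.

    Assumes a Beta(2 - alpha, alpha)-coalescent with 0 < alpha < 2 and
    2 <= k <= b. The computation is done on the log scale so that it
    stays stable when b is large.
    """
    if not 0 < alpha < 2:
        raise ValueError("alpha must lie strictly between 0 and 2")
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))
```

With α = 1 this is the Bolthausen-Sznitman coalescent. For any admissible α, a pair of lineages in a sample of two merges at rate 1, which fixes the time scale.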
Affiliation(s)
- Patrick Hoscheit
  - MaIAGE, INRA, Université Paris-Saclay, Domaine de Vilvert, Jouy-en-Josas 78350, France
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
15
Hobolth A, Siri-Jégousse A, Bladt M. Phase-type distributions in population genetics. Theor Popul Biol 2019; 127:16-32. [PMID: 30822431 DOI: 10.1016/j.tpb.2019.02.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Accepted: 02/01/2019] [Indexed: 11/19/2022]
Abstract
Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of the so-called phase-type theory, where complicated and tedious calculations are circumvented by the use of matrix manipulations. The application of phase-type theory in population genetics consists of describing the biological system as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these aforementioned matrices. We illustrate this procedure by a number of examples: (a) Calculating the mean, (co)variance and even higher order moments of the site frequency spectrum in multiple merger coalescent models, (b) Analysing a sample of DNA sequences from the Atlantic Cod using the Beta-coalescent, and (c) Determining the correlation of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarification of current models, and in particular their formal manipulation and calculations, but also for further development or extensions.
Affiliation(s)
- Asger Hobolth
  - Aarhus University, Bioinformatics Research Center, Denmark
- Mogens Bladt
  - University of Copenhagen, Department of Mathematical Sciences, Denmark
16
|
Inferring Demography and Selection in Organisms Characterized by Skewed Offspring Distributions. Genetics 2019; 211:1019-1028. [PMID: 30651284 DOI: 10.1534/genetics.118.301684] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/11/2018] [Accepted: 01/15/2019] [Indexed: 01/01/2023] Open
Abstract
The recent increase in time-series population genomic data from experimental, natural, and ancient populations has been accompanied by a promising growth in methodologies for inferring demographic and selective parameters from such data. However, these methods have largely presumed that the populations of interest are well-described by the Kingman coalescent. In reality, many groups of organisms, including viruses, marine organisms, and some plants, protists, and fungi, typified by high variance in progeny number, may be best characterized by multiple-merger coalescent models. Estimation of population genetic parameters under Wright-Fisher assumptions for these organisms may thus be prone to serious mis-inference. We propose a novel method for the joint inference of demography and selection under the Ψ-coalescent model, termed Multiple-Merger Coalescent Approximate Bayesian Computation, or MMC-ABC. We first demonstrate mis-inference under the Kingman, and then exhibit the superior performance of MMC-ABC under conditions of skewed offspring distributions. In order to highlight the utility of this approach, we reanalyzed previously published drug-selection lines of influenza A virus. We jointly inferred the extent of progeny-skew inherent to viral replication and identified putative drug-resistance mutations.
|
17
|
Gnedin A, Iksanov A, Marynych A, Möhle M. The collision spectrum of Λ-coalescents. Ann Appl Probab 2018. [DOI: 10.1214/18-aap1409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
18
|
Statistical test for detecting overdispersion in offspring number based on kinship information. Popul Ecol 2018. [DOI: 10.1007/s10144-018-0629-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
19
|
Ferretti L, Klassmann A, Raineri E, Ramos-Onsins SE, Wiehe T, Achaz G. The neutral frequency spectrum of linked sites. Theor Popul Biol 2018; 123:70-79. [PMID: 29964061 DOI: 10.1016/j.tpb.2018.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/29/2017] [Revised: 06/01/2018] [Accepted: 06/11/2018] [Indexed: 11/28/2022]
Abstract
We introduce the conditional Site Frequency Spectrum (SFS) for a genomic region linked to a focal mutation of known frequency. An exact expression for its expected value is provided for the neutral model without recombination. Its relation with the expected SFS for two sites, 2-SFS, is discussed. These spectra derive from the coalescent approach of Fu (1995) for finite samples, which is reviewed. Remarkably simple expressions are obtained for the linked SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. These formulae are the immediate extensions of the well known single site θ∕f neutral SFS. Besides the general interest in these spectra, they relate to relevant biological cases, such as structural variants and introgressions. As an application, a recipe to adapt Tajima's D and other SFS-based neutrality tests to a non-recombining region containing a neutral marker is presented.
Affiliation(s)
- Luca Ferretti
- The Pirbright Institute, Woking, United Kingdom; Institut de Systématique, Evolution, Biodiversité, UMR 7205, MNHN and Centre Interdisciplinaire de Recherche en Biologie, UMR 7241, Collège de France, Paris, France.
- Emanuele Raineri
- CNAG-CRG, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
- Thomas Wiehe
- Institut für Genetik, Universität zu Köln, Köln, Germany
- Guillaume Achaz
- Institut de Systématique, Evolution, Biodiversité, UMR 7205, MNHN and Centre Interdisciplinaire de Recherche en Biologie, UMR 7241, Collège de France, Paris, France
|
20
|
Koskela J. Multi-locus data distinguishes between population growth and multiple merger coalescents. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0011. [DOI: 10.1515/sagmb-2017-0011] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Abstract
We introduce a low dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these model classes. The null and alternative sampling distributions of the statistic are intractable, but its low dimensionality renders them amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method. A key reason for this improvement is the use of multi-locus data, in particular averaging observed site frequency spectra across unlinked loci to reduce sampling variance. We also demonstrate the robustness of our method to nuisance and tuning parameters. Finally we show that the same kernel density estimates can be used to conduct parameter estimation, and argue that our method is readily generalisable for applications in model selection, parameter inference and experimental design.
Affiliation(s)
- Jere Koskela
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
|
21
|
Coalescent Processes with Skewed Offspring Distributions and Nonequilibrium Demography. Genetics 2017; 208:323-338. [PMID: 29127263 DOI: 10.1534/genetics.117.300499] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/12/2017] [Accepted: 10/30/2017] [Indexed: 11/18/2022] Open
Abstract
Nonequilibrium demography impacts coalescent genealogies leaving detectable, well-studied signatures of variation. However, similar genomic footprints are also expected under models of large reproductive skew, posing a serious problem when trying to make inference. Furthermore, current approaches consider only one of the two processes at a time, neglecting any genomic signal that could arise from their simultaneous effects, preventing the possibility of jointly inferring parameters relating to both offspring distribution and population history. Here, we develop an extended Moran model with exponential population growth, and demonstrate that the underlying ancestral process converges to a time-inhomogeneous psi-coalescent. However, by applying a nonlinear change of time scale, analogous to the Kingman coalescent, we find that the ancestral process can be rescaled to its time-homogeneous analog, allowing the process to be simulated quickly and efficiently. Furthermore, we derive analytical expressions for the expected site-frequency spectrum under the time-inhomogeneous psi-coalescent, and develop an approximate-likelihood framework for the joint estimation of the coalescent and growth parameters. By means of extensive simulation, we demonstrate that both can be estimated accurately from whole-genome data. In addition, not accounting for demography can lead to serious biases in the inferred coalescent model, with broad implications for genomic studies ranging from ecology to conservation biology. Finally, we use our method to analyze sequence data from Japanese sardine populations, and find evidence of high variation in individual reproductive success, but few signs of a recent demographic expansion.
|
22
|
Evolution of highly fecund haploid populations. Theor Popul Biol 2017; 119:48-56. [PMID: 29111301 DOI: 10.1016/j.tpb.2017.10.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/17/2017] [Revised: 10/14/2017] [Accepted: 10/17/2017] [Indexed: 11/24/2022]
Abstract
We consider a model of viability selection in a highly fecund haploid population with sweepstakes reproduction. We use simulations to estimate the time until the allelic type with highest fitness has reached high frequency in a finite population. We compare the time between two reproduction modes of high and low fecundity. We also consider the probability that the allelic type with highest fitness is lost from the population before reaching high frequency. Our simulation results indicate that highly fecund populations can evolve faster (in some cases much faster) than populations of low fecundity. However, high fecundity and sweepstakes reproduction also confer much higher risk of losing the allelic type with highest fitness from the population by chance. The impact of selection on driving alleles to high frequency varies depending on the trait value conferring highest fitness; in some cases the effect of selection can hardly be detected.
|
23
|
The site-frequency spectrum associated with Ξ-coalescents. Theor Popul Biol 2016; 110:36-50. [DOI: 10.1016/j.tpb.2016.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/19/2015] [Revised: 04/12/2016] [Accepted: 04/13/2016] [Indexed: 11/24/2022]
|
24
|
Spence JP, Kamm JA, Song YS. The Site Frequency Spectrum for General Coalescents. Genetics 2016; 202:1549-61. [PMID: 26883445 PMCID: PMC4827730 DOI: 10.1534/genetics.115.184101] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 10/26/2015] [Accepted: 02/10/2016] [Indexed: 01/25/2023] Open
Abstract
General genealogical processes such as Λ- and Ξ-coalescents, which respectively model multiple and simultaneous mergers, have important applications in studying marine species, strong positive selection, recurrent selective sweeps, strong bottlenecks, large sample sizes, and so on. Recently, there has been significant progress in developing useful inference tools for such general models. In particular, inference methods based on the site frequency spectrum (SFS) have received noticeable attention. Here, we derive a new formula for the expected SFS for general Λ- and Ξ-coalescents, which leads to an efficient algorithm. For time-homogeneous coalescents, the runtime of our algorithm for computing the expected SFS is O(n²), where n is the sample size. This is a factor of [Formula: see text] faster than the state-of-the-art method. Furthermore, in contrast to existing methods, our method generalizes to time-inhomogeneous Λ- and Ξ-coalescents with measures that factorize as [Formula: see text] and [Formula: see text] respectively, where ζ denotes a strictly positive function of time. The runtime of our algorithm in this setting is [Formula: see text]. We also obtain general theoretical results for the identifiability of the Λ measure when ζ is a constant function, as well as for the identifiability of the function ζ under a fixed Ξ measure.
Affiliation(s)
- Jeffrey P Spence
- Computational Biology Graduate Group, University of California, Berkeley, California 94720
- John A Kamm
- Department of Statistics, University of California, Berkeley, California 94720
- Yun S Song
- Department of Statistics, University of California, Berkeley, California 94720; Computer Science Division, University of California, Berkeley, California 94720; Department of Integrative Biology, University of California, Berkeley, California 94720; Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania 19104; Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
|
25
|
Inference Methods for Multiple Merger Coalescents. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
26
|
Halldórsdóttir K, Árnason E. Trans-species polymorphism at antimicrobial innate immunity cathelicidin genes of Atlantic cod and related species. PeerJ 2015; 3:e976. [PMID: 26038731 PMCID: PMC4451034 DOI: 10.7717/peerj.976] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/24/2015] [Accepted: 05/05/2015] [Indexed: 12/27/2022] Open
Abstract
Natural selection, the most important force in evolution, comes in three forms. Negative purifying selection removes deleterious variation and maintains adaptations. Positive directional selection fixes beneficial variants, producing new adaptations. Balancing selection maintains variation in a population. Important mechanisms of balancing selection include heterozygote advantage, frequency-dependent advantage of rarity, and local and fluctuating episodic selection. A rare pathogen gains an advantage because host defenses are predominantly effective against prevalent types. Similarly, a rare immune variant gives its host an advantage because the prevalent pathogens cannot escape the host's apostatic defense. Due to the stochastic nature of evolution, neutral variation may accumulate on genealogical branches, but trans-species polymorphisms are rare under neutrality and are strong evidence for balancing selection. Balanced polymorphism maintains diversity at the major histocompatibility complex (MHC) in vertebrates. The Atlantic cod is missing genes for both MHC-II and CD4, vital parts of the adaptive immune system. Nevertheless, cod are healthy in their ecological niche, maintaining large populations that support major commercial fisheries. Innate immunity is of interest from an evolutionary perspective, particularly in taxa lacking adaptive immunity. Here, we analyze extensive amino acid and nucleotide polymorphisms of the cathelicidin gene family in Atlantic cod and closely related taxa. There are three major clusters, Cath1, Cath2, and Cath3, that we consider to be paralogous genes. There is extensive nucleotide and amino acid allelic variation between and within clusters. The major feature of the results is that the variation clusters by alleles and not by species in phylogenetic trees and discriminant analysis of principal components. 
Variation within the three groups shows trans-species polymorphism that is older than speciation and that is suggestive of balancing selection maintaining the variation. Using Bayesian and likelihood methods, positive and negative selection is evident at sites in the conserved part of the genes and, to a larger extent, in the active part, which also shows episodic diversifying selection, further supporting the argument for balancing selection.
Affiliation(s)
- Katrín Halldórsdóttir
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Einar Árnason
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
|
27
|
Árnason E, Halldórsdóttir K. Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models. PeerJ 2015; 3:e786. [PMID: 25755922 PMCID: PMC4349156 DOI: 10.7717/peerj.786] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/09/2014] [Accepted: 02/03/2015] [Indexed: 01/11/2023] Open
Abstract
High-fecundity organisms, such as Atlantic cod, can withstand substantial natural selection and the entailing genetic load of replacing alleles at a number of loci due to their excess reproductive capacity. High-fecundity organisms may reproduce by sweepstakes, leading to a highly skewed, heavy-tailed offspring distribution. Under such reproduction the Kingman coalescent of binary mergers breaks down and multiple-merger coalescent models are more appropriate. Here we study nucleotide variation at the Ckma (Creatine Kinase Muscle type A) gene in Atlantic cod. The gene shows extreme differentiation between the North (Canada, Greenland, Iceland, Norway, Barents Sea) and the South (Faroe Islands, North-, Baltic-, Celtic-, and Irish Seas) with FST > 0.8 between regions whereas neutral loci show no differentiation. This is evidence of natural selection. The protein sequence is conserved by purifying selection whereas silent and non-coding sites show extreme differentiation. The unfolded site-frequency spectrum has three modes, a mode at singleton sites and two high frequency modes at opposite frequencies representing divergent branches of the gene genealogy that is evidence for balancing selection. Analysis with multiple-merger coalescent models can account for the high frequency of singleton sites and indicate reproductive sweepstakes. Coalescent time scales vary with population size and with the inverse of variance in offspring number. Parameter estimates using multiple-merger coalescent models show that time scales are faster than under the Kingman coalescent.
Affiliation(s)
- Einar Árnason
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
- Katrín Halldórsdóttir
- Institute of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
|
28
|
Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics 2015; 199:841-56. [PMID: 25575536 DOI: 10.1534/genetics.114.173807] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Indexed: 12/31/2022] Open
Abstract
The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories.
|
29
|
Tellier A, Lemaire C. Coalescence 2.0: a multiple branching of recent theoretical developments and their applications. Mol Ecol 2014; 23:2637-52. [DOI: 10.1111/mec.12755] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 01/23/2014] [Revised: 04/08/2014] [Accepted: 04/13/2014] [Indexed: 02/01/2023]
Affiliation(s)
- Aurélien Tellier
- Section of Population Genetics; Center of Life and Food Sciences Weihenstephan; Technische Universität München; 85354 Freising Germany
- Christophe Lemaire
- LUNAM; UMR1345 Institut de Recherche en Horticulture et Semences; Université d'Angers; SFR 4207 QUASAV 49045 Angers France
- INRA; UMR1345 Institut de Recherche en Horticulture et Semences; 49071 Beaucouzé France
- AgroCampus-Ouest; UMR1345 Institut de Recherche en Horticulture et Semences; 49045 Angers France
|