1
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024] Open
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
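The limiting picture in this abstract (Kingman coalescence between big families, plus a discrete jump in coalescence probability at each big family) can be sketched numerically. The following is only an illustrative sketch, not the authors' construction: it assumes time is measured in coalescent units with pairwise rate 1 between big families, and that each big family is summarized by a hypothetical pair `(time, p)`, where `p` is the probability that the two lineages coalesce in that family given they have not yet coalesced. The function names and the example values are invented for illustration.

```python
import math


def survival(t, big_families, kingman_rate=1.0):
    """P(T > t | big families): probability the pair has not coalesced by time t.

    big_families is a list of (time, p) pairs (hypothetical summary of the
    pedigree): at each listed time the pair coalesces with probability p.
    Between big families the pair coalesces at constant rate kingman_rate.
    """
    s = math.exp(-kingman_rate * t)
    for time, p in big_families:
        if time <= t:
            s *= 1.0 - p  # discrete drop caused by one big family
    return s


def coalescence_cdf(t, big_families, kingman_rate=1.0):
    """P(T <= t | big families): a smooth exponential part plus jumps."""
    return 1.0 - survival(t, big_families, kingman_rate)


# Two made-up big families, at times 0.5 and 1.25 in coalescent units.
example_big_families = [(0.5, 0.2), (1.25, 0.1)]

for t in (0.4, 0.5, 1.0, 1.25, 2.0):
    print(t, round(coalescence_cdf(t, example_big_families), 4))
```

With an empty list of big families the function reduces to the exponential distribution of the Kingman coalescent, which is the case the abstract describes when big families are too rare to matter in the limit.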
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
2
Korfmann K, Temple-Boyer M, Sellinger T, Tellier A. Determinants of rapid adaptation in species with large variance in offspring production. Mol Ecol 2024; 33:e16982. [PMID: 37199145 DOI: 10.1111/mec.16982] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/02/2023] [Revised: 04/26/2023] [Accepted: 05/02/2023] [Indexed: 05/19/2023]
Abstract
The speed of population adaptation to changing biotic and abiotic environments is determined by the interaction between genetic drift, positive selection and linkage effects. Many marine species (fish, crustaceans), invertebrates and pathogens of humans and crops, exhibit sweepstakes reproduction characterized by the production of a very large amount of offspring (fecundity phase) from which only a small fraction may survive to the next generation (viability phase). Using stochastic simulations, we investigate whether the occurrence of sweepstakes reproduction affects the efficiency of a positively selected unlinked locus, and thus, the speed of adaptation since fecundity and/or viability have distinguishable consequences on mutation rate, probability and fixation time of advantageous alleles. We observe that the mean number of mutations at the next generation is always the function of the population size, but the variance increases with stronger sweepstakes reproduction when mutations occur in the parents. On the one hand, stronger sweepstakes reproduction magnifies the effect of genetic drift thus increasing the probability of fixation of neutral allele and decreasing that of selected alleles. On the other hand, the time to fixation of advantageous (as well as neutral) alleles is shortened by stronger sweepstakes reproduction. Importantly, fecundity and viability selection exhibit different probabilities and times to fixation of advantageous alleles under intermediate and weak sweepstakes reproduction. Finally, alleles under both strong fecundity and viability selection display a synergistic efficiency of selection. We conclude that measuring and modelling accurately fecundity and/or viability selection are crucial to predict the adaptive potential of species with sweepstakes reproduction.
Affiliation(s)
- Kevin Korfmann
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Marie Temple-Boyer
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Thibaut Sellinger
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
  - Department of Environment and Biodiversity, Paris Lodron University of Salzburg, Salzburg, Austria
- Aurélien Tellier
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
3
Eldon B, Stephan W. Sweepstakes reproduction facilitates rapid adaptation in highly fecund populations. Mol Ecol 2024; 33:e16903. [PMID: 36896794 DOI: 10.1111/mec.16903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 03/11/2023]
Abstract
Adaptation enables natural populations to survive in a changing environment. Understanding the mechanics of adaptation is therefore crucial for learning about the evolution and ecology of natural populations. We focus on the impact of random sweepstakes on selection in highly fecund haploid and diploid populations partitioned into two genetic types, with one type conferring selective advantage. For the diploid populations, we incorporate various dominance mechanisms. We assume that the populations may experience recurrent bottlenecks. In random sweepstakes, the distribution of individual recruitment success is highly skewed, resulting in a huge variance in the number of offspring contributed by the individuals present in any given generation. Using computer simulations, we investigate the joint effects of random sweepstakes, recurrent bottlenecks and dominance mechanisms on selection. In our framework, bottlenecks allow random sweepstakes to have an effect on the time to fixation, and in diploid populations, the effect of random sweepstakes depends on the dominance mechanism. We describe selective sweepstakes that are approximated by recurrent sweeps of strongly beneficial allelic types arising by mutation. We demonstrate that both types of sweepstakes reproduction may facilitate rapid adaptation (as defined based on the average time to fixation of a type conferring selective advantage conditioned on fixation of the type). However, whether random sweepstakes cause rapid adaptation depends also on their interactions with bottlenecks and dominance mechanisms. Finally, we review a case study in which a model of recurrent sweeps is shown to essentially explain population genomic data from Atlantic cod.
Affiliation(s)
- Bjarki Eldon
  - Institute of Evolution and Biodiversity Science, Natural History Museum Berlin, Berlin, Germany
4
Miró Pina V, Joly É, Siri-Jégousse A. Estimating the Lambda measure in multiple-merger coalescents. Theor Popul Biol 2023; 154:94-101. [PMID: 37742787 DOI: 10.1016/j.tpb.2023.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/13/2023] [Accepted: 09/15/2023] [Indexed: 09/26/2023]
Abstract
Multiple-merger coalescents, also known as Λ-coalescents, have been used to describe the genealogy of populations that have a skewed offspring distribution or that undergo strong selection. Inferring the characteristic measure Λ, which describes the rates of the multiple-merger events, is key to understand these processes. So far, most inference methods only work for some particular families of Λ-coalescents that are described by only one parameter, but not for more general models. This article is devoted to the construction of a non-parametric estimator of the density of Λ that is based on the observation at a single time of the so-called Site Frequency Spectrum (SFS), which describes the allelic frequencies in a present population sample. First, we produce estimates of the multiple-merger rates by solving a linear system, whose coefficients are obtained by appropriately subsampling the SFS. Then, we use a technique that aggregates the information extracted from the previous step through a kernel type of re-construction to give a non-parametric estimation of the measure Λ. We give a consistency result of this estimator under mild conditions on the behavior of Λ around 0. We also show some numerical examples of how our method performs.
Affiliation(s)
- Verónica Miró Pina
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico
- Émilien Joly
  - Centro de Investigación en Matemáticas, AC (CIMAT), Guanajuato, Mexico
- Arno Siri-Jégousse
  - Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico
5
Árnason E, Koskela J, Halldórsdóttir K, Eldon B. Sweepstakes reproductive success via pervasive and recurrent selective sweeps. eLife 2023; 12:80781. [PMID: 36806325 PMCID: PMC9940914 DOI: 10.7554/elife.80781] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/03/2022] [Accepted: 12/28/2022] [Indexed: 02/22/2023] Open
Abstract
Highly fecund natural populations characterized by high early mortality abound, yet our knowledge about their recruitment dynamics is somewhat rudimentary. This knowledge gap has implications for our understanding of genetic variation, population connectivity, local adaptation, and the resilience of highly fecund populations. The concept of sweepstakes reproductive success, which posits a considerable variance and skew in individual reproductive output, is key to understanding the distribution of individual reproductive success. However, it still needs to be determined whether highly fecund organisms reproduce through sweepstakes and, if they do, the relative roles of neutral and selective sweepstakes. Here, we use coalescent-based statistical analysis of population genomic data to show that selective sweepstakes likely explain recruitment dynamics in the highly fecund Atlantic cod. We show that the Kingman coalescent (modelling no sweepstakes) and the Xi-Beta coalescent (modelling random sweepstakes), including complex demography and background selection, do not provide an adequate fit for the data. The Durrett-Schweinsberg coalescent, in which selective sweepstakes result from recurrent and pervasive selective sweeps of new mutations, offers greater explanatory power. Our results show that models of sweepstakes reproduction and multiple-merger coalescents are relevant and necessary for understanding genetic diversity in highly fecund natural populations. These findings have fundamental implications for understanding the recruitment variation of fish stocks and general evolutionary genomics of high-fecundity organisms.
Affiliation(s)
- Einar Árnason
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland; Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, United States
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Katrín Halldórsdóttir
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany
6
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschmar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2021; 220:6460344. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022] Open
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Affiliation(s)
- Franz Baumdicker
  - Cluster of Excellence "Controlling Microbes to Fight Infections", Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
- Gertjan Bisschop
  - Institute of Evolutionary Biology, The University of Edinburgh, EH9 3FL, UK
- Daniel Goldstein
  - Khoury College of Computer Sciences, Northeastern University, MA 02115, USA; No affiliation
- Graham Gower
  - Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, WI 53706, USA
- Georgia Tsambos
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Victoria, 3010, Australia
- Sha Zhu
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde Berlin, 10115, Germany
- Jared G Galloway
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA; Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Ariella L Gladstein
  - Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA; Embark Veterinary, Inc., Boston, MA 02111, USA
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, EH25 9RG, UK
- Bing Guo
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Ben Jeffery
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Warren W Kretzschmar
  - Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
- Konrad Lohse
  - Institute of Evolutionary Biology, The University of Edinburgh, EH9 3FL, UK
- Dominic Nelson
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Nathaniel S Pope
  - Department of Entomology, Pennsylvania State University, PA 16802, USA
- Consuelo D Quinto-Cortés
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA
- Kumar Saunack
  - IIT Bombay, Powai, Mumbai 400 076, Maharashtra, India
- Thibaut Sellinger
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
- Kevin Thornton
  - Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Anthony W Wohns
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
- Simon Gravel
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Andrew D Kern
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA
- Jere Koskela
  - Department of Statistics, University of Warwick, CV4 7AL, UK
- Peter L Ralph
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, OR 97403-5289, USA; Department of Mathematics, University of Oregon, OR 97403-5289, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, OX3 7LF, UK
7
Multivariate phase-type theory for the site frequency spectrum. J Math Biol 2021; 83:63. [PMID: 34783900 DOI: 10.1007/s00285-021-01689-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Revised: 08/09/2021] [Accepted: 10/13/2021] [Indexed: 10/19/2022]
Abstract
Linear functions of the site frequency spectrum (SFS) play a major role for understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and to determine significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package PhaseTypeR, and R code for the reproduction of our results is available as an accompanying vignette.
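To make "linear functions of the SFS" concrete, the sketch below computes the two classical mutation-rate estimators mentioned in the abstract, Watterson's estimator (from the number of segregating sites) and the average number of pairwise differences, as weighted sums of the unfolded SFS. It is a plain-Python illustration using the standard textbook formulas, not the phase-type machinery of the paper or the PhaseTypeR package; the function names and the convention that `sfs[i - 1]` counts sites whose derived allele appears in `i` of `n` sampled sequences are assumptions made here.

```python
from math import comb


def linear_sfs_statistic(sfs, weights):
    """Generic linear function of the SFS: sum_i weights[i] * sfs[i]."""
    return sum(w * x for w, x in zip(weights, sfs))


def watterson_theta(sfs):
    """Watterson's estimator: segregating sites divided by a_n = sum_{i<n} 1/i."""
    n = len(sfs) + 1  # sample size implied by an SFS with n - 1 classes
    a_n = sum(1.0 / i for i in range(1, n))
    return linear_sfs_statistic(sfs, [1.0 / a_n] * (n - 1))


def pairwise_theta(sfs):
    """Average pairwise differences: weight i * (n - i) / C(n, 2) on class i."""
    n = len(sfs) + 1
    weights = [i * (n - i) / comb(n, 2) for i in range(1, n)]
    return linear_sfs_statistic(sfs, weights)


# Toy SFS for n = 4 sequences, chosen proportional to 1/i.
toy_sfs = [6, 3, 2]
print(watterson_theta(toy_sfs), pairwise_theta(toy_sfs))
```

The numerator of Tajima's D is the difference of these two linear statistics, which is why its distribution is tied to the joint distribution of the SFS entries.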
8
Freund F, Siri-Jégousse A. The impact of genetic diversity statistics on model selection between coalescents. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
9
Stadler T, Pybus OG, Stumpf MPH. Phylodynamics for cell biologists. Science 2021; 371:eaah6266. [PMID: 33446527 DOI: 10.1126/science.aah6266] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/12/2019] [Accepted: 08/13/2020] [Indexed: 12/12/2022]
Abstract
Multicellular organisms are composed of cells connected by ancestry and descent from progenitor cells. The dynamics of cell birth, death, and inheritance within an organism give rise to the fundamental processes of development, differentiation, and cancer. Technical advances in molecular biology now allow us to study cellular composition, ancestry, and evolution at the resolution of individual cells within an organism or tissue. Here, we take a phylogenetic and phylodynamic approach to single-cell biology. We explain how "tree thinking" is important to the interpretation of the growing body of cell-level data and how ecological null models can benefit statistical hypothesis testing. Experimental progress in cell biology should be accompanied by theoretical developments if we are to exploit fully the dynamical information in single-cell data.
Affiliation(s)
- T Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- O G Pybus
  - Department of Zoology, University of Oxford, Oxford, UK
- M P H Stumpf
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
10
Abstract
Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright-Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright-Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright-Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright-Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.
Affiliation(s)
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, D-10115 Berlin, Germany
11
Abstract
A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The 'skyline plot' and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalized skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Λ-coalescents (specifically, Beta(2 - α, α)-coalescents) to develop the 'multifurcating skyline plot', which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2 - α, α)-coalescent process can be estimated accurately. Furthermore, we apply the multifurcating skyline plot to simulated trees generated by tracking transmissions in an individual-based model of epidemic superspreading. We find that high levels of superspreading are consistent with the high-variance assumptions underlying Λ-coalescents and that the estimated parameters of the Λ-coalescent model contain information about the degree of superspreading.
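For readers unfamiliar with the Beta(2 - α, α)-coalescent named in this abstract, the sketch below evaluates its merger rates from the standard Λ-coalescent formula: when b lineages are present, each particular group of k lineages merges at rate B(k - α, b - k + α) / B(2 - α, α), where B is the beta function. This is a generic illustration and not the estimator of the paper; the function names are invented here, the time scaling (and any effective-population-size factor used in a skyline plot) is left out, and the parameter range 0 < α < 2 is assumed.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one specific group of k out of b lineages merges."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0 < alpha < 2:
        raise ValueError("need 0 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


def total_merger_rate(b, alpha):
    """Total rate of any merger among b lineages (sum over group sizes k)."""
    return sum(comb(b, k) * beta_coalescent_rate(b, k, alpha)
               for k in range(2, b + 1))


# As alpha approaches 2, mergers of three or more lineages become rare and
# the rates approach those of the Kingman coalescent.
for alpha in (1.0, 1.5, 1.99):
    print(alpha, [round(beta_coalescent_rate(5, k, alpha), 4) for k in range(2, 6)])
```

Rates for k of 3 or more correspond to the multifurcations (hard polytomies) that the multifurcating skyline plot is designed to handle.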
Affiliation(s)
- Patrick Hoscheit
  - MaIAGE, INRA, Université Paris-Saclay, Domaine de Vilvert, Jouy-en-Josas 78350, France
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
12
Koskela J, Wilke Berenguer M. Robust model selection between population growth and multiple merger coalescents. Math Biosci 2019; 311:1-12. [PMID: 30851276 DOI: 10.1016/j.mbs.2019.03.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/07/2018] [Revised: 03/05/2019] [Accepted: 03/05/2019] [Indexed: 11/24/2022]
Abstract
We study the effect of biological confounders on the model selection problem between Kingman coalescents with population growth, and Ξ-coalescents involving simultaneous multiple mergers. We use a low dimensional, computationally tractable summary statistic, dubbed the singleton-tail statistic, to carry out approximate likelihood ratio tests between these model classes. The singleton-tail statistic has been shown to distinguish between them with high power in the simple setting of neutrally evolving, panmictic populations without recombination. We extend this work by showing that cryptic recombination and selection do not diminish the power of the test, but that misspecifying population structure does. Furthermore, we demonstrate that the singleton-tail statistic can also solve the more challenging model selection problem between multiple mergers due to selective sweeps, and multiple mergers due to high fecundity with moderate power of up to 30%.
Affiliation(s)
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Maite Wilke Berenguer
  - Fakultät für Mathematik, Ruhr Universität Bochum, Universitätstraße 150, Bochum 44780, Germany