1
Hobolth A, Rivas-González I, Bladt M, Futschik A. Phase-type distributions in mathematical population genetics: An emerging framework. Theor Popul Biol 2024; 157:14-32. [PMID: 38460602 DOI: 10.1016/j.tpb.2024.03.001] [Received: 06/04/2023] [Revised: 02/29/2024] [Accepted: 03/04/2024] [Indexed: 03/11/2024]
Abstract
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
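As a rough illustration of the matrix calculations this abstract refers to, the following Python sketch (not the authors' R code) treats the number of lineages in the standard coalescent as a continuous-time Markov chain and computes the expected tree height and total branch length as phase-type means. It assumes the standard Kingman coalescent with coalescence rate k(k-1)/2 when k lineages remain; the function names are hypothetical and chosen only for this example.

```python
def kingman_subintensity(n):
    """Sub-intensity matrix S over transient states n, n-1, ..., 2 lineages.

    Assumption: standard Kingman coalescent, rate k(k-1)/2 with k lineages.
    """
    ks = list(range(n, 1, -1))
    m = len(ks)
    S = [[0.0] * m for _ in range(m)]
    for i, k in enumerate(ks):
        rate = k * (k - 1) / 2.0
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate  # move from k to k-1 lineages
    return S, ks


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


def expected_reward(n, rewards=None):
    """Mean accumulated reward alpha (-S)^{-1} r, starting with n lineages.

    rewards=None uses reward 1 in every state (time to absorption, i.e.
    tree height); rewards equal to the lineage counts give total branch length.
    """
    S, ks = kingman_subintensity(n)
    if rewards is None:
        rewards = [1.0] * len(ks)
    neg_S = [[-v for v in row] for row in S]
    u = solve(neg_S, rewards)  # u = (-S)^{-1} r
    return u[0]  # alpha = (1, 0, ..., 0): the chain starts in state n


def expected_tree_height(n):
    return expected_reward(n)


def expected_total_branch_length(n):
    _, ks = kingman_subintensity(n)
    return expected_reward(n, [float(k) for k in ks])
```

For a sample of size n, these two functions should agree with the classical closed forms 2(1 - 1/n) for the tree height and 2(1 + 1/2 + ... + 1/(n-1)) for the total branch length; for n = 5 these are 1.6 and 25/6. Changing only the reward vector, while keeping the same matrix, is the simplest instance of the reward transformations mentioned in the abstract.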
Affiliation(s)
- Asger Hobolth
- Department of Mathematics, Aarhus University, Denmark.
- Mogens Bladt
- Department of Mathematical Sciences, University of Copenhagen, Denmark.
- Andreas Futschik
- Institute of Applied Statistics, Johannes Kepler University, Austria.
2
Eldon B, Stephan W. Sweepstakes reproduction facilitates rapid adaptation in highly fecund populations. Mol Ecol 2024; 33:e16903. [PMID: 36896794 DOI: 10.1111/mec.16903] [Received: 05/23/2022] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 03/11/2023]
Abstract
Adaptation enables natural populations to survive in a changing environment. Understanding the mechanics of adaptation is therefore crucial for learning about the evolution and ecology of natural populations. We focus on the impact of random sweepstakes on selection in highly fecund haploid and diploid populations partitioned into two genetic types, with one type conferring selective advantage. For the diploid populations, we incorporate various dominance mechanisms. We assume that the populations may experience recurrent bottlenecks. In random sweepstakes, the distribution of individual recruitment success is highly skewed, resulting in a huge variance in the number of offspring contributed by the individuals present in any given generation. Using computer simulations, we investigate the joint effects of random sweepstakes, recurrent bottlenecks and dominance mechanisms on selection. In our framework, bottlenecks allow random sweepstakes to have an effect on the time to fixation, and in diploid populations, the effect of random sweepstakes depends on the dominance mechanism. We describe selective sweepstakes that are approximated by recurrent sweeps of strongly beneficial allelic types arising by mutation. We demonstrate that both types of sweepstakes reproduction may facilitate rapid adaptation (as defined based on the average time to fixation of a type conferring selective advantage conditioned on fixation of the type). However, whether random sweepstakes cause rapid adaptation depends also on their interactions with bottlenecks and dominance mechanisms. Finally, we review a case study in which a model of recurrent sweeps is shown to essentially explain population genomic data from Atlantic cod.
Affiliation(s)
- Bjarki Eldon
- Institute of Evolution and Biodiversity Science, Natural History Museum Berlin, Berlin, Germany
3
Miró Pina V, Joly É, Siri-Jégousse A. Estimating the Lambda measure in multiple-merger coalescents. Theor Popul Biol 2023; 154:94-101. [PMID: 37742787 DOI: 10.1016/j.tpb.2023.09.002] [Received: 04/26/2023] [Revised: 09/13/2023] [Accepted: 09/15/2023] [Indexed: 09/26/2023]
Abstract
Multiple-merger coalescents, also known as Λ-coalescents, have been used to describe the genealogy of populations that have a skewed offspring distribution or that undergo strong selection. Inferring the characteristic measure Λ, which describes the rates of the multiple-merger events, is key to understanding these processes. So far, most inference methods only work for some particular families of Λ-coalescents that are described by only one parameter, but not for more general models. This article is devoted to the construction of a non-parametric estimator of the density of Λ that is based on the observation at a single time of the so-called Site Frequency Spectrum (SFS), which describes the allelic frequencies in a present population sample. First, we produce estimates of the multiple-merger rates by solving a linear system, whose coefficients are obtained by appropriately subsampling the SFS. Then, we use a technique that aggregates the information extracted from the previous step through a kernel type of re-construction to give a non-parametric estimation of the measure Λ. We give a consistency result for this estimator under mild conditions on the behavior of Λ around 0. We also show some numerical examples of how our method performs.
Affiliation(s)
- Verónica Miró Pina
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico
- Émilien Joly
- Centro de Investigación en Matemáticas, AC (CIMAT), Guanajuato, Mexico
- Arno Siri-Jégousse
- Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico.
4
Freund F, Kerdoncuff E, Matuszewski S, Lapierre M, Hildebrandt M, Jensen JD, Ferretti L, Lambert A, Sackton TB, Achaz G. Interpreting the pervasive observation of U-shaped Site Frequency Spectra. PLoS Genet 2023; 19:e1010677. [PMID: 36952570 PMCID: PMC10072462 DOI: 10.1371/journal.pgen.1010677] [Received: 05/13/2022] [Revised: 04/04/2023] [Accepted: 02/22/2023] [Indexed: 03/25/2023]
Abstract
The standard neutral model of molecular evolution has traditionally been used as the null model for population genomics. We gathered a collection of 45 genome-wide site frequency spectra from a diverse set of species, most of which display an excess of low and high frequency variants compared to the expectation of the standard neutral model, resulting in U-shaped spectra. We show that multiple merger coalescent models often provide a better fit to these observations than the standard Kingman coalescent. Hence, in many circumstances these under-utilized models may serve as the more appropriate reference for genomic analyses. We further discuss the underlying evolutionary processes that may result in the widespread U-shape of frequency spectra.
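To make the "expectation of the standard neutral model" concrete: under the Kingman coalescent with infinite-sites mutation, the expected number of sites at which the derived allele appears i times in a sample of size n is θ/i, for i = 1, ..., n-1. The Python sketch below computes this expectation and compares an observed spectrum against it class by class. It is only an illustration of the comparison the abstract describes, not the authors' fitting procedure; the function names and the example counts are hypothetical.

```python
def expected_unfolded_sfs(n, theta=1.0):
    """Expected unfolded SFS under the standard neutral model: theta / i."""
    return [theta / i for i in range(1, n)]


def normalized(sfs):
    """Rescale a spectrum so that its entries sum to 1."""
    total = sum(sfs)
    return [x / total for x in sfs]


def excess_ratio(observed, n):
    """Observed proportion divided by neutral expectation, per frequency class.

    Values above 1 mark an excess relative to the standard neutral model.
    Normalizing both spectra removes the dependence on theta.
    """
    if len(observed) != n - 1:
        raise ValueError("observed SFS must have n - 1 entries")
    exp = normalized(expected_unfolded_sfs(n))
    obs = normalized(observed)
    return [o / e for o, e in zip(obs, exp)]


# Made-up counts for illustration only (not data from the paper):
# a spectrum for n = 5 with many singletons and many high-frequency variants.
hypothetical_counts = [50, 10, 10, 30]
ratios = excess_ratio(hypothetical_counts, 5)
```

With a U-shaped spectrum, the ratios for the lowest and highest frequency classes exceed 1 while the intermediate classes fall below 1.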
Affiliation(s)
- Fabian Freund
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Elise Kerdoncuff
- Department of Genetics, University of California, Berkeley, California, United States of America
- Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Marguerite Lapierre
- Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Jeffrey D Jensen
- Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Luca Ferretti
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Amaury Lambert
- Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, Paris, France
- Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- Timothy B Sackton
- Éco-anthropologie, Muséum National d'Histoire Naturelle, Université Paris-Cité, Paris, France
- Guillaume Achaz
- Informatics Group, Harvard University, Cambridge, Massachusetts, United States of America
- SMILE group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, Paris, France
5
The shape of a seed bank tree. J Appl Probab 2022. [DOI: 10.1017/jpr.2021.79] [Indexed: 11/06/2022]
Abstract
We derive the asymptotic behavior of the total, active, and inactive branch lengths of the seed bank coalescent when the initial sample size grows to infinity. These random variables have important applications for populations evolving under some seed bank effects, such as plants and bacteria, and for some cases of structured populations like metapopulations. The proof relies on the analysis of the tree at a stopping time corresponding to the first time a deactivated lineage is reactivated. We also give conditional sampling formulas for the random partition, and we study the system at the time of the first reactivation of a lineage. All these results provide a good picture of the different regimes and behaviors of the block-counting process of the seed bank coalescent.