1
Roberts I, Everitt RG, Koskela J, Didelot X. Bayesian Inference of Pathogen Phylogeography using the Structured Coalescent Model. PLoS Comput Biol 2025; 21:e1012995. [PMID: 40258093 PMCID: PMC12040344 DOI: 10.1371/journal.pcbi.1012995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 04/29/2025] [Accepted: 03/25/2025] [Indexed: 04/23/2025] [Open access]
Abstract
Over the past decade, pathogen genome sequencing has become well established as a powerful approach to study infectious disease epidemiology. In particular, when multiple genomes are available from several geographical locations, comparing them is informative about the relative size of the local pathogen populations as well as past migration rates and events between locations. The structured coalescent model has a long history of being used as the underlying process for such phylogeographic analysis. However, the computational cost of using this model does not scale well to the large number of genomes frequently analysed in pathogen genomic epidemiology studies. Several approximations of the structured coalescent model have been proposed, but their effects are difficult to predict. Here we show how the exact structured coalescent model can be used to analyse a precomputed dated phylogeny, in order to perform Bayesian inference on the past migration history, the effective population sizes in each location, and the directed migration rates from any location to another. We describe an efficient reversible jump Markov Chain Monte Carlo scheme which is implemented in a new R package StructCoalescent. We use simulations to demonstrate the scalability and correctness of our method and to compare it with existing software. We also applied our new method to several state-of-the-art datasets on the population structure of real pathogens to showcase the relevance of our method to current data scales and research questions.
Affiliation(s)
- Ian Roberts
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Richard G. Everitt
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
  - Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry, United Kingdom
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, United Kingdom
- Xavier Didelot
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
  - Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry, United Kingdom
  - School of Life Sciences, University of Warwick, Coventry, United Kingdom
2
Nayak SS, Rajawat D, Jain K, Sharma A, Gondro C, Tarafdar A, Dutt T, Panigrahi M. A comprehensive review of livestock development: insights into domestication, phylogenetics, diversity, and genomic advances. Mamm Genome 2024; 35:577-599. [PMID: 39397083 DOI: 10.1007/s00335-024-10075-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 09/27/2024] [Indexed: 10/15/2024]
Abstract
Livestock plays an essential role in sustaining human livelihoods, offering a diverse range of species integral to food security, economic stability, and cultural traditions. The domestication of livestock, which began over 10,000 years ago, has driven significant genetic changes in species such as cattle, buffaloes, sheep, goats, and pigs. Recent advancements in genomic technologies, including next-generation sequencing (NGS), genome-wide association studies (GWAS), and genomic selection, have dramatically enhanced our understanding of these genetic developments. This review brings together key research on the domestication process, phylogenetics, genetic diversity, and selection signatures within major livestock species. It emphasizes the importance of admixture studies and evolutionary forces like natural selection, genetic drift, and gene flow in shaping livestock populations. Additionally, the integration of machine learning with genomic data offers new perspectives on the functional roles of genes in adaptation and evolution. By exploring these genomic advancements, this review provides insights into genetic variation and evolutionary processes that could inform future approaches to improving livestock management and adaptation to environmental challenges, including climate change.
Affiliation(s)
- Sonali Sonejita Nayak
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Karan Jain
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Anurodh Sharma
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Cedric Gondro
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
- Ayon Tarafdar
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122, UP, India
3
Allen B, McAvoy A. The coalescent in finite populations with arbitrary, fixed structure. Theor Popul Biol 2024; 158:150-169. [PMID: 38880430 DOI: 10.1016/j.tpb.2024.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 06/03/2024] [Accepted: 06/12/2024] [Indexed: 06/18/2024]
Abstract
The coalescent is a stochastic process representing ancestral lineages in a population undergoing neutral genetic drift. Originally defined for a well-mixed population, the coalescent has been adapted in various ways to accommodate spatial, age, and class structure, along with other features of real-world populations. To further extend the range of population structures to which coalescent theory applies, we formulate a coalescent process for a broad class of neutral drift models with arbitrary - but fixed - spatial, age, sex, and class structure, haploid or diploid genetics, and any fixed mating pattern. Here, the coalescent is represented as a random sequence of mappings (C_t)_{t≥0} from a finite set G to itself. The set G represents the "sites" (in individuals, in particular locations and/or classes) at which these alleles can live. The state of the coalescent, C_t: G → G, maps each site g ∈ G to the site containing g's ancestor, t time-steps into the past. Using this representation, we define and analyze coalescence time, coalescence branch length, mutations prior to coalescence, and stationary probabilities of identity-by-descent and identity-by-state. For low mutation, we provide a recipe for computing identity-by-descent and identity-by-state probabilities via the coalescent. Applying our results to a diploid population with arbitrary sex ratio r, we find that measures of genetic dissimilarity, among any set of sites, are scaled by 4r(1-r) relative to the even sex ratio case.
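The mapping representation described in this abstract can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the sites in G are encoded as the integers 0..n-1, each time step is given as a hypothetical "parent map" list, and the function names are invented here for illustration; none of this is taken from the paper itself.

```python
from typing import List, Optional, Sequence

# Assumption: parent_map[g] is the site holding the parent of the allele at site g,
# one time step into the past. Sites are encoded as integers 0..n_sites-1.
ParentMap = List[int]


def coalescent_state(parent_maps: Sequence[ParentMap], n_sites: int) -> List[int]:
    """Compose one-step parent maps into C_t, with t = len(parent_maps).

    parent_maps[0] is the most recent step; C_0 is the identity map.
    """
    state = list(range(n_sites))  # C_0: each site is its own ancestor
    for parents in parent_maps:
        # Go one step further back: follow each current ancestor to its parent.
        state = [parents[a] for a in state]
    return state


def coalescence_time(
    parent_maps: Sequence[ParentMap], n_sites: int, sites: Sequence[int]
) -> Optional[int]:
    """Smallest t at which all given sites map to one ancestral site, else None."""
    state = list(range(n_sites))
    if len({state[g] for g in sites}) == 1:
        return 0
    for t, parents in enumerate(parent_maps, start=1):
        state = [parents[a] for a in state]
        if len({state[g] for g in sites}) == 1:
            return t
    return None  # not yet coalesced within the supplied steps


def sex_ratio_scaling(r: float) -> float:
    """The factor 4r(1-r) quoted in the abstract; it equals 1 at r = 0.5."""
    return 4.0 * r * (1.0 - r)


if __name__ == "__main__":
    maps = [[0, 0, 2, 3], [0, 1, 0, 3]]  # two hypothetical time steps on 4 sites
    print(coalescent_state(maps, 4))
    print(coalescence_time(maps, 4, [0, 2]))
    print(sex_ratio_scaling(0.25))
```

Storing C_t as a plain list keeps the composition step a single comprehension; a real analysis would draw the parent maps from the drift model rather than fixing them by hand.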
Affiliation(s)
- Benjamin Allen
  - Department of Mathematics, Emmanuel College, 400 The Fenway, Boston, MA, 02115, USA
- Alex McAvoy
  - School of Data Science and Society, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
4
Lavretsky P, Mohl JE, Söderquist P, Kraus RHS, Schummer ML, Brown JI. The meaning of wild: Genetic and adaptive consequences from large-scale releases of domestic mallards. Commun Biol 2023; 6:819. [PMID: 37543640 PMCID: PMC10404241 DOI: 10.1038/s42003-023-05170-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 07/24/2023] [Indexed: 08/07/2023] [Open access]
Abstract
The translocation of individuals around the world is leading to rising incidences of anthropogenic hybridization, particularly between domestic and wild congeners. We apply a landscape genomics approach for thousands of mallard (Anas platyrhynchos) samples across continental and island populations to determine the result of over a century of supplementation practices. We establish that a single domestic game-farm mallard breed is the source for contemporary release programs in Eurasia and North America, as well as for established feral populations in New Zealand and Hawaii. In particular, we identify central Europe and eastern North America as epicenters of ongoing anthropogenic hybridization, and conclude that the release of game-farm mallards continues to affect the genetic integrity of wild mallards. Conversely, self-sustaining feral populations in New Zealand and Hawaii not only show strong differentiation from their original stock, but also signatures of local adaptation occurring in less than a half-century since game-farm mallard releases have ceased. We conclude that 'wild' is not singular, and that even feral populations are capable of responding to natural processes. Although considered paradoxical to biological conservation, understanding the capacity for wildness among feral and feral admixed populations in human landscapes is critical as such interactions increase in the Anthropocene.
Affiliation(s)
- Philip Lavretsky
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, TX, 79668, USA
- Jonathon E Mohl
  - Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX, 79668, USA
- Pär Söderquist
  - Faculty of Natural Sciences, Kristianstad University, SE-291 88, Kristianstad, Sweden
- Robert H S Kraus
  - Department of Migration, Max Planck Institute of Animal Behavior, 78315, Radolfzell, Germany
- Michael L Schummer
  - Department of Environmental Biology, State University of New York College of Environmental Science and Forestry, Syracuse, NY, 13210, USA
- Joshua I Brown
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, TX, 79668, USA
5
Didelot X, Franceschi V, Frost SDW, Dennis A, Volz EM. Model design for nonparametric phylodynamic inference and applications to pathogen surveillance. Virus Evol 2023; 9:vead028. [PMID: 37229349 PMCID: PMC10205094 DOI: 10.1093/ve/vead028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/02/2022] [Revised: 04/17/2023] [Accepted: 04/26/2023] [Indexed: 05/27/2023] [Open access]
Abstract
Inference of effective population size from genomic data can provide unique information about demographic history and, when applied to pathogen genetic data, can also provide insights into epidemiological dynamics. The combination of nonparametric models for population dynamics with molecular clock models which relate genetic data to time has enabled phylodynamic inference based on large sets of time-stamped genetic sequence data. The methodology for nonparametric inference of effective population size is well-developed in the Bayesian setting, but here we develop a frequentist approach based on nonparametric latent process models of population size dynamics. We appeal to statistical principles based on out-of-sample prediction accuracy in order to optimize parameters that control shape and smoothness of the population size over time. Our methodology is implemented in a new R package entitled mlesky. We demonstrate the flexibility and speed of this approach in a series of simulation experiments and apply the methodology to a dataset of HIV-1 in the USA. We also estimate the impact of non-pharmaceutical interventions for COVID-19 in England using thousands of SARS-CoV-2 sequences. By incorporating a measure of the strength of these interventions over time within the phylodynamic model, we estimate the impact of the first national lockdown in the UK on the epidemic reproduction number.
Affiliation(s)
- Xavier Didelot
  - School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
- Vinicius Franceschi
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Ann Dennis
  - Department of Medicine, University of North Carolina, USA
- Erik M Volz
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
6
Tang M, Dudas G, Bedford T, Minin VN. Fitting stochastic epidemic models to gene genealogies using linear noise approximation. Ann Appl Stat 2023. [DOI: 10.1214/21-aoas1583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Affiliation(s)
- Mingwei Tang
  - Department of Statistics, University of Washington, Seattle
- Gytis Dudas
  - Gothenburg Global Biodiversity Centre (GGBC)
- Trevor Bedford
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
7
Didelot X, Helekal D, Kendall M, Ribeca P. Distinguishing imported cases from locally acquired cases within a geographically limited genomic sample of an infectious disease. Bioinformatics 2023; 39:btac761. [PMID: 36440957 PMCID: PMC9805578 DOI: 10.1093/bioinformatics/btac761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/17/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022] [Open access]
Abstract
MOTIVATION: The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example, using a phylogeographic analysis in which genomic data from multiple locations are compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which is rarely available. RESULTS: Here, we propose an alternative approach that only uses genomic data from a location of interest. By comparing each new case with previous cases from the same location, we are able to detect imported cases, as they have a different genealogical distribution than that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrate relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and show that it can, in a matter of seconds or minutes, deliver important insights on the number of imports to a geographically limited sample of a pathogen population. AVAILABILITY AND IMPLEMENTATION: The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xavier Didelot
  - School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- David Helekal
  - Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, Coventry CV4 7AL, UK
- Michelle Kendall
  - School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Paolo Ribeca
  - Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London NW9 5EQ, UK
  - Biomathematics and Statistics Scotland, The James Hutton Institute, Edinburgh EH9 3FD, UK
8
Árnason E, Koskela J, Halldórsdóttir K, Eldon B. Sweepstakes reproductive success via pervasive and recurrent selective sweeps. eLife 2023; 12:80781. [PMID: 36806325 PMCID: PMC9940914 DOI: 10.7554/elife.80781] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/03/2022] [Accepted: 12/28/2022] [Indexed: 02/22/2023] [Open access]
Abstract
Highly fecund natural populations characterized by high early mortality abound, yet our knowledge about their recruitment dynamics is somewhat rudimentary. This knowledge gap has implications for our understanding of genetic variation, population connectivity, local adaptation, and the resilience of highly fecund populations. The concept of sweepstakes reproductive success, which posits a considerable variance and skew in individual reproductive output, is key to understanding the distribution of individual reproductive success. However, it still needs to be determined whether highly fecund organisms reproduce through sweepstakes and, if they do, the relative roles of neutral and selective sweepstakes. Here, we use coalescent-based statistical analysis of population genomic data to show that selective sweepstakes likely explain recruitment dynamics in the highly fecund Atlantic cod. We show that the Kingman coalescent (modelling no sweepstakes) and the Xi-Beta coalescent (modelling random sweepstakes), including complex demography and background selection, do not provide an adequate fit for the data. The Durrett-Schweinsberg coalescent, in which selective sweepstakes result from recurrent and pervasive selective sweeps of new mutations, offers greater explanatory power. Our results show that models of sweepstakes reproduction and multiple-merger coalescents are relevant and necessary for understanding genetic diversity in highly fecund natural populations. These findings have fundamental implications for understanding the recruitment variation of fish stocks and general evolutionary genomics of high-fecundity organisms.
Affiliation(s)
- Einar Árnason
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland
  - Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, United States
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Katrín Halldórsdóttir
  - Institute of Life- and environmental Sciences, University of Iceland, Reykjavik, Iceland
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany
9
Strugnell JM, McGregor HV, Wilson NG, Meredith KT, Chown SL, Lau SCY, Robinson SA, Saunders KM. Emerging biological archives can reveal ecological and climatic change in Antarctica. Glob Chang Biol 2022; 28:6483-6508. [PMID: 35900301 PMCID: PMC9826052 DOI: 10.1111/gcb.16356] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/19/2022] [Accepted: 06/27/2022] [Indexed: 06/15/2023]
Abstract
Anthropogenic climate change is causing observable changes in Antarctica and the Southern Ocean including increased air and ocean temperatures, glacial melt leading to sea-level rise and a reduction in salinity, and changes to freshwater water availability on land. These changes impact local Antarctic ecosystems and the Earth's climate system. The Antarctic has experienced significant past environmental change, including cycles of glaciation over the Quaternary Period (the past ~2.6 million years). Understanding Antarctica's paleoecosystems, and the corresponding paleoenvironments and climates that have shaped them, provides insight into present day ecosystem change, and importantly, helps constrain model projections of future change. Biological archives such as extant moss beds and peat profiles, biological proxies in lake and marine sediments, vertebrate animal colonies, and extant terrestrial and benthic marine invertebrates, complement other Antarctic paleoclimate archives by recording the nature and rate of past ecological change, the paleoenvironmental drivers of that change, and constrain current ecosystem and climate models. These archives provide invaluable information about terrestrial ice-free areas, a key location for Antarctic biodiversity, and the continental margin which is important for understanding ice sheet dynamics. Recent significant advances in analytical techniques (e.g., genomics, biogeochemical analyses) have led to new applications and greater power in elucidating the environmental records contained within biological archives. Paleoecological and paleoclimate discoveries derived from biological archives, and integration with existing data from other paleoclimate data sources, will significantly expand our understanding of past, present, and future ecological change, alongside climate change, in a unique, globally significant region.
Affiliation(s)
- Jan M. Strugnell
  - Centre for Sustainable Tropical Fisheries and Aquaculture and College of Science and Engineering, James Cook University, Townsville, Queensland, Australia
  - Securing Antarctica's Environmental Future, James Cook University, Townsville, Queensland, Australia
- Helen V. McGregor
  - Securing Antarctica's Environmental Future, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia
- Nerida G. Wilson
  - Securing Antarctica's Environmental Future, Western Australian Museum, Western Australia, Australia
  - Research and Collections, Western Australian Museum, Western Australia, Australia
  - School of Biological Sciences, University of Western Australia, Crawley, Western Australia, Australia
- Karina T. Meredith
  - Securing Antarctica's Environmental Future, Australian Nuclear Science and Technology Organisation, Lucas Heights, New South Wales, Australia
- Steven L. Chown
  - Securing Antarctica's Environmental Future, School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Sally C. Y. Lau
  - Centre for Sustainable Tropical Fisheries and Aquaculture and College of Science and Engineering, James Cook University, Townsville, Queensland, Australia
  - Securing Antarctica's Environmental Future, James Cook University, Townsville, Queensland, Australia
- Sharon A. Robinson
  - Securing Antarctica's Environmental Future, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia
- Krystyna M. Saunders
  - Securing Antarctica's Environmental Future, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, New South Wales, Australia
  - Securing Antarctica's Environmental Future, Australian Nuclear Science and Technology Organisation, Lucas Heights, New South Wales, Australia
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, Tasmania, Australia
10
Carson J, Ledda A, Ferretti L, Keeling M, Didelot X. The bounded coalescent model: Conditioning a genealogy on a minimum root date. J Theor Biol 2022; 548:111186. [PMID: 35697144 DOI: 10.1016/j.jtbi.2022.111186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2022] [Revised: 05/05/2022] [Accepted: 06/02/2022] [Indexed: 01/27/2023]
Abstract
The coalescent model represents how individuals sampled from a population may have originated from a last common ancestor. The bounded coalescent model is obtained by conditioning the coalescent model such that the last common ancestor must have existed after a certain date. This conditioned model arises in a variety of applications, such as speciation, horizontal gene transfer or transmission analysis, and yet the bounded coalescent model has not been previously analysed in detail. Here we describe a new algorithm to simulate from this model directly, without resorting to rejection sampling. We show that this direct simulation algorithm is more computationally efficient than the rejection sampling approach. We also show how to calculate the probability of the last common ancestor occurring after a given date, which is required to compute the probability density of realisations under the bounded coalescent model. Our results are applicable in both the isochronous (when all samples have the same date) and heterochronous (where samples can have different dates) settings. We explore the effect of setting a bound on the date of the last common ancestor, and show that it affects a number of properties of the resulting phylogenies. All our methods are implemented in a new R package called BoundedCoalescent which is freely available online.
Affiliation(s)
- Jake Carson
  - Mathematics Institute, University of Warwick, United Kingdom
- Alice Ledda
  - HCAI, Fungal, AMR, AMU & Sepsis Division, UK Health Security Agency, United Kingdom
- Luca Ferretti
  - Big Data Institute, University of Oxford, United Kingdom
- Matt Keeling
  - Mathematics Institute, University of Warwick, United Kingdom
- Xavier Didelot
  - Department of Statistics and School of Life Sciences, University of Warwick, United Kingdom
11
Helekal D, Ledda A, Volz E, Wyllie D, Didelot X. Bayesian inference of clonal expansions in a dated phylogeny. Syst Biol 2021; 71:1073-1087. [PMID: 34893904 PMCID: PMC9366454 DOI: 10.1093/sysbio/syab095] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/01/2021] [Revised: 11/23/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] [Open access]
Abstract
Microbial population genetics models often assume that all lineages are constrained by the same population size dynamics over time. However, many neutral and selective events can invalidate this assumption and can contribute to the clonal expansion of a specific lineage relative to the rest of the population. Such differential phylodynamic properties between lineages result in asymmetries and imbalances in phylogenetic trees that are sometimes described informally but which are difficult to analyze formally. To this end, we developed a model of how clonal expansions occur and affect the branching patterns of a phylogeny. We show how the parameters of this model can be inferred from a given dated phylogeny using Bayesian statistics, which allows us to assess the probability that one or more clonal expansion events occurred. For each putative clonal expansion event, we estimate its date of emergence and subsequent phylodynamic trajectory, including its long-term evolutionary potential which is important to determine how much effort should be placed on specific control measures. We demonstrate the applicability of our methodology on simulated and real data sets. Inference under our clonal expansion model can reveal important features in the evolution and epidemiology of infectious disease pathogens. [Clonal expansion; genomic epidemiology; microbial population genomics; phylodynamics.]
Affiliation(s)
- David Helekal
  - Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, United Kingdom
- Alice Ledda
  - Healthcare Associated Infections and Antimicrobial Resistance Division, National Infection Service, Public Health England, United Kingdom
- Erik Volz
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- David Wyllie
  - Field Service, East of England, National Infection Service, Public Health England, Cambridge, United Kingdom
- Xavier Didelot
  - School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
12
Genealogical structure changes as range expansions transition from pushed to pulled. Proc Natl Acad Sci U S A 2021; 118:2026746118. [PMID: 34413189 DOI: 10.1073/pnas.2026746118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022] [Open access]
Abstract
Range expansions accelerate evolution through multiple mechanisms, including gene surfing and genetic drift. The inference and control of these evolutionary processes ultimately rely on the information contained in genealogical trees. Currently, there are two opposing views on how range expansions shape genealogies. In invasion biology, expansions are typically approximated by a series of population bottlenecks producing genealogies with only pairwise mergers between lineages, a process known as the Kingman coalescent. Conversely, traveling wave models predict a coalescent with multiple mergers, known as the Bolthausen-Sznitman coalescent. Here, we unify these two approaches and show that expansions can generate an entire spectrum of coalescent topologies. Specifically, we show that tree topology is controlled by growth dynamics at the front and exhibits large differences between pulled and pushed expansions. These differences are explained by the fluctuations in the total number of descendants left by the early founders. High growth cooperativity leads to a narrow distribution of reproductive values and the Kingman coalescent. Conversely, low growth cooperativity results in a broad distribution, whose exponent controls the merger sizes in the genealogies. These broad distributions and non-Kingman tree topologies emerge due to the fluctuations in the front shape and position and do not occur in quasi-deterministic simulations. Overall, our results show that range expansions provide a robust mechanism for generating different types of multiple mergers, which could be similar to those observed in populations with strong selection or high fecundity. Thus, caution should be exercised in making inferences about the origin of non-Kingman genealogies.
13
Didelot X, Geidelberg L, Volz EM. Model design for non-parametric phylodynamic inference and applications to pathogen surveillance. bioRxiv 2021:2021.01.18.427056. [PMID: 34426812 PMCID: PMC8382123 DOI: 10.1101/2021.01.18.427056] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Inference of effective population size from genomic data can provide unique information about demographic history, and when applied to pathogen genetic data can also provide insights into epidemiological dynamics. The combination of non-parametric models for population dynamics with molecular clock models which relate genetic data to time has enabled phylodynamic inference based on large sets of time-stamped genetic sequence data. The methodology for non-parametric inference of effective population size is well-developed in the Bayesian setting, but here we develop a frequentist approach based on non-parametric latent process models of population size dynamics. We appeal to statistical principles based on out-of-sample prediction accuracy in order to optimize parameters that control shape and smoothness of the population size over time. We demonstrate the flexibility and speed of this approach in a series of simulation experiments, and apply the methodology to reconstruct the previously described waves in the seventh pandemic of cholera. We also estimate the impact of non-pharmaceutical interventions for COVID-19 in England using thousands of SARS-CoV-2 sequences. By incorporating a measure of the strength of these interventions over time within the phylodynamic model, we estimate the impact of the first national lockdown in the UK on the epidemic reproduction number.
Affiliation(s)
- Xavier Didelot
  - School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
- Lily Geidelberg
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Erik M Volz
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
|
14
|
Bateman RM, Rudall PJ, Murphy ARM, Cowan RS, Devey DS, Peréz-Escobar OA. Whole plastomes are not enough: phylogenomic and morphometric exploration at multiple demographic levels of the bee orchid clade Ophrys sect. Sphegodes. JOURNAL OF EXPERIMENTAL BOTANY 2021; 72:654-681. [PMID: 33449086 DOI: 10.1093/jxb/eraa467] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/03/2020] [Accepted: 12/15/2020] [Indexed: 05/21/2023]
Abstract
Plastid sequences have long dominated phylogeny reconstruction at all time depths, predicated on a usually untested assumption that they accurately represent the evolutionary histories of phenotypically circumscribed species. We combined detailed in situ morphometrics (124 plants) and whole-plastome sequencing through genome skimming (71 plants) in order to better understand species-level diversity and speciation in arguably the most challenging monophyletic group within the taxonomically controversial, pseudo-copulatory bee orchid genus Ophrys. Using trees and ordinations, we interpreted the data at four nested demographic levels (macrospecies, mesospecies, microspecies, and local population), seeking the optimal level for bona fide species. Neither morphological nor molecular discontinuities are evident at any level below macrospecies, the observed overlap among taxa suggesting that both mesospecies and microspecies reflect arbitrary division of a continuum of variation. Plastomes represent geographic location more strongly than taxonomic assignment and correlate poorly with morphology, suggesting widespread plastid capture and possibly post-glacial expansion from multiple southern refugia. As they are rarely directly involved in the speciation process, plastomes depend on extinction of intermediate lineages to provide phylogenetic signal and so cannot adequately document evolutionary radiations. The popular 'ethological' evolutionary model recognizes as numerous 'ecological species' (microspecies) lineages perceived as actively diverging as a result of density-dependent selection on very few features that immediately dictate extreme pollinator specificity. However, it is assumed rather than demonstrated that the many microspecies are genuinely diverging.
We conversely envisage a complex four-dimensional reticulate network of lineages, generated locally and transiently through a wide spectrum of mechanisms, but each unlikely to maintain an independent evolutionary trajectory long enough to genuinely speciate by escaping ongoing gene flow. The frequent but localized microevolution that characterizes the Ophrys sphegodes complex is often convergent and rarely leads to macroevolution. Choosing between the contrasting 'discontinuity' and 'ethology' models will require next-generation sequencing of nuclear genomes plus ordination of corresponding morphometric matrices, seeking the crucial distinction between retained ancestral polymorphism (consistent with lineage divergence) and polymorphisms reflecting gene flow through 'hybridization' (more consistent with lineage convergence).
|
15
|
Hecht LB, Thompson PC, Rosenthal BM. Assessing the evolutionary persistence of ecological relationships: A review and preview. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 84:104441. [PMID: 32622083 PMCID: PMC7327472 DOI: 10.1016/j.meegid.2020.104441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/08/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/13/2022]
Abstract
Species interactions, such as pollination, parasitism and predation, form the basis of functioning ecosystems. The origins and resilience of such interactions therefore merit attention. However, fossils only occasionally document ancient interactions, and phylogenetic methods are blind to recent interactions. Is there some other way to track shared species experiences? "Comparative demography" examines when pairs of species jointly thrived or declined. By forging links between ecology, epidemiology, and evolutionary biology, this method sheds light on biological adaptation, species resilience, and ecosystem health. Here, we describe how this method works, discuss examples, and suggest future directions in hopes of inspiring interest, imitators, and critics.
Affiliation(s)
- Peter C. Thompson
  - USDA-Agricultural Research Service, Animal Parasitic Diseases Lab, Beltsville, MD 20705 USA
- Benjamin M. Rosenthal
  - USDA-Agricultural Research Service, Animal Parasitic Diseases Lab, Beltsville, MD 20705 USA (corresponding author)
|
16
|
Abstract
Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright-Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright-Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright-Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright-Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.
Affiliation(s)
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, D-10115 Berlin, Germany
|
17
|
Cabrera VM. Counterbalancing the time-dependent effect on the human mitochondrial DNA molecular clock. BMC Evol Biol 2020; 20:78. [PMID: 32600249 PMCID: PMC7325269 DOI: 10.1186/s12862-020-01640-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/06/2019] [Accepted: 06/17/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The molecular clock is an important genetic tool for estimating evolutionary timescales. However, the detection of a time-dependent effect on substitution rate estimates complicates its application. It has been suggested that demographic processes could be the main cause of this confounding effect. In the present study, I propose a new algorithm for estimating the coalescent age of phylogenetically related sequences, taking into account the observed time-dependent effect on the molecular rate detected by others. RESULTS By applying this method to real human mitochondrial DNA trees with shallow and deep topologies, I obtained significantly older molecular ages for the main events of human evolution than were previously estimated. These ages are in close agreement with the most recent archaeological and paleontological records favoring the emergence of early anatomically modern humans in Africa 315 ± 34 thousand years ago (kya) and the presence of recent modern humans outside of Africa as early as 174 ± 48 thousand years ago. Furthermore, during the implementation process, I demonstrated that in a population with fluctuating sizes, the probability of fixation of a new neutral mutant depends on the effective population size, which is in better accordance with the fact that under the neutral theory of molecular evolution, the fate of a molecular mutation is mainly determined by random drift. CONCLUSIONS I suggest that the demographic history of populations has a more decisive effect than purifying selection and/or mutational saturation on the time-dependent effect observed for the substitution rate, and I propose a new method that corrects for this effect.
Affiliation(s)
- Vicente M Cabrera
  - Departamento de Genética, Universidad de La Laguna, E-38271 La Laguna, Tenerife, Spain
|
18
|
Wakeley J. Developments in coalescent theory from single loci to chromosomes. Theor Popul Biol 2020; 133:56-64. [DOI: 10.1016/j.tpb.2020.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/19/2019] [Revised: 02/19/2020] [Accepted: 02/26/2020] [Indexed: 10/24/2022]
|
19
|
Chen H. A Computational Approach for Modeling the Allele Frequency Spectrum of Populations with Arbitrarily Varying Size. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 17:635-644. [PMID: 32173599 PMCID: PMC7212486 DOI: 10.1016/j.gpb.2019.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/28/2018] [Revised: 06/04/2019] [Accepted: 08/02/2019] [Indexed: 11/25/2022]
Abstract
The allele frequency spectrum (AFS), or site frequency spectrum, is commonly used to summarize the genomic polymorphism pattern of a sample, which is informative for inferring population history and detecting natural selection. In 2013, Chen and Chen developed a method for analytically deriving the AFS for populations with temporally varying size through the coalescence time-scaling function. However, their approach is only applicable to population history scenarios in which the analytical form of the time-scaling function is tractable. In this paper, we propose a computational approach to extend the method to populations with arbitrary complex varying size by numerically approximating the time-scaling function. We demonstrate the performance of the approach by constructing the AFS for two population history scenarios: the logistic growth model and the Gompertz growth model, for which the AFS are unavailable with existing approaches. Software for implementing the algorithm can be downloaded at http://chenlab.big.ac.cn/software/.
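The numerical step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the "time-scaling function" is the usual cumulative coalescent intensity Λ(t) = ∫₀ᵗ ds / ν(s), where ν(s) is the population size at time s in the past relative to a reference size; that reading, the logistic parameterization, and every function name below are illustrative assumptions rather than the paper's implementation.

```python
import math


def logistic_size(t, n0=1.0, n_anc=0.1, rate=2.0, t_mid=1.0):
    """Relative population size at time t in the past.

    Hypothetical logistic parameterization: the size is close to n0 near
    the present and declines towards n_anc further back in time.
    """
    return n_anc + (n0 - n_anc) / (1.0 + math.exp(rate * (t - t_mid)))


def time_scaling(size_fn, t, steps=10_000):
    """Trapezoid-rule approximation of Lambda(t) = integral_0^t ds / size_fn(s)."""
    if t == 0:
        return 0.0
    h = t / steps
    # End points carry weight 1/2, interior points weight 1.
    total = 0.5 * (1.0 / size_fn(0.0) + 1.0 / size_fn(t))
    for i in range(1, steps):
        total += 1.0 / size_fn(i * h)
    return total * h


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, time_scaling(logistic_size, t))
```

For a constant relative size of 1 the function reduces to Λ(t) = t, which gives a quick sanity check before a more complex size history is plugged in.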
Affiliation(s)
- Hua Chen
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
|
20
|
Werner B, Case J, Williams MJ, Chkhaidze K, Temko D, Fernández-Mateos J, Cresswell GD, Nichol D, Cross W, Spiteri I, Huang W, Tomlinson IPM, Barnes CP, Graham TA, Sottoriva A. Measuring single cell divisions in human tissues from multi-region sequencing data. Nat Commun 2020; 11:1035. [PMID: 32098957 PMCID: PMC7042311 DOI: 10.1038/s41467-020-14844-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/05/2019] [Accepted: 01/29/2020] [Indexed: 01/06/2023] Open
Abstract
Both normal tissue development and cancer growth are driven by a branching process of cell division and mutation accumulation that leads to intra-tissue genetic heterogeneity. However, quantifying somatic evolution in humans remains challenging. Here, we show that multi-sample genomic data from a single time point of normal and cancer tissues contains information on single-cell divisions. We present a new theoretical framework that, applied to whole-genome sequencing data of healthy tissue and cancer, allows inferring the mutation rate and the cell survival/death rate per division. On average, we found that cells accumulate 1.14 mutations per cell division in healthy haematopoiesis and 1.37 mutations per division in brain development. In both tissues, cell survival was maximal during early development. Analysis of 131 biopsies from 16 tumours showed 4 to 100 times increased mutation rates compared to healthy development and substantial inter-patient variation of cell survival/death rates.
Affiliation(s)
- Benjamin Werner
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
  - Evolutionary Dynamics Group, Centre for Cancer Genomics & Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK
- Jack Case
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
  - University of Cambridge, Cambridge, UK
- Marc J Williams
  - Evolution and Cancer Laboratory, Centre for Cancer Genomics & Computational Biology, Barts Cancer Institute, Queen Mary University London, London, Charterhouse Square, London, EC1M 6BQ, UK
  - Department of Cell and Developmental Biology, University College London, London, UK
  - Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London, London, UK
- Ketevan Chkhaidze
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Daniel Temko
  - Evolution and Cancer Laboratory, Centre for Cancer Genomics & Computational Biology, Barts Cancer Institute, Queen Mary University London, London, Charterhouse Square, London, EC1M 6BQ, UK
- Javier Fernández-Mateos
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- George D Cresswell
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Daniel Nichol
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- William Cross
  - Evolution and Cancer Laboratory, Centre for Cancer Genomics & Computational Biology, Barts Cancer Institute, Queen Mary University London, London, Charterhouse Square, London, EC1M 6BQ, UK
- Inmaculada Spiteri
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Weini Huang
  - Group of Theoretical Biology, The State Key Laboratory of Biocontrol, School of Life Science, Sun Yat-sen University, 510060, Guangzhou, China
  - School of Mathematical Sciences, Queen Mary University London, London, UK
- Ian P M Tomlinson
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Chris P Barnes
  - Department of Cell and Developmental Biology, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Trevor A Graham
  - Evolution and Cancer Laboratory, Centre for Cancer Genomics & Computational Biology, Barts Cancer Institute, Queen Mary University London, London, Charterhouse Square, London, EC1M 6BQ, UK
- Andrea Sottoriva
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
|
21
|
Korábek O, Juřičková L, Petrusek A. Inferring the sources of postglacial range expansion in two large European land snails. J ZOOL SYST EVOL RES 2020. [DOI: 10.1111/jzs.12368] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Ondřej Korábek
  - Department of Ecology, Faculty of Science, Charles University, Praha, Czechia
- Lucie Juřičková
  - Department of Zoology, Faculty of Science, Charles University, Praha, Czechia
- Adam Petrusek
  - Department of Ecology, Faculty of Science, Charles University, Praha, Czechia
|
22
|
Palacios JA, Véber A, Cappello L, Wang Z, Wakeley J, Ramachandran S. Bayesian Estimation of Population Size Changes by Sampling Tajima's Trees. Genetics 2019; 213:967-986. [PMID: 31511299 PMCID: PMC6827370 DOI: 10.1534/genetics.119.302373] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/28/2019] [Accepted: 09/06/2019] [Indexed: 11/30/2022] Open
Abstract
The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes, which relies on a lower-resolution coalescent process that we refer to as "Tajima's coalescent." Tajima's coalescent has a drastically smaller state space, and hence it is a computationally more efficient model, than the standard Kingman coalescent. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov Chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes, and it further provides an efficient alternative to the Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, which is available for download at https://github.com/JuliaPalacios/phylodyn.
Affiliation(s)
- Julia A Palacios
  - Department of Statistics, Stanford University, California 94305
  - Department of Biomedical Data Science, Stanford School of Medicine, California 94305
- Amandine Véber
  - Centre de Mathématiques Appliquées, École Polytechnique 91128, Le Centre National de la Recherche Scientifique, Palaiseau, France 91767
- Zhangyuan Wang
  - Department of Computer Science, Stanford University, California 94305
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912
|
23
|
Alcala N, Goldberg A, Ramakrishnan U, Rosenberg NA. Coalescent Theory of Migration Network Motifs. Mol Biol Evol 2019; 36:2358-2374. [PMID: 31165149 PMCID: PMC6759081 DOI: 10.1093/molbev/msz136] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
Natural populations display a variety of spatial arrangements, each potentially with a distinctive impact on genetic diversity and genetic differentiation among subpopulations. Although the spatial arrangement of populations can lead to intricate migration networks, theoretical developments have focused mainly on a small subset of such networks, emphasizing the island-migration and stepping-stone models. In this study, we investigate all small network motifs: the set of all possible migration networks among populations subdivided into at most four subpopulations. For each motif, we use coalescent theory to derive expectations for three quantities that describe genetic variation: nucleotide diversity, FST, and half-time to equilibrium diversity. We describe the impact of network properties on these quantities, finding that motifs with a high mean node degree have the largest nucleotide diversity and the longest time to equilibrium, whereas motifs with low density have the largest FST. In addition, we show that the motifs whose pattern of variation is most strongly influenced by loss of a connection or a subpopulation are those that can be split easily into disconnected components. We illustrate our results using two example data sets—sky island birds of genus Sholicola and Indian tigers—identifying disturbance scenarios that produce the greatest reduction in genetic diversity; for tigers, we also compare the benefits of two assisted gene flow scenarios. Our results have consequences for understanding the effect of geography on genetic diversity, and they can assist in designing strategies to alter population migration networks toward maximizing genetic variation in the context of conservation of endangered species.
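To make the idea of deriving coalescent expectations on a migration network concrete, here is a minimal first-step-analysis sketch for the expected pairwise coalescence time. It is not the derivation used in the paper: the function name, the convention that `mig[i][k]` is the backward-in-time rate at which a lineage in deme i moves to deme k, and the within-deme coalescence rates `coal[i]` are all assumptions chosen for illustration, and the network is assumed to be connected so that every pair of lineages eventually coalesces.

```python
from itertools import combinations_with_replacement


def expected_pairwise_times(mig, coal, tol=1e-12, max_iter=1_000_000):
    """Expected coalescence time of two lineages for every pair of start demes.

    mig[i][k]: assumed backward migration rate of one lineage from deme i to k.
    coal[i]:   assumed coalescence rate of two lineages that share deme i.
    Solves T(s) = (1 + sum_over_moves rate * T(next state)) / total_rate(s)
    by Gauss-Seidel iteration over the unordered deme pairs s = (i, j).
    """
    d = len(mig)
    states = list(combinations_with_replacement(range(d), 2))
    times = {s: 0.0 for s in states}
    for _ in range(max_iter):
        delta = 0.0
        for i, j in states:
            total = sum(mig[i][k] for k in range(d) if k != i)
            total += sum(mig[j][k] for k in range(d) if k != j)
            if i == j:
                total += coal[i]
            acc = 1.0
            for k in range(d):
                if k != i:  # the first lineage migrates to deme k
                    acc += mig[i][k] * times[tuple(sorted((k, j)))]
                if k != j:  # the second lineage migrates to deme k
                    acc += mig[j][k] * times[tuple(sorted((i, k)))]
            new = acc / total
            delta = max(delta, abs(new - times[(i, j)]))
            times[(i, j)] = new
        if delta < tol:
            break
    return times


if __name__ == "__main__":
    m = 0.5  # symmetric two-deme motif with unit coalescence rates
    print(expected_pairwise_times([[0.0, m], [m, 0.0]], [1.0, 1.0]))
```

For this symmetric two-deme case the fixed point can be checked by hand: two lineages sampled from the same deme have expected coalescence time 2, and two lineages from different demes have 2 + 1/(2m), which is 3 for m = 0.5.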
Affiliation(s)
- Nicolas Alcala
  - Department of Biology, Stanford University, Stanford, CA
- Amy Goldberg
  - Department of Biology, Stanford University, Stanford, CA
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
- Uma Ramakrishnan
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
|
24
|
Werner B, Williams MJ, Barnes CP, Graham TA, Sottoriva A. Reply to 'Currently available bulk sequencing data do not necessarily support a model of neutral tumor evolution'. Nat Genet 2018; 50:1624-1626. [PMID: 30374070 DOI: 10.1038/s41588-018-0235-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Affiliation(s)
- Benjamin Werner
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Marc J Williams
  - Evolution and Cancer Laboratory, Barts Cancer Institute, Queen Mary University of London, London, UK
  - Department of Cell and Developmental Biology, University College London, London, UK
  - Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London, London, UK
- Chris P Barnes
  - Department of Cell and Developmental Biology, University College London, London, UK
  - Department of Genetics, Evolution and Environment, University College London, London, UK
- Trevor A Graham
  - Evolution and Cancer Laboratory, Barts Cancer Institute, Queen Mary University of London, London, UK
- Andrea Sottoriva
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
|
25
|
Arbisser IM, Jewett EM, Rosenberg NA. On the joint distribution of tree height and tree length under the coalescent. Theor Popul Biol 2018; 122:46-56. [PMID: 29132923 PMCID: PMC5945353 DOI: 10.1016/j.tpb.2017.10.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/03/2017] [Revised: 10/30/2017] [Accepted: 10/31/2017] [Indexed: 10/18/2022]
Abstract
Many statistics that examine genetic variation depend on the underlying shapes of genealogical trees. Under the coalescent model, we investigate the joint distribution of two quantities that describe genealogical tree shape: tree height and tree length. We derive a recursive formula for their exact joint distribution under a demographic model of a constant-sized population. We obtain approximations for the mean and variance of the ratio of tree height to tree length, using them to show that this ratio converges in probability to 0 as the sample size increases. We find that as the sample size increases, the correlation coefficient for tree height and length approaches (π² − 6)/[π√(2π² − 18)] ≈ 0.9340. Using simulations, we examine the joint distribution of height and length under demographic models with population growth and population subdivision. We interpret the joint distribution in relation to problems of interest in data analysis, including inference of the time to the most recent common ancestor. The results assist in understanding the influences of demographic histories on two fundamental features of tree shape.
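The limiting correlation of about 0.9340 quoted in this abstract can be checked numerically with a short sketch. It relies on the standard constant-size Kingman coalescent assumption that the waiting times T_k (while k lineages remain) are independent exponentials with rate k(k−1)/2, so that height H = Σ T_k and length L = Σ k·T_k have variances and covariance given by simple sums. The function name is hypothetical, and this is a check of the limit, not the recursive formula for the joint distribution derived in the paper.

```python
import math


def height_length_correlation(n):
    """corr(H, L) for n sampled lineages under the constant-size Kingman coalescent.

    Uses independence of the inter-coalescence times T_k ~ Exp(k(k-1)/2):
    Var(H) = sum Var(T_k), Var(L) = sum k^2 Var(T_k), Cov(H, L) = sum k Var(T_k).
    """
    var_h = var_l = cov = 0.0
    for k in range(2, n + 1):
        var_t = (2.0 / (k * (k - 1))) ** 2  # variance of an Exp(k(k-1)/2) variable
        var_h += var_t
        var_l += k * k * var_t
        cov += k * var_t
    return cov / math.sqrt(var_h * var_l)


# Closed-form limit as n grows: (pi^2 - 6) / (pi * sqrt(2 pi^2 - 18)).
LIMIT = (math.pi**2 - 6) / (math.pi * math.sqrt(2 * math.pi**2 - 18))

if __name__ == "__main__":
    for n in (2, 10, 100, 100_000):
        print(n, height_length_correlation(n))
    print("limit", LIMIT)
```

With n = 2 the correlation is exactly 1, because the tree length is twice the height; the values then decrease towards the limit as n increases.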
Affiliation(s)
- Ilana M Arbisser
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Ethan M Jewett
  - Departments of Electrical Engineering & Computer Science and Statistics, University of California, Berkeley, CA 94720, USA
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
|
26
|
Reppell M, Zöllner S. An efficient algorithm for generating the internal branches of a Kingman coalescent. Theor Popul Biol 2018; 122:57-66. [PMID: 28709926 PMCID: PMC5764821 DOI: 10.1016/j.tpb.2017.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/22/2016] [Revised: 05/19/2017] [Accepted: 05/26/2017] [Indexed: 01/16/2023]
Abstract
Coalescent simulations are a widely used approach for simulating sample genealogies, but can become computationally burdensome in large samples. Methods exist to analytically calculate a sample's expected frequency spectrum without simulating full genealogies. However, statistics that rely on the distribution of the length of internal coalescent branches, such as the probability that two mutations of equal size arose on the same genealogical branch, have previously required full coalescent simulations to estimate. Here, we present a sampling method capable of efficiently generating limited portions of sample genealogies using a series of analytic equations that give probabilities for the number, start, and end of internal branches conditional on the number of final samples they subtend. These equations are independent of the coalescent waiting times and need only be calculated a single time, lending themselves to efficient computation. We compare our method with full coalescent simulations to show the resulting distribution of branch lengths and summary statistics are equivalent, but that for many conditions our method is at least 10 times faster.
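For orientation, the kind of full coalescent simulation that this abstract treats as the baseline can be sketched as follows. This is a plain Kingman simulation that records, for every branch, how many sampled lineages it subtends and how long it is; it is not the authors' analytic sampling method, and the function name and time units (pairwise coalescence rate 1) are assumptions made for illustration.

```python
import random


def simulate_branches(n, rng):
    """Simulate one Kingman genealogy for n samples.

    Returns a list of (descendant_count, branch_length) tuples, one per
    branch below the root, with time scaled so that each pair of lineages
    coalesces at rate 1.
    """
    # Each active lineage is [number of sampled descendants, branch start time].
    active = [[1, 0.0] for _ in range(n)]
    now = 0.0
    branches = []
    while len(active) > 1:
        k = len(active)
        now += rng.expovariate(k * (k - 1) / 2.0)  # waiting time with k lineages
        i, j = rng.sample(range(k), 2)             # pair chosen uniformly at random
        first, second = active[i], active[j]
        branches.append((first[0], now - first[1]))
        branches.append((second[0], now - second[1]))
        for idx in sorted((i, j), reverse=True):   # remove the higher index first
            active.pop(idx)
        active.append([first[0] + second[0], now])
    return branches


if __name__ == "__main__":
    rng = random.Random(1)
    for count, length in simulate_branches(6, rng):
        print(count, round(length, 4))
```

Internal branches are the tuples whose descendant count is greater than 1; estimating their length distribution this way requires simulating every coalescence event, which is the cost the abstract's method aims to avoid.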
Affiliation(s)
- M Reppell
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- S Zöllner
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
|
27
|
Patiño-Galindo JÁ, González-Candelas F. Molecular evolution methods to study HIV-1 epidemics. Future Virol 2018; 13:399-404. [PMID: 29967650 DOI: 10.2217/fvl-2017-0159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/25/2017] [Accepted: 04/04/2018] [Indexed: 01/17/2023]
Abstract
Nucleotide sequences of HIV isolates are obtained routinely to evaluate the presence of resistance mutations to antiretroviral drugs. But, beyond their clinical use, these and other viral sequences include a wealth of information that can be used to better understand and characterize the epidemiology of HIV in relevant populations. In this review, we provide a brief overview of the main methods used to analyze HIV sequences, the data bases where reference sequences can be obtained, and some caveats about the possible applications for public health of these analyses, along with some considerations about their limitations and correct usage to derive robust and reliable conclusions.
Affiliation(s)
- Juan Á Patiño-Galindo
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Fernando González-Candelas
  - Joint Research Unit "Infección y Salud Pública" FISABIO-Salud Pública/Universitat de València-Institute for Integrative Systems Biology (ISysBio, CSIC-UV) Valencia, Spain
  - CIBER in Epidemiology & Public Health, Valencia, Spain
|
28
|
Bateman RM, Sramkó G, Paun O. Integrating restriction site-associated DNA sequencing (RAD-seq) with morphological cladistic analysis clarifies evolutionary relationships among major species groups of bee orchids. ANNALS OF BOTANY 2018; 121:85-105. [PMID: 29325077 PMCID: PMC5786241 DOI: 10.1093/aob/mcx129] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/30/2017] [Accepted: 10/02/2017] [Indexed: 05/03/2023]
Abstract
BACKGROUND AND AIMS Bee orchids (Ophrys) have become the most popular model system for studying reproduction via insect-mediated pseudo-copulation and for exploring the consequent, putatively adaptive, evolutionary radiations. However, despite intensive past research, both the phylogenetic structure and species diversity within the genus remain highly contentious. Here, we integrate next-generation sequencing and morphological cladistic techniques to clarify the phylogeny of the genus. METHODS At least two accessions of each of the ten species groups previously circumscribed from large-scale cloned nuclear ribosomal internal transcribed spacer (nrITS) sequencing were subjected to restriction site-associated sequencing (RAD-seq). The resulting matrix of 4159 single nucleotide polymorphisms (SNPs) for 34 accessions was used to construct an unrooted network and a rooted maximum likelihood phylogeny. A parallel morphological cladistic matrix of 43 characters generated both polymorphic and non-polymorphic sets of parsimony trees before being mapped across the RAD-seq topology. KEY RESULTS RAD-seq data strongly support the monophyly of nine out of ten groups previously circumscribed using nrITS and resolve three major clades; in contrast, supposed microspecies are barely distinguishable. Strong incongruence separated the RAD-seq trees from both the morphological trees and traditional classifications; mapping of the morphological characters across the RAD-seq topology rendered them far more homoplastic. CONCLUSIONS The comparatively high level of morphological homoplasy reflects extensive convergence, whereas the derived placement of the fusca group is attributed to paedomorphic simplification. The phenotype of the most recent common ancestor of the extant lineages is inferred, but it post-dates the majority of the character-state changes that typify the genus.
RAD-seq may represent the high-water mark of the contribution of molecular phylogenetics to understanding evolution within Ophrys; further progress will require large-scale population-level studies that integrate phenotypic and genotypic data in a cogent conceptual framework.
Affiliation(s)
- Richard M Bateman
- Jodrell Laboratory, Royal Botanic Gardens Kew, Richmond, Surrey, UK
- For correspondence. E-mail
| | - Gábor Sramkó
- Department of Botany, University of Debrecen, Egyetem, Debrecen, Hungary
- MTA-DE ‘Lendület’ Evolutionary Phylogenomics Research Group, Egyetem, Debrecen, Hungary
| | - Ovidiu Paun
- Department of Botany and Biodiversity Research, University of Vienna, Rennweg, Vienna, Austria
| |
|
29
|
Abstract
Coalescent theory is a powerful tool for population geneticists as well as molecular biologists interested in understanding the patterns and levels of DNA variation. Using coalescent Monte Carlo simulations it is possible to obtain the empirical distributions for a number of statistics across a wide range of evolutionary models; these distributions can be used to test evolutionary hypotheses using experimental data. The mlcoalsim application presented here (based on a version of the ms program, Hudson, 2002) adds important new features to improve methodology (uncertainty and conditional methods for mutation and recombination), models (including strong positive selection, finite sites and heterogeneity in mutation and recombination rates) and analyses (calculating a number of statistics used in population genetics and P-values for observed data). One of the most important features of mlcoalsim is the analysis of multilocus data in linked and independent regions. In summary, mlcoalsim is an integrated software application aimed at researchers interested in molecular evolution. mlcoalsim is written in ANSI C and is available at: http://www.ub.es/softevol/mlcoalsim .
Affiliation(s)
- Sebastian E. Ramos-Onsins
- Max-Planck Institute for Chemical Ecology, Hans-Knöll Str. 8, D-07745 Jena, Germany
- Present address: Departament de Genètica, Universitat de Barcelona, Diagonal 645, Barcelona, Spain
| | - Thomas Mitchell-Olds
- Max-Planck Institute for Chemical Ecology, Hans-Knöll Str. 8, D-07745 Jena, Germany
- Present address: Department of Biology, Duke University, Durham, NC 27708, USA
| |
|
30
|
Shirley MH, Austin JD. Did Late Pleistocene climate change result in parallel genetic structure and demographic bottlenecks in sympatric Central African crocodiles, Mecistops and Osteolaemus? Mol Ecol 2017; 26:6463-6477. [PMID: 29024142 DOI: 10.1111/mec.14378] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2016] [Revised: 09/19/2017] [Accepted: 09/25/2017] [Indexed: 01/24/2023]
Abstract
The mid-Holocene has had profound demographic impacts on wildlife on the African continent, although little is known about the impacts on species from Central Africa. Understanding the impacts of climate change on codistributed species can enhance our understanding of ecosystem dynamics and help in formulating restoration objectives. We took a multigenome comparative approach to examine the phylogeographic structure of two poorly known Central African crocodile species, Mecistops sp. aff. cataphractus and Osteolaemus tetraspis. In addition, we conducted coalescent-based demographic reconstructions to test the hypothesis that population decline was driven by climate change since the Last Glacial Maximum, vs. more recent anthropogenic pressures. Using a hierarchical Bayesian model to reconstruct demographic history, we show that both species had dramatic declines (>97%) in effective population size in the period following the Last Glacial Maximum, 1,500-18,000 YBP. Identification of genetic structuring showed both species have similar regional structure corresponding to major geological features (i.e., hydrologic basin) and that small observed differences between them are best explained by the differences in their ecology and the likely impact that climate change had on their habitat needs. Our results support our hypothesis that climatic effects, presumably on forest and wetland habitat, had a congruent negative impact on both species.
Affiliation(s)
- Matthew H Shirley
- Tropical Conservation Institute, Florida International University, Biscayne Bay Campus, North Miami, FL, USA; Rare Species Conservatory Foundation, Loxahatchee, FL, USA
| | - James D Austin
- Department of Wildlife Ecology & Conservation, University of Florida, Gainesville, FL, USA
| |
|
31
|
Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Reply: Uncertainties in tumor allele frequencies limit power to infer evolutionary pressures. Nat Genet 2017; 49:1289-1291. [PMID: 28854180 DOI: 10.1038/ng.3877] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Marc J Williams
- Evolution and Cancer Laboratory, Barts Cancer Institute, Queen Mary University of London, London, UK
- Department of Cell and Developmental Biology, University College London, London, UK
- Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London, London, UK
| | - Benjamin Werner
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Chris P Barnes
- Department of Cell and Developmental Biology, University College London, London, UK
- Department of Genetics, Evolution and Environment, University College London, London, UK
| | - Trevor A Graham
- Evolution and Cancer Laboratory, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Andrea Sottoriva
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| |
|
32
|
Barrowclough GF, Gutiérrez RJ, Groth JG. PHYLOGEOGRAPHY OF SPOTTED OWL (STRIX OCCIDENTALIS) POPULATIONS BASED ON MITOCHONDRIAL DNA SEQUENCES: GENE FLOW, GENETIC STRUCTURE, AND A NOVEL BIOGEOGRAPHIC PATTERN. Evolution 2017; 53:919-931. [PMID: 28565647 DOI: 10.1111/j.1558-5646.1999.tb05385.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/1997] [Accepted: 12/22/1998] [Indexed: 10/19/2022]
Abstract
Mitochondrial DNA control region sequences of spotted owls (Strix occidentalis) allowed us to investigate gene flow, genetic structure, and biogeographic relationships among these forest-dwelling birds of western North America. Estimates of gene flow based on genetic partitioning and the phylogeography of haplotypes indicate substantial dispersal within three long-recognized subspecies. However, patterns of individual phyletic relationships indicate a historical absence of gene flow among the subspecies, which are essentially monophyletic. The pattern of haplotype coalescence enabled us to identify the approximate timing and direction of a recent episode of gene flow from the Sierra Nevada to the northern coastal ranges. The three subspecies comprise phylogenetic species, and the northern spotted owl (S. o. caurina) is sister to a clade of California (S. o. occidentalis) plus Mexican spotted owls (S. o. lucida); this represents a novel biogeographic pattern within birds. The California spotted owl had substantially lower nucleotide diversity than the other two subspecies; this result is inconsistent with present patterns of population density. A causal explanation requires postulating a severe bottleneck or a selective sweep, either of which was confined to only one geographic region.
Affiliation(s)
- George F Barrowclough
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, New York, 10024
| | - R J Gutiérrez
- Department of Wildlife Management, School of Natural Resources, Humboldt State University, Arcata, California, 95521
| | - Jeffrey G Groth
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, New York, 10024
| |
|
33
|
Petit E, Excoffier L, Mayer F. NO EVIDENCE OF BOTTLENECK IN THE POSTGLACIAL RECOLONIZATION OF EUROPE BY THE NOCTULE BAT (NYCTALUS NOCTULA). Evolution 2017; 53:1247-1258. [PMID: 28565510 DOI: 10.1111/j.1558-5646.1999.tb04537.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/1998] [Accepted: 02/15/1999] [Indexed: 11/29/2022]
Abstract
During the Pleistocene, the habitat of the noctule bat (Nyctalus noctula) was limited to small refuge areas located in Southern Europe, whereas the species is now widespread across this continent. Using mtDNA (control region and ND1 gene) polymorphisms, we asked whether this recolonization occurred through bottlenecks and whether it was accompanied by population growth. Sequences of the second hypervariable domain of the control region were obtained from 364 noctule bats representing 18 colonies sampled across Europe. This yielded 108 haplotypes that were depicted on a minimum spanning tree that showed a starlike structure with two long branches. Additional sequences obtained from the ND1 gene confirmed that the different parts of the MST correspond to three clades which diverged before the Last Glacial Maximum (18,000 yrC14 BP), leading to the conclusion that the noctule bat survived in several isolated refugia. Partitioning populations into coherent geographical groups divided our samples (φCT = 0.17; P = 0.01) into a group of highly variable nursing colonies from central and eastern Europe and less variable, isolated colonies from western and southern Europe. Demographic analyses suggest that populations of the former group underwent demographic expansions either after the Younger Dryas (11,000-10,000 yrC14 BP), assuming a fast mutation rate for HV II, or during the Pleistocene, assuming a conventional mutation rate. We discuss the fact that the high genetic variability (h = 0.69-0.96; π = 0.006-0.013) observed in nursing colonies that are located some distance from potential Pleistocene refugia is probably due to the combined effect of rapid evolution of the control region in growing populations and a range shift of noctule populations parallel to the recovery of forests in Europe after the last glaciations.
Affiliation(s)
- Eric Petit
- Institut für Zoologie II, Universität Erlangen, Staudtstrasse 5, 91058, Erlangen, Germany
| | - Laurent Excoffier
- Genetics and Biometry Laboratory, Department of Anthropology and Ecology, University of Geneva, CP 511, 1211, Geneva 24, Switzerland
| | - Frieder Mayer
- Institut für Zoologie II, Universität Erlangen, Staudtstrasse 5, 91058, Erlangen, Germany
| |
|
34
|
Abstract
A variety of convergence results for genealogical and line-of-descendent processes are known for exchangeable neutral population genetics models. A general convergence-to-the-coalescent theorem is presented, which works not only for a larger class of exchangeable models but also for a large class of non-exchangeable population models. The coalescence probability, i.e. the probability that two genes, chosen randomly without replacement, have a common ancestor one generation backwards in time, is the central quantity to analyse the ancestral structure.
|
35
|
Abstract
We study the genealogical structure of samples from a population for which any given generation is made up of direct descendants from several previous generations. These occur in nature when there are seed banks or egg banks allowing an individual to leave offspring several generations in the future. We show how this temporal structure in the reproduction mechanism causes a decrease in the coalescence rate. We also investigate the effects of age-dependent neutral mutations. Our main result gives weak convergence of the scaled ancestral process, with the usual diffusion scaling, to a coalescent process which is equivalent to a time-changed version of Kingman's coalescent.
|
36
|
Jagers P, Sagitov S. Convergence to the coalescent in populations of substantially varying size. J Appl Probab 2016. [DOI: 10.1239/jap/1082999072] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Kingman's classical theory of the coalescent uncovered the basic pattern of genealogical trees of random samples of individuals in large but time-constant populations. Time is viewed as being discrete and is identified with non-overlapping generations. Reproduction can be very generally taken as exchangeable (meaning that the labelling of individuals in each generation carries no significance). Recent generalisations have dealt with population sizes exhibiting given deterministic or (minor) random fluctuations. We consider population sizes which constitute a stationary Markov chain, explicitly allowing large fluctuations in short times. Convergence of the genealogical tree, as population size tends to infinity, towards the (time-scaled) coalescent is proved under minimal conditions. As a result, we obtain a formula for effective population size, generalising the well-known harmonic mean expression for effective size.
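The "well-known harmonic mean expression" that the abstract generalises can be illustrated with a few lines of Python. This is only a sketch of the classical formula for a deterministic sequence of per-generation sizes, not of the Markov-chain result in the paper; the function name and the example sizes are hypothetical.

```python
def harmonic_effective_size(sizes):
    """Classical harmonic-mean effective size for per-generation sizes.

    Illustrative helper (not from the paper): N_e = T / sum(1 / N_t).
    """
    sizes = list(sizes)
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical example: one bottleneck generation dominates the result.
sizes = [1000, 1000, 10, 1000]
ne = harmonic_effective_size(sizes)
arithmetic = sum(sizes) / len(sizes)
print(f"harmonic N_e = {ne:.2f}, arithmetic mean = {arithmetic:.2f}")
```

The harmonic mean is pulled strongly towards the smallest sizes, which is why short bottlenecks matter so much for effective size.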
|
37
|
Abstract
The aim of this paper is to study genealogical processes in a geographically structured population with weak migration. The coalescence time for sampled genes from different colonies diverges to infinity as the migration rates among colonies are close to zero. We investigate the moment generating functions of the coalescence time, the number of segregating sites and the number of allele types in sampled genes when there is low migration. Employing a perturbation method, we obtain a system of recurrence relations for the approximate solutions of these moment generating functions and solve them in some cases.
|
38
|
Abstract
A variety of convergence results for genealogical and line-of-descendent processes are known for exchangeable neutral population genetics models. A general convergence-to-the-coalescent theorem is presented, which works not only for a larger class of exchangeable models but also for a large class of non-exchangeable population models. The coalescence probability, i.e. the probability that two genes, chosen randomly without replacement, have a common ancestor one generation backwards in time, is the central quantity to analyse the ancestral structure.
|
39
|
Abstract
For a large class of neutral population models the asymptotics of the ancestral structure of a sample of n individuals (or genes) is studied, if the total population size becomes large. Under certain conditions and under a well-known time-scaling, which can be expressed in terms of the coalescence probabilities, weak convergence in D_E([0,∞)) to the coalescent holds. Further the convergence behaviour of the jump chain of the ancestral process is studied. The results are used to approximate probabilities which are of certain interest in applications, for example hitting probabilities.
|
40
|
Abstract
For a large class of neutral population models the asymptotics of the ancestral structure of a sample of n individuals (or genes) is studied, if the total population size becomes large. Under certain conditions and under a well-known time-scaling, which can be expressed in terms of the coalescence probabilities, weak convergence in D_E([0,∞)) to the coalescent holds. Further the convergence behaviour of the jump chain of the ancestral process is studied. The results are used to approximate probabilities which are of certain interest in applications, for example hitting probabilities.
|
41
|
Abstract
We study the ancestral process of a sample from a subdivided population with stochastically varying subpopulation sizes. The sizes of the subpopulations change very rapidly (almost every generation) with respect to the coalescent time scale. For haploid populations of size N, one coalescence time unit corresponds to N generations. Coalescence and migration events occur on the same time scale. We show that, when the total population size tends to infinity, the structured coalescent is obtained, thus confirming the robustness of the coalescent. Many population structure models have been shown to converge to the structured coalescent (see Herbots (1997), Hudson (1998), Nordborg (2001), Nordborg and Krone (2002), and Notohara (1990)).
|
42
|
Abstract
Kingman's classical theory of the coalescent uncovered the basic pattern of genealogical trees of random samples of individuals in large but time-constant populations. Time is viewed as being discrete and is identified with non-overlapping generations. Reproduction can be very generally taken as exchangeable (meaning that the labelling of individuals in each generation carries no significance). Recent generalisations have dealt with population sizes exhibiting given deterministic or (minor) random fluctuations. We consider population sizes which constitute a stationary Markov chain, explicitly allowing large fluctuations in short times. Convergence of the genealogical tree, as population size tends to infinity, towards the (time-scaled) coalescent is proved under minimal conditions. As a result, we obtain a formula for effective population size, generalising the well-known harmonic mean expression for effective size.
|
43
|
Abstract
The aim of this paper is to study genealogical processes in a geographically structured population with weak migration. The coalescence time for sampled genes from different colonies diverges to infinity as the migration rates among colonies are close to zero. We investigate the moment generating functions of the coalescence time, the number of segregating sites and the number of allele types in sampled genes when there is low migration. Employing a perturbation method, we obtain a system of recurrence relations for the approximate solutions of these moment generating functions and solve them in some cases.
|
44
|
Abstract
We study the genealogical structure of samples from a population for which any given generation is made up of direct descendants from several previous generations. These occur in nature when there are seed banks or egg banks allowing an individual to leave offspring several generations in the future. We show how this temporal structure in the reproduction mechanism causes a decrease in the coalescence rate. We also investigate the effects of age-dependent neutral mutations. Our main result gives weak convergence of the scaled ancestral process, with the usual diffusion scaling, to a coalescent process which is equivalent to a time-changed version of Kingman's coalescent.
|
45
|
Abstract
We study the ancestral process of a sample from a subdivided population with stochastically varying subpopulation sizes. The sizes of the subpopulations change very rapidly (almost every generation) with respect to the coalescent time scale. For haploid populations of size N, one coalescence time unit corresponds to N generations. Coalescence and migration events occur on the same time scale. We show that, when the total population size tends to infinity, the structured coalescent is obtained, thus confirming the robustness of the coalescent. Many population structure models have been shown to converge to the structured coalescent (see Herbots (1997), Hudson (1998), Nordborg (2001), Nordborg and Krone (2002), and Notohara (1990)).
|
46
|
Gorroochurn P. Post-data inference of coalescence times and segregating-site distribution in a two-island model with symmetric migration. ADV APPL PROBAB 2016. [DOI: 10.1239/aap/1005091355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
In this paper, we present the distribution of the coalescence time of two DNA sequences (or genes) subject to symmetric migration between two islands, and conditional on the observed number of segregating sites in the sequences. The distribution for the segregating-site pattern is also obtained. Some surprising results emerge when both genes are initially on the same island. First, the post-data mean coalescence time is shown to be dependent on the migration parameter, as opposed to the pre-data mean. Second, both the post-data density and expectation for the coalescence time are shown to converge, in the weak-migration limit, to the corresponding panmictic results, as opposed to the pre-data situation where there is convergence in the density but not in the expectation. Finally, it is shown that there is convergence in the weak-migration limit in the distribution of the number of segregating sites but not in the expectation and variance. Numerical and graphical results for samples of size greater than two are also presented.
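The remark that the pre-data mean coalescence time does not depend on the migration parameter can be checked with a short first-step calculation. The sketch below assumes a particular scaling that is not taken from the paper: two demes of equal size, a same-deme pair coalescing at rate 1, and each lineage migrating at rate M/2. The function name is hypothetical.

```python
def mean_coalescence_times(M):
    """Return (E[T_same], E[T_diff]) for two lineages on two islands.

    Assumed scaling (illustrative, not the paper's notation):
    a pair in the same deme coalesces at rate 1, and each lineage
    migrates to the other deme at rate M / 2.
    """
    if M <= 0:
        raise ValueError("M must be positive")
    # First-step equations for the two starting configurations:
    #   T_same = 1/(1+M) + (M/(1+M)) * T_diff
    #   T_diff = 1/M + T_same
    # Substituting the second into the first and solving for T_same:
    a = M / (1.0 + M)
    t_same = (1.0 / (1.0 + M) + a / M) / (1.0 - a)
    t_diff = 1.0 / M + t_same
    return t_same, t_diff


for M in (0.01, 1.0, 100.0):
    t_same, t_diff = mean_coalescence_times(M)
    print(f"M={M}: E[T_same]={t_same:.6f}, E[T_diff]={t_diff:.6f}")
```

Under these assumptions E[T_same] equals 2 for every M > 0, while E[T_diff] = 2 + 1/M grows without bound as migration becomes weak.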
|
47
|
Abstract
We study a process where balls are repeatedly thrown into n boxes independently according to some probability distribution p. We start with n balls, and at each step, all balls landing in the same box are fused into a single ball; the process terminates when there is only one ball left (coalescence). Let c := ∑_j p_j^2, the collision probability of two fixed balls. We show that the expected coalescence time is asymptotically 2c^{-1}, under two constraints on p that exclude a thin set of distributions p. One of the constraints is c = o(ln^{-2} n). This ln^{-2} n is shown to be a threshold value: for c = ω(ln^{-2} n), there exists p with c(p) = c such that the expected coalescence time far exceeds c^{-1}. Connections to coalescent processes in population biology and theoretical computer science are discussed.
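The ball-throwing process described above is easy to simulate. The following is a minimal sketch, assuming a uniform distribution over the boxes for the example run; the function names, the seed and the number of replicates are illustrative choices, and the simulation is only a rough empirical comparison with the asymptotic value 2/c, not a verification of the theorem.

```python
import random


def collision_probability(p):
    """c = sum_j p_j^2, the chance that two fixed balls share a box."""
    return sum(x * x for x in p)


def coalescence_time(p, n_balls, rng):
    """Number of throwing rounds until a single ball remains."""
    boxes = range(len(p))
    balls = n_balls
    steps = 0
    while balls > 1:
        # Throw every ball independently; balls sharing a box fuse into one,
        # so the number of surviving balls is the number of occupied boxes.
        occupied = set(rng.choices(boxes, weights=p, k=balls))
        balls = len(occupied)
        steps += 1
    return steps


# Hypothetical example: n = 200 boxes, uniform p, 200 starting balls.
n = 200
p = [1.0 / n] * n
rng = random.Random(1)
runs = [coalescence_time(p, n, rng) for _ in range(200)]
print("simulated mean:", sum(runs) / len(runs))
print("2 / c         :", 2.0 / collision_probability(p))
```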
|
48
|
Möhle M. A convergence theorem for markov chains arising in population genetics and the coalescent with selfing. ADV APPL PROBAB 2016. [DOI: 10.1239/aap/1035228080] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
A simple convergence theorem for sequences of Markov chains is presented in order to derive new ‘convergence-to-the-coalescent’ results for diploid neutral population models. For the so-called diploid Wright-Fisher model with selfing probability s and mutation rate θ, it is shown that the ancestral structure of n sampled genes can be treated in the framework of an n-coalescent with mutation rate θ̃ := θ(1-s/2), if the population size N is large and if the time is measured in units of (2-s)N generations.
|
49
|
Gill MS, Lemey P, Bennett SN, Biek R, Suchard MA. Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates. Syst Biol 2016; 65:1041-1056. [PMID: 27368344 DOI: 10.1093/sysbio/syw050] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2015] [Revised: 05/16/2016] [Accepted: 05/23/2016] [Indexed: 12/12/2022] Open
Abstract
Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics and evolutionary biology. Kingman's coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. Major goals of demographic reconstruction include identifying driving factors of effective population size, and understanding the association between the effective population size and such factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates that exploit Gaussian Markov random fields to achieve temporal smoothing of effective population size trajectories. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. 
Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change. [Coalescent; effective population size; Gaussian Markov random fields; phylodynamics; phylogenetics; population genetics.]
Affiliation(s)
- Mandev S Gill
- Department of Statistics, Columbia University, New York, NY 10027, USA
| | - Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Minderbroederstaat 10, 3000 Leuven, Belgium
| | - Shannon N Bennett
- Department of Microbiology, California Academy of Sciences, San Francisco, CA 94118, USA
| | - Roman Biek
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
| | - Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA; Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA 90095, USA
| |
|
50
|
A convergence theorem for markov chains arising in population genetics and the coalescent with selfing. ADV APPL PROBAB 2016. [DOI: 10.1017/s000186780004739x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
A simple convergence theorem for sequences of Markov chains is presented in order to derive new ‘convergence-to-the-coalescent’ results for diploid neutral population models.
For the so-called diploid Wright-Fisher model with selfing probability s and mutation rate θ, it is shown that the ancestral structure of n sampled genes can be treated in the framework of an n-coalescent with mutation rate θ̃ := θ(1-s/2), if the population size N is large and if the time is measured in units of (2-s)N generations.
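The two rescalings quoted in the abstract, the effective mutation rate θ(1-s/2) and the time unit of (2-s)N generations, can be written as a small helper. This is a sketch of those two expressions only; the function name and argument checks are hypothetical and nothing here reproduces the convergence theorem itself.

```python
def selfing_rescaling(theta, s, N):
    """Return (effective mutation rate, generations per coalescent time unit).

    Uses the two expressions quoted in the abstract:
        theta_tilde = theta * (1 - s / 2)
        time unit   = (2 - s) * N generations
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing probability s must lie in [0, 1]")
    if N <= 0:
        raise ValueError("population size N must be positive")
    theta_tilde = theta * (1.0 - s / 2.0)
    time_unit = (2.0 - s) * N
    return theta_tilde, time_unit


# Without selfing (s = 0) the time unit is 2N generations;
# with complete selfing (s = 1) both quantities are halved.
for s in (0.0, 0.5, 1.0):
    print(s, selfing_rescaling(theta=2.0, s=s, N=100))
```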
|