1
|
Deng Y, Nielsen R, Song YS. A previously reported bottleneck in human ancestry 900 kya is likely a statistical artifact. Genetics 2025; 229:1-3. [PMID: 39679949 PMCID: PMC11708913 DOI: 10.1093/genetics/iyae192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 11/08/2024] [Indexed: 12/17/2024] Open
Abstract
It was recently reported that a severe ancient bottleneck occurred around 900 thousand years ago in the ancestry of African populations, while this signal is absent in non-African populations. Here, we present evidence to show that this finding is likely a statistical artifact.
Affiliation(s)
- Yun Deng
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Rasmus Nielsen
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA
  - Center for GeoGenetics, University of Copenhagen, 1350 Copenhagen K, Denmark
- Yun S Song
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA
  - Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
|
2
|
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. bioRxiv 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Affiliation(s)
- Roshni A. Patel
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Clemens L. Weiß
  - Stanford Cancer Institute Core, Stanford University School of Medicine, Stanford, CA
- Huisheng Zhu
  - Department of Biology, Stanford University, Stanford, CA
- Hakhamanesh Mostafavi
  - Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY
- Jeffrey P. Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
  - Department of Biology, Stanford University, Stanford, CA
|
3
|
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning. Mol Biol Evol 2024; 41:msae077. [PMID: 38636507 PMCID: PMC11082913 DOI: 10.1093/molbev/msae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 04/08/2024] [Accepted: 04/12/2024] [Indexed: 04/20/2024] Open
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
Affiliation(s)
- Linh N Tran
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721, USA
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Connie K Sun
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Travis J Struck
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Mathews Sajan
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Ryan N Gutenkunst
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
|
4
|
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally efficient demographic history inference from allele frequencies with supervised machine learning. bioRxiv 2024:2023.05.24.542158. [PMID: 38405827 PMCID: PMC10888863 DOI: 10.1101/2023.05.24.542158] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/27/2024]
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we developed donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future input data AFS. We demonstrated that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
Affiliation(s)
- Linh N. Tran
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Connie K. Sun
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Travis J. Struck
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Mathews Sajan
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Ryan N. Gutenkunst
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
|
5
|
Legried B, Terhorst J. Rates of convergence in the two-island and isolation-with-migration models. Theor Popul Biol 2022; 147:16-27. [PMID: 36007782 DOI: 10.1016/j.tpb.2022.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2021] [Revised: 08/10/2022] [Accepted: 08/11/2022] [Indexed: 11/25/2022]
Abstract
A number of powerful demographic inference methods have been developed in recent years, with the goal of fitting rich evolutionary models to genetic data obtained from many populations. In this paper we investigate the statistical performance of these methods in the specific case where there is continuous migration between populations. Compared with earlier work, migration significantly complicates the theoretical analysis and requires new techniques. We employ the theories of phase-type distributions and concentration of measure in order to study the two-island and isolation-with-migration models, resulting in both upper and lower bounds on rates of convergence for parametric estimators in migration models. For the upper bounds, we consider inferring rates of coalescent and migration on the basis of directly observing pairwise coalescent times, and, more realistically, when (conditionally) Poisson-distributed mutations dropped on latent trees are observed. We complement these upper bounds with information-theoretic lower bounds which establish a limit, in terms of sample size, below which inference is effectively impossible.
Affiliation(s)
- Brandon Legried
  - Department of Statistics, University of Michigan, United States of America
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, United States of America
|
6
|
DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A 2021; 118:e2013798118. [PMID: 34016747 PMCID: PMC8166128 DOI: 10.1073/pnas.2013798118] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022] Open
Abstract
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
Affiliation(s)
- William S DeWitt
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Kameron Decker Harris
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195
  - Department of Biology, University of Washington, Seattle, WA 98195
- Aaron P Ragsdale
  - National Laboratory of Genomics for Biodiversity, Unit of Advanced Genomics, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Mexico 36821
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
|
7
|
Rougemont Q, Moore JS, Leroy T, Normandeau E, Rondeau EB, Withler RE, Van Doornik DM, Crane PA, Naish KA, Garza JC, Beacham TD, Koop BF, Bernatchez L. Demographic history shaped geographical patterns of deleterious mutation load in a broadly distributed Pacific Salmon. PLoS Genet 2020; 16:e1008348. [PMID: 32845885 PMCID: PMC7478589 DOI: 10.1371/journal.pgen.1008348] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/01/2019] [Revised: 09/08/2020] [Accepted: 06/24/2020] [Indexed: 12/24/2022] Open
Abstract
A thorough reconstruction of historical processes is essential for a comprehensive understanding of the mechanisms shaping patterns of genetic diversity. Indeed, past and current conditions influencing effective population size have important evolutionary implications for the efficacy of selection, increased accumulation of deleterious mutations, and loss of adaptive potential. Here, we gather extensive genome-wide data that represent the extant diversity of the Coho salmon (Oncorhynchus kisutch) to address two objectives. We demonstrate that a single glacial refugium is the source of most of the present-day genetic diversity, with detectable inputs from a putative secondary micro-refugium. We found statistical support for a scenario whereby ancestral populations located south of the ice sheets expanded recently, swamping out most of the diversity from other putative micro-refugia. Demographic inferences revealed that genetic diversity was also affected by linked selection in large parts of the genome. Moreover, we demonstrate that the recent demographic history of this species generated regional differences in the load of deleterious mutations among populations, a finding that mirrors recent results from human populations and provides increased support for models of expansion load. We propose that insights from these historical inferences should be better integrated in conservation planning of wild organisms, which currently focuses largely on neutral genetic diversity and local adaptation, with the role of potentially maladaptive variation being generally ignored. Reconstruction of a species’ past demographic history from genetic data can highlight historical factors that have shaped the distribution of genetic diversity along its genome and its geographic range. Here, we combine genotyping-by-sequencing with demographic modelling to address these issues in the Coho salmon, a Pacific salmon of conservation concern in some parts of its range, notably in the south. 
Our demographic reconstructions reveal a linear decrease in genetic diversity toward the north of the species range, supporting the hypothesis of a northern route of postglacial recolonization from a single major southern refugium. As predicted by theory, we also observed a higher proportion of deleterious mutations in the most distant populations from this refugium. Beyond this general pattern, among-site variation in the proportion of deleterious mutations is consistent with different local trends in effective population sizes. Our results highlight the potential importance of understanding historical factors that have shaped geographic patterns of the distribution of deleterious mutations in order to implement effective management programs for the conservation of wild populations. Such fundamental knowledge of human historical demography is now having major impacts on health sciences, and we argue it is time to integrate such approaches in conservation science as well.
Affiliation(s)
- Quentin Rougemont
  - Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
- Jean-Sébastien Moore
  - Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
- Thibault Leroy
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - Department of Botany & Biodiversity Research, University of Vienna, Vienna, Austria
- Eric Normandeau
  - Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
- Eric B. Rondeau
  - Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Ruth E. Withler
  - Department of Fisheries and Ocean, Pacific Biological Station, Nanaimo, British Columbia, Canada
- Donald M. Van Doornik
  - National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Northwest Fisheries Science Center, Manchester Research Station, Port Orchard, Washington, United States of America
- Penelope A. Crane
  - Conservation Genetics Laboratory, U.S. Fish and Wildlife Service, Anchorage, Alaska, United States of America
- Kerry A. Naish
  - School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, United States of America
- John Carlos Garza
  - Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service and Institute of Marine Sciences, University of California–Santa Cruz, Santa Cruz, California, United States of America
- Terry D. Beacham
  - Department of Fisheries and Ocean, Pacific Biological Station, Nanaimo, British Columbia, Canada
- Ben F. Koop
  - Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Louis Bernatchez
  - Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, Canada
|
8
|
Patton AH, Margres MJ, Stahlke AR, Hendricks S, Lewallen K, Hamede RK, Ruiz-Aravena M, Ryder O, McCallum HI, Jones ME, Hohenlohe PA, Storfer A. Contemporary Demographic Reconstruction Methods Are Robust to Genome Assembly Quality: A Case Study in Tasmanian Devils. Mol Biol Evol 2020; 36:2906-2921. [PMID: 31424552 PMCID: PMC6878949 DOI: 10.1093/molbev/msz191] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Indexed: 12/14/2022] Open
Abstract
Reconstructing species’ demographic histories is a central focus of molecular ecology and evolution. Recently, an expanding suite of methods leveraging either the sequentially Markovian coalescent (SMC) or the site-frequency spectrum has been developed to reconstruct population size histories from genomic sequence data. However, few studies have investigated the robustness of these methods to genome assemblies of varying quality. In this study, we first present an improved genome assembly for the Tasmanian devil using the Chicago library method. Compared with the original reference genome, our new assembly reduces the number of scaffolds (from 35,975 to 10,010) and increases the scaffold N90 (from 0.101 to 2.164 Mb). Second, we assess the performance of four contemporary genomic methods for inferring population size history (PSMC, MSMC, SMC++, Stairway Plot), using the two devil genome assemblies as well as simulated, artificially fragmented genomes that approximate the hypothesized demographic history of Tasmanian devils. We demonstrate that each method is robust to assembly quality, producing similar estimates of Ne when simulated genomes were fragmented into up to 5,000 scaffolds. Overall, methods reliant on the SMC are most reliable between ∼300 generations before present (gbp) and 100 kgbp, whereas methods exclusively reliant on the site-frequency spectrum are most reliable between the present and 30 gbp. Our results suggest that when used in concert, genomic methods for reconstructing species’ effective population size histories 1) can be applied to nonmodel organisms without highly contiguous reference genomes, and 2) are capable of detecting independently documented effects of historical geological events.
Affiliation(s)
- Austin H Patton
  - School of Biological Sciences, Washington State University, Pullman, WA
- Mark J Margres
  - School of Biological Sciences, Washington State University, Pullman, WA
  - Department of Organismic and Evolutionary Biology, Harvard University, MA
- Amanda R Stahlke
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Sarah Hendricks
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Kevin Lewallen
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Rodrigo K Hamede
  - School of Natural Sciences, University of Tasmania, Hobart, Australia
- Oliver Ryder
  - Institute for Conservation Research, San Diego, CA
- Menna E Jones
  - School of Natural Sciences, University of Tasmania, Hobart, Australia
- Paul A Hohenlohe
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Andrew Storfer
  - School of Biological Sciences, Washington State University, Pullman, WA
|
9
|
Kamm J, Terhorst J, Durbin R, Song YS. Efficiently inferring the demographic history of many populations with allele count data. J Am Stat Assoc 2019; 115:1472-1487. [PMID: 33012903 PMCID: PMC7531012 DOI: 10.1080/01621459.2019.1635482] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 03/29/2018] [Revised: 04/14/2019] [Accepted: 06/08/2019] [Indexed: 01/06/2023]
Abstract
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.
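As a purely illustrative aside on the phrase "histogram of allele counts": the sketch below tallies a single-population SFS from a toy 0/1 haplotype matrix. It is not taken from momi2 or any package cited here; the function name `sample_frequency_spectrum`, the input layout (one row per site, one column per haplotype, 1 = derived allele), and the toy data are all assumptions made for the example.

```python
from collections import Counter


def sample_frequency_spectrum(sites):
    """Tally the unfolded SFS of a toy haplotype matrix (hypothetical helper).

    `sites` is a list of rows, one per site; each row holds one 0/1 entry per
    sampled haplotype, with 1 marking the derived allele. Entry i-1 of the
    result is the number of sites whose derived allele appears in exactly i
    of the n haplotypes, for i = 1..n-1 (monomorphic sites are ignored).
    """
    n = len(sites[0])
    if any(len(row) != n for row in sites):
        raise ValueError("every site must cover the same number of haplotypes")
    # Derived-allele count at each site, then a histogram over those counts.
    counts = Counter(sum(row) for row in sites)
    return [counts.get(i, 0) for i in range(1, n)]


# Toy data: 5 segregating sites observed in 4 haplotypes (made up).
toy_sites = [
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
print(sample_frequency_spectrum(toy_sites))
```

Real analyses would typically build the spectrum from a VCF with an established library rather than a hand-rolled helper like this one.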
Affiliation(s)
- Jack Kamm
  - Wellcome Sanger Institute, Hinxton, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Chan Zuckerberg Biohub, San Francisco, USA
- Richard Durbin
  - Wellcome Sanger Institute, Hinxton, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Yun S. Song
  - Computer Science Division, University of California, Berkeley, USA
  - Department of Statistics, University of California, Berkeley, USA
  - Chan Zuckerberg Biohub, San Francisco, USA
|
10
|
Spence JP, Steinrücken M, Terhorst J, Song YS. Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 2018; 53:70-76. [PMID: 30056275 PMCID: PMC6296859 DOI: 10.1016/j.gde.2018.07.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 05/16/2018] [Revised: 07/08/2018] [Accepted: 07/09/2018] [Indexed: 01/02/2023]
Abstract
Studying how diverse human populations are related is of historical and anthropological interest, in addition to providing a realistic null model for testing for signatures of natural selection or disease associations. Furthermore, understanding the demographic histories of other species is playing an increasingly important role in conservation genetics. A number of statistical methods have been developed to infer population demographic histories using whole-genome sequence data, with recent advances focusing on allowing for more flexible modeling choices, scaling to larger data sets, and increasing statistical power. Here we review coalescent hidden Markov models, a powerful class of population genetic inference methods that can utilize linkage disequilibrium information effectively. We highlight recent advances, give advice for practitioners, point out potential pitfalls, and present possible future research directions.
Affiliation(s)
- Jeffrey P Spence
  - Computational Biology Graduate Group, University of California, Berkeley, United States
- Yun S Song
  - Computer Science Division and Department of Statistics, University of California, Berkeley, United States
  - Chan Zuckerberg Biohub, San Francisco, United States
|
11
|
Ragsdale AP, Moreau C, Gravel S. Genomic inference using diffusion models and the allele frequency spectrum. Curr Opin Genet Dev 2018; 53:140-147. [PMID: 30366252 DOI: 10.1016/j.gde.2018.10.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/03/2018] [Revised: 09/14/2018] [Accepted: 10/07/2018] [Indexed: 01/25/2023]
Abstract
Evolutionary, biological, and demographic processes together shape observed variation in populations. Understanding how these processes influence variation allows us to infer past demography and the nature of selection in populations. Forward in time models such as the diffusion approximation provide a powerful tool for performing inference based on the distribution of allele frequencies. Here, we discuss recent computational developments and their application to reconstructing human demographic history. Using whole-genome sequence data for 797 French Canadian individuals, we assess the neutrality of synonymous variants and show that selection can bias inferred demography, mutation rates, and distributions of fitness effects. We argue that the simple evolutionary models investigated by Kimura and Ohta still provide important insight into modern genetic research.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Claudia Moreau
  - Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
|
12
|
Cayuela H, Rougemont Q, Prunier JG, Moore JS, Clobert J, Besnard A, Bernatchez L. Demographic and genetic approaches to study dispersal in wild animal populations: A methodological review. Mol Ecol 2018; 27:3976-4010. [DOI: 10.1111/mec.14848] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 05/09/2018] [Revised: 08/17/2018] [Accepted: 08/19/2018] [Indexed: 12/31/2022]
Affiliation(s)
- Hugo Cayuela
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
- Quentin Rougemont
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
- Jérôme G. Prunier
  - Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis France
- Jean-Sébastien Moore
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
- Jean Clobert
  - Station d'Ecologie Théorique et Expérimentale; Unité Mixte de Recherche (UMR) 5321; Centre National de la Recherche Scientifique (CNRS); Université Paul Sabatier (UPS); Moulis France
- Aurélien Besnard
  - CNRS; PSL Research University; EPHE; UM, SupAgro, IRD; INRA; UMR 5175 CEFE; Montpellier France
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes (IBIS); Université Laval; Québec City Québec Canada
|
13
|
Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference. Genetics 2018; 210:665-682. [PMID: 30064984 DOI: 10.1534/genetics.118.300733] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/29/2018] [Accepted: 07/30/2018] [Indexed: 11/18/2022] Open
Abstract
The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to zero or diverge to infinity, and show undesirable sensitivity to perturbations in the data. The goal of this article is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographies and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model, and generalize our intuition to arbitrary sample sizes using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only [Formula: see text] epochs, where [Formula: see text] is between [Formula: see text] and [Formula: see text]. The set of expected SFS for piecewise-constant demographies with fewer than [Formula: see text] epochs is open and nonconvex, which causes the above phenomena for inference from data.
|