1
|
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Collapse
|
2
|
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning. Mol Biol Evol 2024; 41:msae077. [PMID: 38636507 PMCID: PMC11082913 DOI: 10.1093/molbev/msae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Revised: 04/08/2024] [Accepted: 04/12/2024] [Indexed: 04/20/2024] Open
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
Collapse
Affiliation(s)
- Linh N Tran
- Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721, USA
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Connie K Sun
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Travis J Struck
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Mathews Sajan
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Ryan N Gutenkunst
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| |
Collapse
|
3
|
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally efficient demographic history inference from allele frequencies with supervised machine learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.05.24.542158. [PMID: 38405827 PMCID: PMC10888863 DOI: 10.1101/2023.05.24.542158] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we developed donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future input data AFS. We demonstrated that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
Collapse
Affiliation(s)
- Linh N. Tran
- Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
| | - Connie K. Sun
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
| | - Travis J. Struck
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
| | - Mathews Sajan
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
| | - Ryan N. Gutenkunst
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
| |
Collapse
|
4
|
Guerrero Montero J, Blythe RA. Self-contained Beta-with-Spikes approximation for inference under a Wright-Fisher model. Genetics 2023; 225:iyad092. [PMID: 37226886 PMCID: PMC10550310 DOI: 10.1093/genetics/iyad092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 03/10/2023] [Accepted: 05/10/2023] [Indexed: 05/26/2023] Open
Abstract
We construct a reliable estimation method for evolutionary parameters within the Wright-Fisher model, which describes changes in allele frequencies due to selection and genetic drift, from time-series data. Such data exist for biological populations, for example via artificial evolution experiments, and for the cultural evolution of behavior, such as linguistic corpora that document historical usage of different words with similar meanings. Our method of analysis builds on a Beta-with-Spikes approximation to the distribution of allele frequencies predicted by the Wright-Fisher model. We introduce a self-contained scheme for estimating parameters in the approximation, and demonstrate its robustness with synthetic data, especially in the strong-selection and near-extinction regimes where previous approaches fail. We further apply the method to allele frequency data for baker's yeast (Saccharomyces cerevisiae), finding a significant signal of selection in cases where independent evidence supports such a conclusion. We further demonstrate the possibility of detecting time points at which evolutionary parameters change in the context of a historical spelling reform in the Spanish language.
Collapse
Affiliation(s)
- Juan Guerrero Montero
- SUPA, School of Physics and Astronomy, University of Edinburgh, Edinburgh, EH9 3FD, UK
| | - Richard A Blythe
- Corresponding author: SUPA, School of Physics and Astronomy, University of Edinburgh, Edinburgh EH9 3FD, UK.
| |
Collapse
|
5
|
Johnson KE, Adams CJ, Voight BF. Identifying rare variants inconsistent with identity-by-descent in population-scale whole-genome sequencing data. Methods Ecol Evol 2022; 13:2429-2442. [PMID: 38938451 PMCID: PMC11210625 DOI: 10.1111/2041-210x.13991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 09/12/2022] [Indexed: 12/01/2022]
Abstract
Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event identity-by-descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can decrease the expected length of shared background haplotype surrounding a rare variant if that variant was inherited from a single event descending from a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model-denoted here as 'IBD-inconsistent'-using unphased population sequencing data.We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants, using simulated recurrent mutations to demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model.Applying our method to whole-genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD-inconsistent variants correlated with higher local mutation rates and genomic features like replication timing. Using a heuristic to categorize IBD-inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as enriched local GC content.By identifying IBD-inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history.
Collapse
Affiliation(s)
- Kelsey E. Johnson
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Christopher J. Adams
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Benjamin F. Voight
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| |
Collapse
|
6
|
Chak STC, Harris SE, Hultgren KM, Duffy JE, Rubenstein DR. Demographic inference provides insights into the extirpation and ecological dominance of eusocial snapping shrimps. J Hered 2022; 113:552-562. [PMID: 35921239 DOI: 10.1093/jhered/esac035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2021] [Accepted: 07/27/2022] [Indexed: 11/14/2022] Open
Abstract
Although eusocial animals often achieve ecological dominance in the ecosystems where they occur, many populations are unstable, resulting in local extinction. Both patterns may be linked to the characteristic demography of eusocial species-high reproductive skew and reproductive division of labor support stable effective population sizes that make eusocial groups more competitive in some species, but also lower effective population sizes that increase susceptibility to population collapse in others. Here, we examine the relationship between demography and social organization in Synalpheus snapping shrimps, a group in which eusociality has evolved recently and repeatedly. We show using coalescent demographic modelling that eusocial species have had lower but more stable effective population sizes across 100,000 generations. Our results are consistent with the idea that stable population sizes may enable competitive dominance in eusocial shrimps, but they also suggest that recent population declines are likely caused by eusocial shrimps' heightened sensitivity to environmental changes, perhaps as a result of their low effective population sizes and localized dispersal. Thus, although the unique life histories and demography of eusocial shrimps have likely contributed to their persistence and ecological dominance over evolutionary timescales, these social traits may also make them vulnerable to contemporary environmental change.
Collapse
Affiliation(s)
- Solomon T C Chak
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, USA.,Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, USA.,Department of Biological Sciences, SUNY College at Old Westbury, Old Westbury, NY, USA
| | - Stephen E Harris
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, USA.,Biology Department, SUNY Purchase College, Purchase, NY, USA
| | | | - J Emmett Duffy
- Tennenbaum Marine Observatories Network, Smithsonian Institution, Edgewater, MD, USA
| | - Dustin R Rubenstein
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, USA
| |
Collapse
|
7
|
Clemente F, Unterländer M, Dolgova O, Amorim CEG, Coroado-Santos F, Neuenschwander S, Ganiatsou E, Cruz Dávalos DI, Anchieri L, Michaud F, Winkelbach L, Blöcher J, Arizmendi Cárdenas YO, Sousa da Mota B, Kalliga E, Souleles A, Kontopoulos I, Karamitrou-Mentessidi G, Philaniotou O, Sampson A, Theodorou D, Tsipopoulou M, Akamatis I, Halstead P, Kotsakis K, Urem-Kotsou D, Panagiotopoulos D, Ziota C, Triantaphyllou S, Delaneau O, Jensen JD, Moreno-Mayar JV, Burger J, Sousa VC, Lao O, Malaspinas AS, Papageorgopoulou C. The genomic history of the Aegean palatial civilizations. Cell 2021; 184:2565-2586.e21. [PMID: 33930288 PMCID: PMC8127963 DOI: 10.1016/j.cell.2021.03.039] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2020] [Revised: 09/17/2020] [Accepted: 03/18/2021] [Indexed: 12/30/2022]
Abstract
The Cycladic, the Minoan, and the Helladic (Mycenaean) cultures define the Bronze Age (BA) of Greece. Urbanism, complex social structures, craft and agricultural specialization, and the earliest forms of writing characterize this iconic period. We sequenced six Early to Middle BA whole genomes, along with 11 mitochondrial genomes, sampled from the three BA cultures of the Aegean Sea. The Early BA (EBA) genomes are homogeneous and derive most of their ancestry from Neolithic Aegeans, contrary to earlier hypotheses that the Neolithic-EBA cultural transition was due to massive population turnover. EBA Aegeans were shaped by relatively small-scale migration from East of the Aegean, as evidenced by the Caucasus-related ancestry also detected in Anatolians. In contrast, Middle BA (MBA) individuals of northern Greece differ from EBA populations in showing ∼50% Pontic-Caspian Steppe-related ancestry, dated at ca. 2,600-2,000 BCE. Such gene flow events during the MBA contributed toward shaping present-day Greek genomes.
Collapse
Affiliation(s)
- Florian Clemente
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Martina Unterländer
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece; Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Olga Dolgova
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028 Barcelona, Spain
| | - Carlos Eduardo G Amorim
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Francisco Coroado-Santos
- CE3C, Centre for Ecology, Evolution and Environmental Changes, Faculty of Sciences of the University of Lisbon, 1749-016 Lisbon, Portugal
| | - Samuel Neuenschwander
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Vital-IT, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Elissavet Ganiatsou
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Diana I Cruz Dávalos
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Lucas Anchieri
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Frédéric Michaud
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Laura Winkelbach
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Jens Blöcher
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Yami Ommar Arizmendi Cárdenas
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Eleni Kalliga
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Angelos Souleles
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Ioannis Kontopoulos
- Center for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | | | - Olga Philaniotou
- Ephor Emerita of Antiquities, Hellenic Ministry of Culture and Sports, 10682 Athens, Greece
| | - Adamantios Sampson
- Department of Mediterranean Studies, University of the Aegean, 85132 Rhodes, Greece
| | - Dimitra Theodorou
- Ephorate of Antiquities of Kozani, Hellenic Ministry of Culture and Sports, 50004 Kozani, Greece
| | - Metaxia Tsipopoulou
- Ephor Emerita of Antiquities, Hellenic Ministry of Culture and Sports, 10682 Athens, Greece
| | - Ioannis Akamatis
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Paul Halstead
- Department of Archaeology, University of Sheffield, Minalloy House, 10-16 Regent St., Sheffield S1 3NJ, UK
| | - Kostas Kotsakis
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Dushka Urem-Kotsou
- Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Diamantis Panagiotopoulos
- Institute of Classical Archaeology, University of Heidelberg, Marstallhof 4, 69117 Heidelberg, Germany
| | - Christina Ziota
- Ephorate of Antiquities of Florina, Hellenic Ministry of Culture and Sports, 53100 Florina, Greece
| | - Sevasti Triantaphyllou
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Olivier Delaneau
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - J Víctor Moreno-Mayar
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Center for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark; National Institute of Genomic Medicine (INMEGEN), 14610 Mexico City, Mexico
| | - Joachim Burger
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Vitor C Sousa
- CE3C, Centre for Ecology, Evolution and Environmental Changes, Faculty of Sciences of the University of Lisbon, 1749-016 Lisbon, Portugal
| | - Oscar Lao
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Anna-Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
| | - Christina Papageorgopoulou
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece.
| |
Collapse
|
8
|
Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2021; 37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
Collapse
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, PA.,Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
| |
Collapse
|
9
|
Blischak PD, Barker MS, Gutenkunst RN. Inferring the Demographic History of Inbred Species from Genome-Wide SNP Frequency Data. Mol Biol Evol 2021; 37:2124-2136. [PMID: 32068861 DOI: 10.1093/molbev/msaa042] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2019] [Revised: 02/04/2020] [Accepted: 02/13/2020] [Indexed: 01/04/2023] Open
Abstract
Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model's ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
Collapse
Affiliation(s)
- Paul D Blischak
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ.,Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ
| | - Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ
| | - Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ
| |
Collapse
|
10
|
Inference of gene flow in the process of speciation: Efficient maximum-likelihood implementation of a generalised isolation-with-migration model. Theor Popul Biol 2021; 140:1-15. [PMID: 33736959 DOI: 10.1016/j.tpb.2021.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 11/21/2022]
Abstract
The 'isolation with migration' (IM) model has been extensively used in the literature to detect gene flow during the process of speciation. In this model, an ancestral population split into two or more descendant populations which subsequently exchanged migrants at a constant rate until the present. Of course, the assumption of constant gene flow until the present is often over-simplistic in the context of speciation. In this paper, we consider a 'generalised IM' (GIM) model: a two-population IM model in which migration rates and population sizes are allowed to change at some point in the past. By developing a maximum-likelihood implementation of this model, we enable inference on both historical and contemporary rates of gene flow between two closely related populations or species. The GIM model encompasses both the standard two-population IM model and the 'isolation with initial migration' (IIM) model as special cases, as well as a model of secondary contact. We examine for simulated data how our method can be used, by means of likelihood ratio tests or AIC scores, to distinguish between the following scenarios of population divergence: (a) divergence in complete isolation; (b) divergence with a period of gene flow followed by isolation; (c) divergence with a period of isolation followed by secondary contact; (d) divergence with ongoing gene flow. Our method is based on the coalescent and is suitable for data sets consisting of the number of nucleotide differences between one pair of DNA sequences at each of a large number of independent loci. As our method relies on an explicit expression for the likelihood, it is computationally very fast.
Collapse
|
11
|
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Mol Biol Evol 2021; 38:2986-3003. [PMID: 33591322 PMCID: PMC8233493 DOI: 10.1093/molbev/msab050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Current procedures for inferring population history generally assume complete neutrality—that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increases, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the distribution of fitness effect as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Collapse
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| |
Collapse
|
12
|
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD. The impact of purifying and background selection on the inference of population history: problems and prospects. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021. [PMID: 33501439 PMCID: PMC7836109 DOI: 10.1101/2020.04.28.066365] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Current procedures for inferring population history generally assume complete neutrality - that is, they neglect both direct selection and the effects of selection on linked sites. We here examine how the presence of direct purifying selection and background selection may bias demographic inference by evaluating two commonly-used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects (DFE) and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increases, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. We propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the DFE as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
Collapse
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - Kellen Riall
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - Hannes Becher
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne 3012, Switzerland.,Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| |
Collapse
|
13
|
Stoltz M, Baeumer B, Bouckaert R, Fox C, Hiscott G, Bryant D. Bayesian Inference of Species Trees using Diffusion Models. Syst Biol 2020; 70:145-161. [PMID: 33005955 DOI: 10.1093/sysbio/syaa051] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2019] [Revised: 06/19/2020] [Accepted: 06/23/2020] [Indexed: 11/13/2022] Open
Abstract
We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 fresh water turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.].
Collapse
Affiliation(s)
- Marnus Stoltz
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
| | - Boris Baeumer
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
| | - Remco Bouckaert
- Centre for Computational Evolution, University of Auckland, Auckland 1142, New Zealand
| | - Colin Fox
- Department of Physics, University of Otago, Dunedin 9054, New Zealand
| | - Gordon Hiscott
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
| | - David Bryant
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
| |
Collapse
|
14
|
Steinrücken M, Kamm J, Spence JP, Song YS. Inference of complex population histories using whole-genome sequences from multiple populations. Proc Natl Acad Sci U S A 2019; 116:17115-17120. [PMID: 31387977 PMCID: PMC6708337 DOI: 10.1073/pnas.1905060116] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can use whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 y ago, and that the separation of East Asians and Europeans started about 50,000 y ago, with pervasive gene flow between all pairs of populations.
Collapse
Affiliation(s)
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Department of Human Genetics, University of Chicago, Chicago, IL 60637
| | - Jack Kamm
- Department of Statistics, University of California, Berkeley, CA 94720
- Chan Zuckerberg Biohub, San Francisco, CA 94158
| | - Jeffrey P Spence
- Computational Biology Graduate Group, University of California, Berkeley, CA 94720
| | - Yun S Song
- Department of Statistics, University of California, Berkeley, CA 94720;
- Chan Zuckerberg Biohub, San Francisco, CA 94158
- Computer Science Division, University of California, Berkeley, CA 94720
| |
Collapse
|
15
|
Kamm J, Terhorst J, Durbin R, Song YS. Efficiently inferring the demographic history of many populations with allele count data. J Am Stat Assoc 2019; 115:1472-1487. [PMID: 33012903 PMCID: PMC7531012 DOI: 10.1080/01621459.2019.1635482] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2018] [Revised: 04/14/2019] [Accepted: 06/08/2019] [Indexed: 01/06/2023]
Abstract
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than p reviously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.
Collapse
Affiliation(s)
- Jack Kamm
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Chan Zuckerberg Biohub, San Francisco, USA
| | | | - Richard Durbin
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Yun S. Song
- Computer Science Division, University of California, Berkeley, USA
- Department of Statistics, University of California, Berkeley, USA
- Chan Zuckerberg Biohub, San Francisco, USA
| |
Collapse
|
16
|
Approximate Bayesian computation with deep learning supports a third archaic introgression in Asia and Oceania. Nat Commun 2019; 10:246. [PMID: 30651539 PMCID: PMC6335398 DOI: 10.1038/s41467-018-08089-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Accepted: 12/12/2018] [Indexed: 01/24/2023] Open
Abstract
Since anatomically modern humans dispersed Out of Africa, the evolutionary history of Eurasian populations has been marked by introgressions from presently extinct hominins. Some of these introgressions have been identified using sequenced ancient genomes (Neanderthal and Denisova). Other introgressions have been proposed for still unidentified groups using the genetic diversity present in current human populations. We built a demographic model based on deep learning in an Approximate Bayesian Computation framework to infer the evolutionary history of Eurasian populations including past introgression events in Out of Africa populations fitting the current genetic evidence. In addition to the reported Neanderthal and Denisovan introgressions, our results support a third introgression in all Asian and Oceanian populations from an archaic population. This population is either related to the Neanderthal-Denisova clade or diverged early from the Denisova lineage. We propose the use of deep learning methods for clarifying situations with high complexity in evolutionary genomics.
Collapse
|
17
|
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Collapse
Affiliation(s)
- Annabel C. Beichman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
| | - Emilia Huerta-Sanchez
- Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
- Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| |
Collapse
|
18
|
Ragsdale AP, Moreau C, Gravel S. Genomic inference using diffusion models and the allele frequency spectrum. Curr Opin Genet Dev 2018; 53:140-147. [PMID: 30366252 DOI: 10.1016/j.gde.2018.10.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 09/14/2018] [Accepted: 10/07/2018] [Indexed: 01/25/2023]
Abstract
Evolutionary, biological, and demographic processes together shape observed variation in populations. Understanding how these processes influence variation allows us to infer past demography and the nature of selection in populations. Forward in time models such as the diffusion approximation provide a powerful tool for performing inference based on the distribution of allele frequencies. Here, we discuss recent computational developments and their application to reconstructing human demographic history. Using whole-genome sequence data for 797 French Canadian individuals, we assess the neutrality of synonymous variants and show that selection can bias inferred demography, mutation rates, and distributions of fitness effects. We argue that the simple evolutionary models investigated by Kimura and Ohta still provide important insight into modern genetic research.
Collapse
Affiliation(s)
- Aaron P Ragsdale
- Department of Human Genetics, McGill University, Montreal, QC, Canada
| | - Claudia Moreau
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada.
| |
Collapse
|
19
|
The Wright-Fisher site frequency spectrum as a perturbation of the coalescent's. Theor Popul Biol 2018; 124:81-92. [PMID: 30308178 DOI: 10.1016/j.tpb.2018.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2018] [Revised: 09/22/2018] [Accepted: 09/28/2018] [Indexed: 11/24/2022]
Abstract
The first terms of the Wright-Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. A part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest of it can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its offspring distribution being approximately Poisson lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance of WF minus coalescent is 0.12∕logN for a population sized sample with n=N, which is only 1% for N=2×104. The coalescent remains a good approximation for the site frequency spectrum of-large samples.
Collapse
|
20
|
Fraïsse C, Roux C, Gagnaire PA, Romiguier J, Faivre N, Welch JJ, Bierne N. The divergence history of European blue mussel species reconstructed from Approximate Bayesian Computation: the effects of sequencing techniques and sampling strategies. PeerJ 2018; 6:e5198. [PMID: 30083438 PMCID: PMC6071616 DOI: 10.7717/peerj.5198] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2018] [Accepted: 06/19/2018] [Indexed: 01/25/2023] Open
Abstract
Genome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of: individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the jSFS, strongly affect the results: including classes of low and high frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e., periodic connectivity) and across genes (i.e., genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding jSFS, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data, than by the way in which they are analyzed.
Collapse
Affiliation(s)
- Christelle Fraïsse
- Institut des Sciences de l’Evolution UMR5554, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Department of Genetics, University of Cambridge, Cambridge, UK
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Camille Roux
- Université de Lille, Unité Evo-Eco-Paléo (EEP), UMR 8198, Villeneuve d’Ascq, France
| | - Pierre-Alexandre Gagnaire
- Institut des Sciences de l’Evolution UMR5554, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Jonathan Romiguier
- Institut des Sciences de l’Evolution UMR5554, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Nicolas Faivre
- Institut des Sciences de l’Evolution UMR5554, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - John J. Welch
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Nicolas Bierne
- Institut des Sciences de l’Evolution UMR5554, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Department of Genetics, University of Cambridge, Cambridge, UK
| |
Collapse
|
21
|
Waltoft BL, Hobolth A. Non-parametric estimation of population size changes from the site frequency spectrum. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0061. [PMID: 29886455 DOI: 10.1515/sagmb-2017-0061] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Changes in population size is a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n - 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n - i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the observed SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied on unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.
Collapse
Affiliation(s)
- Berit Lindum Waltoft
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers allé 8, 8000 Aarhus C, Denmark, Phone: +45 87165763.,National Centre for Register-based Research, Department of Economics and Business, Aarhus University, Fuglesangs allé 26, 8210 Aarhus V, Denmark.,The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
| | - Asger Hobolth
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
| |
Collapse
|
22
|
Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol 2018; 66:e30-e46. [PMID: 28173553 PMCID: PMC5837693 DOI: 10.1093/sysbio/syw056] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2015] [Revised: 05/31/2016] [Accepted: 06/06/2016] [Indexed: 11/14/2022] Open
Abstract
The Wright–Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright–Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright–Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright–Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model.
Collapse
Affiliation(s)
- Paula Tataru
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Maria Simonsen
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Asger Hobolth
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| |
Collapse
|
23
|
Inference in population genetics using forward and backward, discrete and continuous time processes. J Theor Biol 2018; 439:166-180. [DOI: 10.1016/j.jtbi.2017.12.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Revised: 11/23/2017] [Accepted: 12/08/2017] [Indexed: 01/01/2023]
|
24
|
Baharian S, Gravel S. On the decidability of population size histories from finite allele frequency spectra. Theor Popul Biol 2018; 120:42-51. [PMID: 29305873 DOI: 10.1016/j.tpb.2017.12.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Revised: 12/15/2017] [Accepted: 12/20/2017] [Indexed: 10/18/2022]
Abstract
Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distribution of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild 'reasonableness' constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories. Where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumption.
Collapse
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.
| |
Collapse
|
25
|
Xue AT, Hickerson MJ. multi-dice: r package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes. Mol Ecol Resour 2017; 17:e212-e224. [PMID: 28449263 PMCID: PMC5724483 DOI: 10.1111/1755-0998.12686] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2017] [Revised: 03/14/2017] [Accepted: 04/14/2017] [Indexed: 01/25/2023]
Abstract
Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter had been accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the r package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics.
Collapse
Affiliation(s)
- Alexander T. Xue
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and BehaviorCity College and Graduate Center of City University of New YorkNew YorkNYUSA
| | - Michael J. Hickerson
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and BehaviorCity College and Graduate Center of City University of New YorkNew YorkNYUSA
- Division of Invertebrate ZoologyAmerican Museum of Natural HistoryNew YorkNYUSA
| |
Collapse
|
26
|
Exact Calculation of the Joint Allele Frequency Spectrum for Isolation with Migration Models. Genetics 2017; 207:241-253. [PMID: 28696217 PMCID: PMC5586375 DOI: 10.1534/genetics.116.194019] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2016] [Accepted: 06/30/2017] [Indexed: 12/26/2022] Open
Abstract
Population genomic datasets collected over the past decade have spurred interest in developing methods that can utilize massive numbers of loci for inference of demographic and selective histories of populations. The allele frequency spectrum (AFS) provides a convenient statistic for such analysis, and, accordingly, much attention has been paid to predicting theoretical expectations of the AFS under a number of different models. However, to date, exact solutions for the joint AFS of two or more populations under models of migration and divergence have not been found. Here, we present a novel Markov chain representation of the coalescent on the state space of the joint AFS that allows for rapid, exact calculation of the joint AFS under isolation with migration (IM) models. In turn, we show how our Markov chain method, in the context of composite likelihood estimation, can be used for accurate inference of parameters of the IM model using SNP data. Lastly, we apply our method to recent whole genome datasets from African Drosophila melanogaster.
Collapse
|
27
|
Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation. Genetics 2017; 206:1549-1567. [PMID: 28495960 DOI: 10.1534/genetics.117.200493] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2017] [Accepted: 04/26/2017] [Indexed: 12/18/2022] Open
Abstract
Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.
Collapse
|
28
|
Kamm JA, Terhorst J, Song YS. Efficient computation of the joint sample frequency spectra for multiple populations. J Comput Graph Stat 2017; 26:182-194. [PMID: 28239248 DOI: 10.1080/10618600.2016.1159212] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
Collapse
Affiliation(s)
- John A Kamm
- Department of Statistics, University of California, Berkeley
| | | | - Yun S Song
- Departments of EECS, Statistics, and Integrative Biology, University of California, Berkeley
| |
Collapse
|
29
|
Bagley RK, Sousa VC, Niemiller ML, Linnen CR. History, geography and host use shape genomewide patterns of genetic variation in the redheaded pine sawfly (
Neodiprion lecontei
). Mol Ecol 2017; 26:1022-1044. [DOI: 10.1111/mec.13972] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2015] [Revised: 11/10/2016] [Accepted: 12/01/2016] [Indexed: 01/03/2023]
Affiliation(s)
- Robin K. Bagley
- Department of Biology University of Kentucky Lexington KY 40506 USA
| | - Vitor C. Sousa
- cE3c ‐ Centre for Ecology, Evolution and Environmental Changes Faculdade de Ciências Universidade de Lisboa 1749‐016 Lisboa Portugal
| | - Matthew L. Niemiller
- Illinois Natural History Survey Prairie Research Institute University of Illinois Urbana‐Champaign Champaign IL 61820 USA
| | | |
Collapse
|
30
|
Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016; 41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]
Abstract
The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals had been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that lead to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk.
Collapse
Affiliation(s)
- Feng Gao
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
| | - Alon Keinan
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States.
| |
Collapse
|
31
|
Schrider DR, Shanku AG, Kern AD. Effects of Linked Selective Sweeps on Demographic Inference and Model Selection. Genetics 2016; 204:1207-1223. [PMID: 27605051 PMCID: PMC5105852 DOI: 10.1534/genetics.116.190223] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 09/02/2016] [Indexed: 01/06/2023] Open
Abstract
The availability of large-scale population genomic sequence data has resulted in an explosion in efforts to infer the demographic histories of natural populations across a broad range of organisms. As demographic events alter coalescent genealogies, they leave detectable signatures in patterns of genetic variation within and between populations. Accordingly, a variety of approaches have been designed to leverage population genetic data to uncover the footprints of demographic change in the genome. The vast majority of these methods make the simplifying assumption that the measures of genetic variation used as their input are unaffected by natural selection. However, natural selection can dramatically skew patterns of variation not only at selected sites, but at linked, neutral loci as well. Here we assess the impact of recent positive selection on demographic inference by characterizing the performance of three popular methods through extensive simulation of data sets with varying numbers of linked selective sweeps. In particular, we examined three different demographic models relevant to a number of species, finding that positive selection can bias parameter estimates of each of these models-often severely. We find that selection can lead to incorrect inferences of population size changes when none have occurred. Moreover, we show that linked selection can lead to incorrect demographic model selection, when multiple demographic scenarios are compared. We argue that natural populations may experience the amount of recent positive selection required to skew inferences. These results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.
Collapse
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| | - Alexander G Shanku
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, New Jersey 08554
| | - Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
| |
Collapse
|
32
|
Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach. PLoS Genet 2016; 12:e1005877. [PMID: 26943927 PMCID: PMC4778914 DOI: 10.1371/journal.pgen.1005877] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2015] [Accepted: 01/27/2016] [Indexed: 12/02/2022] Open
Abstract
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. Molecular data sampled from extant individuals contains considerable information about their demographic history. In particular, one classical question in population genetics is to reconstruct past population size changes from such data. Relating these changes to various climatic, geological or anthropogenic events allows characterizing the main factors driving genetic diversity and can have major outcomes for conservation. Until recently, mostly very simple histories, including one or two population size changes, could be estimated from genetic data. This has changed with the sequencing of entire genomes in many species, and several methods allow now inferring complex histories consisting of several tens of population size changes. However, analyzing entire genomes, while accounting for recombination, remains a statistical and numerical challenge. These methods, therefore, can only be applied to small samples with a few diploid genomes. We overcome this limitation by using an approximate estimation approach, where observed genomes are summarized using a small number of statistics related to allele frequencies and linkage disequilibrium. In contrast to previous approaches, we show that our method allows us to reconstruct also the most recent part (the last 100 generations) of the population size history. As an illustration, we apply it to large samples of whole-genome sequences in four cattle breeds.
Collapse
Affiliation(s)
- Simon Boitard
- Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
- GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- * E-mail:
| | - Willy Rodríguez
- UMR CNRS 5219, Institut de Mathématiques de Toulouse, Université de Toulouse, Toulouse, France
| | - Flora Jay
- UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
- LRI, Paris-Sud University, CNRS UMR 8623, Orsay, France
| | - Stefano Mona
- Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France
| | - Frédéric Austerlitz
- UMR 7206 Eco-anthropologie et Ethnobiologie, Muséum National d’Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
| |
Collapse
|
33
|
Xue AT, Hickerson MJ. The aggregate site frequency spectrum for comparative population genomic inference. Mol Ecol 2015; 24:6223-40. [PMID: 26769405 PMCID: PMC4717917 DOI: 10.1111/mec.13447] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2015] [Revised: 10/26/2015] [Accepted: 10/28/2015] [Indexed: 12/11/2022]
Abstract
Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference.
Collapse
Affiliation(s)
- Alexander T. Xue
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and Behavior, City College and Graduate Center of City University of New York, 160 Convent Avenue, Marshak Science Building, Room 526, New York, NY 10031
| | - Michael J. Hickerson
- Department of Biology: Subprogram in Ecology, Evolutionary Biology, and Behavior, City College and Graduate Center of City University of New York, 160 Convent Avenue, Marshak Science Building, Room 526, New York, NY 10031
| |
Collapse
|
34
|
|
35
|
Chen H. Population genetic studies in the genomic sequencing era. DONG WU XUE YAN JIU = ZOOLOGICAL RESEARCH 2015; 36:223-32. [PMID: 26228473 DOI: 10.13918/j.issn.2095-8137.2015.4.223] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
Recent advances in high-throughput sequencing technologies have revolutionized the field of population genetics. Data now routinely contain genomic level polymorphism information, and the low cost of DNA sequencing enables researchers to investigate tens of thousands of subjects at a time. This provides an unprecedented opportunity to address fundamental evolutionary questions, while posing challenges on traditional population genetic theories and methods. This review provides an overview of the recent methodological developments in the field of population genetics, specifically methods used to infer ancient population history and investigate natural selection using large-sample, large-scale genetic data. Several open questions are also discussed at the end of the review.
Collapse
Affiliation(s)
- Hua Chen
- Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101,
| |
Collapse
|
36
|
Chen H, Hey J, Chen K. Inferring Very Recent Population Growth Rate from Population-Scale Sequencing Data: Using a Large-Sample Coalescent Estimator. Mol Biol Evol 2015; 32:2996-3011. [PMID: 26187437 DOI: 10.1093/molbev/msv158] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Large-sample or population-level sequencing data provide unprecedented opportunities for inferring detailed population histories, especially recent demographic histories. On the other hand, it challenges most existing population genetic methods: Simulation-based approaches require intensive computation, and analytical approaches are often numerically intractable when the sample size is large. We propose a computationally efficient method for simultaneous estimation of population size, the rate, and onset time of population growth in the very recent history, using the pattern of the total number of segregating sites as a function of sample size. Coalescent simulation shows that it can accurately and efficiently estimate the parameters of recent population growth from large-scale data. This approach has the flexibility to model population history with multiple growth stages or other epochs, and it is robust when the sample size is very large or at the population scale, for which the Kingman's coalescent assumption is not valid. This approach is applied to recently published data and estimates the recent population growth rate in the European population to be 1.49% with the onset time 7.26 ka, and the rate in the African population to be 0.735% with the onset time 10.01 ka.
Collapse
Affiliation(s)
- Hua Chen
- Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Jody Hey
- Center for Computational Genetics and Genomics, Temple University
| | - Kun Chen
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
| |
Collapse
|
37
|
Pyron RA. Post-molecular systematics and the future of phylogenetics. Trends Ecol Evol 2015; 30:384-9. [DOI: 10.1016/j.tree.2015.04.016] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Revised: 04/29/2015] [Accepted: 04/30/2015] [Indexed: 12/21/2022]
|
38
|
Transition Densities and Sample Frequency Spectra of Diffusion Processes with Selection and Variable Population Size. Genetics 2015; 200:601-17. [PMID: 25873633 DOI: 10.1534/genetics.115.175265] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Accepted: 04/09/2015] [Indexed: 11/18/2022] Open
Abstract
Advances in empirical population genetics have made apparent the need for models that simultaneously account for selection and demography. To address this need, we here study the Wright-Fisher diffusion under selection and variable effective population size. In the case of genic selection and piecewise-constant effective population sizes, we obtain the transition density by extending a recently developed method for computing an accurate spectral representation for a constant population size. Utilizing this extension, we show how to compute the sample frequency spectrum in the presence of genic selection and an arbitrary number of instantaneous changes in the effective population size. We also develop an alternate, efficient algorithm for computing the sample frequency spectrum using a moment-based approach. We apply these methods to answer the following questions: If neutrality is incorrectly assumed when there is selection, what effects does it have on demographic parameter estimation? Can the impact of negative selection be observed in populations that undergo strong exponential growth?
Collapse
|
39
|
Chen H, Hey J, Slatkin M. A hidden Markov model for investigating recent positive selection through haplotype structure. Theor Popul Biol 2015; 99:18-30. [PMID: 25446961 PMCID: PMC4277924 DOI: 10.1016/j.tpb.2014.11.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2013] [Revised: 10/24/2014] [Accepted: 11/04/2014] [Indexed: 12/17/2022]
Abstract
Recent positive selection can increase the frequency of an advantageous mutant rapidly enough that a relatively long ancestral haplotype will be remained intact around it. We present a hidden Markov model (HMM) to identify such haplotype structures. With HMM identified haplotype structures, a population genetic model for the extent of ancestral haplotypes is then adopted for parameter inference of the selection intensity and the allele age. Simulations show that this method can detect selection under a wide range of conditions and has higher power than the existing frequency spectrum-based method. In addition, it provides good estimate of the selection coefficients and allele ages for strong selection. The method analyzes large data sets in a reasonable amount of running time. This method is applied to HapMap III data for a genome scan, and identifies a list of candidate regions putatively under recent positive selection. It is also applied to several genes known to be under recent positive selection, including the LCT, KITLG and TYRP1 genes in Northern Europeans, and OCA2 in East Asians, to estimate their allele ages and selection coefficients.
Collapse
Affiliation(s)
- Hua Chen
- Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Center for Computational Genetics and Genomics, Temple University, Philadelphia PA 19122, United States.
| | - Jody Hey
- Center for Computational Genetics and Genomics, Temple University, Philadelphia PA 19122, United States.
| | - Montgomery Slatkin
- Department of Integrative Biology, University of California, Berkeley, CA 94720, United States.
| |
Collapse
|
40
|
Bhaskar A, Wang YXR, Song YS. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 2015; 25:268-79. [PMID: 25564017 PMCID: PMC4315300 DOI: 10.1101/gr.178756.114] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
Collapse
Affiliation(s)
- Anand Bhaskar
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA
| | - Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, California 94720, USA
| | - Yun S Song
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA; Department of Statistics, University of California, Berkeley, California 94720, USA; Department of Integrative Biology, University of California, Berkeley, California 94720, USA
| |
Collapse
|
41
|
Robinson JD, Coffman AJ, Hickerson MJ, Gutenkunst RN. Sampling strategies for frequency spectrum-based population genomic inference. BMC Evol Biol 2014; 14:254. [PMID: 25471595 PMCID: PMC4269862 DOI: 10.1186/s12862-014-0254-4] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2014] [Accepted: 11/24/2014] [Indexed: 01/25/2023] Open
Abstract
Background The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module δaδi, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. Results Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. Conclusions Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events. Electronic supplementary material The online version of this article (doi:10.1186/s12862-014-0254-4) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- John D Robinson
- Department of Biology, City College of New York, New York, NY, 10031, USA. .,Current Address: South Carolina Department of Natural Resources, Marine Resources Research Institute, Charleston, SC, 29412, USA.
| | - Alec J Coffman
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA.
| | - Michael J Hickerson
- Department of Biology, City College of New York, New York, NY, 10031, USA. .,Subprogram in Ecology, Evolution and Behavior, the Graduate Center of the City University of New York, New York, NY, 10016, USA. .,Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA.
| | - Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA.
| |
Collapse
|
42
|
Bhaskar A, Song YS. DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA. Ann Stat 2014; 42:2469-2493. [PMID: 28018011 DOI: 10.1214/14-aos1264] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
Collapse
|
43
|
Robinson JD, Bunnefeld L, Hearn J, Stone GN, Hickerson MJ. ABC inference of multi-population divergence with admixture from unphased population genomic data. Mol Ecol 2014; 23:4458-71. [PMID: 25113024 PMCID: PMC4285295 DOI: 10.1111/mec.12881] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2013] [Revised: 08/04/2014] [Accepted: 08/06/2014] [Indexed: 01/13/2023]
Abstract
Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida.
Collapse
Affiliation(s)
- John D Robinson
- Department of Biology, City College of New York, 160 Convent Ave., MR 526, New York, NY, 10031, USA
| | | | | | | | | |
Collapse
|
44
|
Distortion of genealogical properties when the sample is very large. Proc Natl Acad Sci U S A 2014; 111:2385-90. [PMID: 24469801 DOI: 10.1073/pnas.1322709111] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands, if not millions, of individuals. In addition to posing computational challenges, such large sample sizes call for carefully reexamining the theoretical foundation underlying commonly used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models, and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For recently inferred demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the trade-off between accuracy and computational efficiency, we propose a hybrid algorithm that uses the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
Collapse
|
45
|
McCoy RC, Garud NR, Kelley JL, Boggs CL, Petrov DA. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Mol Ecol 2013; 23:136-50. [PMID: 24237665 DOI: 10.1111/mec.12591] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2013] [Accepted: 10/30/2013] [Indexed: 02/04/2023]
Abstract
The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here, we present and evaluate a method of SNP discovery using RNA sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has as experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other nonmodel systems.
Collapse
Affiliation(s)
- Rajiv C McCoy
- Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA, 94305, USA; Rocky Mountain Biological Laboratory, Crested Butte, CO, 81224, USA
| | | | | | | | | |
Collapse
|
46
|
General triallelic frequency spectrum under demographic models with variable population size. Genetics 2013; 196:295-311. [PMID: 24214345 DOI: 10.1534/genetics.113.158584] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open
Abstract
It is becoming routine to obtain data sets on DNA sequence variation across several thousands of chromosomes, providing unprecedented opportunity to infer the underlying biological and demographic forces. Such data make it vital to study summary statistics that offer enough compression to be tractable, while preserving a great deal of information. One well-studied summary is the site frequency spectrum-the empirical distribution, across segregating sites, of the sample frequency of the derived allele. However, most previous theoretical work has assumed that each site has experienced at most one mutation event in its genealogical history, which becomes less tenable for very large sample sizes. In this work we obtain, in closed form, the predicted frequency spectrum of a site that has experienced at most two mutation events, under very general assumptions about the distribution of branch lengths in the underlying coalescent tree. Among other applications, we obtain the frequency spectrum of a triallelic site in a model of historically varying population size. We demonstrate the utility of our formulas in two settings: First, we show that triallelic sites are more sensitive to the parameters of a population that has experienced historical growth, suggesting that they will have use if they can be incorporated into demographic inference. Second, we investigate a recently proposed alternative mechanism of mutation in which the two derived alleles of a triallelic site are created simultaneously within a single individual, and we develop a test to determine whether it is responsible for the excess of triallelic sites in the human genome.
Collapse
|
47
|
Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. Robust demographic inference from genomic and SNP data. PLoS Genet 2013; 9:e1003905. [PMID: 24204310 PMCID: PMC3812088 DOI: 10.1371/journal.pgen.1003905] [Citation(s) in RCA: 840] [Impact Index Per Article: 76.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2013] [Accepted: 09/11/2013] [Indexed: 01/09/2023] Open
Abstract
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
Collapse
Affiliation(s)
- Laurent Excoffier
- CMPG, Institute of Ecology and Evolution, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Isabelle Dupanloup
- CMPG, Institute of Ecology and Evolution, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Emilia Huerta-Sánchez
- Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Vitor C. Sousa
- CMPG, Institute of Ecology and Evolution, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Matthieu Foll
- CMPG, Institute of Ecology and Evolution, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
Collapse
|
48
|
Sousa V, Hey J. Understanding the origin of species with genome-scale data: modelling gene flow. Nat Rev Genet 2013; 14:404-14. [PMID: 23657479 DOI: 10.1038/nrg3446] [Citation(s) in RCA: 181] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here.
Collapse
Affiliation(s)
- Vitor Sousa
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
| | | |
Collapse
|
49
|
|