1
Jobling MA. Forensic genetics through the lens of Lewontin: population structure, ancestry and race. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200422. [PMID: 35430883 PMCID: PMC9014189 DOI: 10.1098/rstb.2020.0422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
In his famous 1972 paper, Richard Lewontin used ‘classical’ protein-based markers to show that greater than 85% of human genetic diversity was contained within, rather than between, populations. At that time, these same markers also formed the basis of forensic technology aiming to identify individuals. This review describes the evolution of forensic genetic methods into DNA profiling, and how the field has accounted for the apportionment of genetic diversity in considering the weight of forensic evidence. When investigative databases fail to provide a match to a crime-scene profile, specific markers can be used to seek intelligence about a suspect: these include inferences on population of origin (biogeographic ancestry) and externally visible characteristics, chiefly pigmentation of skin, hair and eyes. In this endeavour, ancestry and phenotypic variation are closely entangled. The markers used show patterns of inter- and intrapopulation diversity that are very atypical compared to the genome as a whole, and reinforce an apparent link between ancestry and racial divergence that is not systematically present otherwise. Despite the legacy of Lewontin's result, therefore, in a major area in which genetics coincides with issues of public interest, methods tend to exaggerate human differences and could thereby contribute to the reification of biological race. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
Affiliation(s)
- Mark A. Jobling
- Department of Genetics and Genome Biology, University of Leicester, University Road, Leicester LE1 7RH, UK
2
Waples RS, Waples RK, Ward EJ. Pseudoreplication in genomics-scale datasets. Mol Ecol Resour 2021; 22:503-518. [PMID: 34351073 DOI: 10.1111/1755-0998.13482] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/17/2020] [Revised: 06/14/2021] [Accepted: 07/23/2021] [Indexed: 11/30/2022]
Abstract
In genomics-scale datasets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom, df. This issue has been known for some time, but consequences have not been systematically quantified across the entire genome. Here we measured pseudoreplication (quantified by the ratio df'/df) for a common metric of genetic differentiation (FST) and a common measure of linkage disequilibrium between pairs of loci (r²). Based on data simulated using models (SLiM and msprime) that allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df' and df'/df by measuring the rate of decline in the variance of mean FST and mean r² as more loci were used. For both indices, df' increases with Ne and genome size, as expected. However, even for large Ne and large genomes, df' for mean r² plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for FST, but df'/df ≤ 0.01 can occur in datasets using tens of thousands of loci. Commonly used block-jackknife methods consistently overestimated var(FST), producing very conservative confidence intervals. Predicting df' based on our modeling results as a function of Ne, L, S, and genome size provides a robust way to quantify precision associated with genomics-scale datasets.
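To make the df'/df ratio concrete, the sketch below assumes a deliberately simplified model that is not the one used in the paper: every pair of loci shares the same correlation `rho` (a hypothetical parameter). Under that assumption the variance of a mean over L loci is σ²(1 + (L − 1)ρ)/L, so the effective number of independent loci is L/(1 + (L − 1)ρ). The function names are illustrative only.

```python
def effective_df(n_loci: int, rho: float) -> float:
    """Effective number of independent loci, assuming (hypothetically) that
    every pair of loci has the same correlation rho."""
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Variance inflation of the mean relative to independent loci.
    inflation = 1.0 + (n_loci - 1) * rho
    return n_loci / inflation


def pseudoreplication_ratio(n_loci: int, rho: float) -> float:
    """Ratio of effective to nominal number of loci (analogous to df'/df)."""
    return effective_df(n_loci, rho) / n_loci
```

Under this toy model, 10,000 loci with a pairwise correlation of 0.01 give a ratio just under 0.01, which illustrates how even weak correlation can sharply reduce the information in a large panel.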
Affiliation(s)
- Robin S Waples
- NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd. East, Seattle, WA, 98112, USA
- Ryan K Waples
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark; Department of Biostatistics, University of Washington, Seattle, WA, USA
- Eric J Ward
- NOAA Fisheries, Northwest Fisheries Science Center, 2725 Montlake Blvd. East, Seattle, WA, 98112, USA
3
Taylor D, Buckleton J. Can a reference 'match' an evidence profile if these have no loci in common? Forensic Sci Int Genet 2021; 53:102520. [PMID: 33930815 DOI: 10.1016/j.fsigen.2021.102520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Revised: 03/15/2021] [Accepted: 04/06/2021] [Indexed: 10/21/2022]
Abstract
Cold case reinvestigations are a common occurrence. Occasionally some of the original work was conducted up to 30 years ago using profiling systems of the early 1990s, which targeted HLA-DQA1, ApoB, D1S80 and D17S5. When contemporary work is carried out, if a suspect is identified they will be profiled in contemporary profiling kits such as GlobalFiler. It would be common to then also attempt to profile the evidence profiles in the same contemporary profiling kit. Imagine a scenario where two evidence samples, E1 and E2, had previously produced single-source profiles, but only E2 had any DNA extract left to re-profile with GlobalFiler. At the old loci E1 matched E2, and at the new loci E2 matched the suspect reference. Of interest to the investigation was whether anything could be said about the suspect being a donor of DNA to E1 even though the reference of the suspect and the profile from E1 had no loci in common, by using the information from the profile of E2. This paper explores that possibility.
Affiliation(s)
- Duncan Taylor
- School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia; Forensic Science SA, GPO Box 2790, Adelaide, SA 5000, Australia.
- John Buckleton
- Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand; University of Auckland, Department of Statistics, Auckland, New Zealand
4
Mukhopadhyay A, Chakraborty S. Replicator equations induced by microscopic processes in nonoverlapping population playing bimatrix games. Chaos (Woodbury, N.Y.) 2021; 31:023123. [PMID: 33653037 DOI: 10.1063/5.0032311] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/07/2020] [Accepted: 01/27/2021] [Indexed: 06/12/2023]
Abstract
This paper is concerned with exploring the microscopic basis for the discrete versions of the standard replicator equation and the adjusted replicator equation. To this end, we introduce frequency-dependent selection (as a result of competition fashioned by game-theoretic consideration) into the Wright-Fisher process, a stochastic birth-death process. The process is further considered to be active in a generation-wise nonoverlapping finite population where individuals play a two-strategy bimatrix population game. Subsequently, connections among the corresponding master equation, the Fokker-Planck equation, and the Langevin equation are exploited to arrive at the deterministic discrete replicator maps in the limit of infinite population size.
Affiliation(s)
- Archan Mukhopadhyay
- Department of Physics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
- Sagar Chakraborty
- Department of Physics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
5
Kaveh K, McAvoy A, Nowak MA. Environmental fitness heterogeneity in the Moran process. Royal Society Open Science 2019; 6:181661. [PMID: 30800394 PMCID: PMC6366185 DOI: 10.1098/rsos.181661] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/02/2018] [Accepted: 11/30/2018] [Indexed: 06/09/2023]
Abstract
Many mathematical models of evolution assume that all individuals experience the same environment. Here, we study the Moran process in heterogeneous environments. The population is of finite size with two competing types, which are exposed to a fixed number of environmental conditions. Reproductive rate is determined by both the type and the environment. We first calculate the condition for selection to favour the mutant relative to the resident wild-type. In large populations, the mutant is favoured if and only if the mutant's spatial average reproductive rate exceeds that of the resident. But environmental heterogeneity elucidates an interesting asymmetry between the mutant and the resident. Specifically, mutant heterogeneity suppresses its fixation probability; if this heterogeneity is strong enough, it can even completely offset the effects of selection (including in large populations). By contrast, resident heterogeneity has no effect on a mutant's fixation probability in large populations and can amplify it in small populations.
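As a point of comparison for the heterogeneous case described above, the sketch below computes the classical fixation probability of a single mutant in the homogeneous Moran process, where every individual experiences the same environment. This is the textbook baseline, not the paper's heterogeneous result; the function name and the parameters `r` (mutant reproductive rate relative to a resident rate of 1) and `n` (population size) are illustrative assumptions.

```python
import math


def moran_fixation_probability(r: float, n: int) -> float:
    """Fixation probability of one mutant with relative fitness r in a
    well-mixed Moran population of size n (homogeneous environment only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if r <= 0:
        raise ValueError("r must be positive")
    if math.isclose(r, 1.0):
        # Neutral mutant: each of the n individuals is equally likely to fix.
        return 1.0 / n
    # Classical closed form (1 - 1/r) / (1 - 1/r^n).
    # Note: r ** (-n) can overflow for r < 1 and very large n.
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))
```

For example, a mutant with twice the resident's reproductive rate in a population of 10 fixes with probability 512/1023, only slightly above the large-population limit of 1 − 1/r = 0.5.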
6
Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol 2018; 66:e30-e46. [PMID: 28173553 PMCID: PMC5837693 DOI: 10.1093/sysbio/syw056] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 12/04/2015] [Revised: 05/31/2016] [Accepted: 06/06/2016] [Indexed: 11/14/2022]
Abstract
The Wright–Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright–Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright–Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright–Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model.
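The basic bi-allelic model with drift as the only force can be written down exactly for small populations, which may help show why the DAF becomes expensive to compute as the population grows. The sketch below is a minimal illustration under that pure-drift assumption; `n_copies` (the number of gene copies, 2N for diploids) and both function names are hypothetical choices, not taken from the review.

```python
from math import comb


def wf_transition_matrix(n_copies: int) -> list[list[float]]:
    """Pure-drift Wright-Fisher transition matrix.

    Entry [i][j] is the probability of j copies of the allele in the next
    generation given i copies now (binomial sampling of n_copies genes).
    """
    matrix = []
    for i in range(n_copies + 1):
        p = i / n_copies  # current allele frequency
        row = [
            comb(n_copies, j) * p**j * (1.0 - p) ** (n_copies - j)
            for j in range(n_copies + 1)
        ]
        matrix.append(row)
    return matrix


def step_daf(daf: list[float], matrix: list[list[float]]) -> list[float]:
    """Propagate a distribution over allele counts forward one generation."""
    size = len(daf)
    return [sum(daf[i] * matrix[i][j] for i in range(size)) for j in range(size)]
```

The matrix has (2N + 1)² entries, so repeated multiplication quickly becomes impractical for realistic population sizes; this is the cost that the diffusion-based and moment-based approximations discussed in the review aim to avoid.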
Affiliation(s)
- Paula Tataru
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Maria Simonsen
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Asger Hobolth
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
7
Ohtsuki H, Innan H. Forward and backward evolutionary processes and allele frequency spectrum in a cancer cell population. Theor Popul Biol 2017; 117:43-50. [PMID: 28866007 DOI: 10.1016/j.tpb.2017.08.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/09/2017] [Revised: 08/08/2017] [Accepted: 08/23/2017] [Indexed: 01/04/2023]
Abstract
A cancer grows from a single cell, thereby constituting a large cell population. In this work, we are interested in how mutations accumulate in a cancer cell population. We provide a theoretical framework of the stochastic process in a cancer cell population and obtain near exact expressions of allele frequency spectrum or AFS (only continuous approximation is involved) from both forward and backward treatments under a simple setting; all cells undergo cell divisions and die at constant rates, b and d, respectively, such that the entire population grows exponentially. This setting means that once a parental cancer cell is established, in the following growth phase, all mutations are assumed to have no effect on b or d (i.e., neutral or passengers). Our theoretical results show that the difference from organismal population genetics is mainly in the coalescent time scale, and the mutation rate is defined per cell division, not per time unit (e.g., generation). Except for these two factors, the basic logic is very similar between organismal and cancer population genetics, indicating that a number of well established theories of organismal population genetics could be translated to cancer population genetics with simple modifications.
Affiliation(s)
- Hisashi Ohtsuki
- SOKENDAI, The Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
- Hideki Innan
- SOKENDAI, The Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
8
A non-zero variance of Tajima's estimator for two sequences even for infinitely many unlinked loci. Theor Popul Biol 2017; 122:22-29. [PMID: 28341209 DOI: 10.1016/j.tpb.2017.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/16/2016] [Revised: 02/12/2017] [Accepted: 03/03/2017] [Indexed: 10/19/2022]
Abstract
The population-scaled mutation rate, θ, is informative on the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima's estimator (θ̂), which is the average number of pairwise differences, does not vanish even as n→∞. The non-zero variance of θ̂ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time, Wright-Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small (Var(θ̂)/θ² ≈ O(Ne⁻¹)), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
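For readers unfamiliar with the estimator, the sketch below shows one plain reading of it for the two-sequence case: count the differences between the two sequences at each locus and average over loci. The input layout (a list of aligned sequence pairs, one pair per locus) and the function names are assumptions made for illustration, not the paper's notation.

```python
def pairwise_differences(seq_a: str, seq_b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def tajima_theta_two_sequences(loci: list[tuple[str, str]]) -> float:
    """Average number of pairwise differences per locus for two sequences.

    `loci` holds one (sequence_a, sequence_b) pair per locus.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    total = sum(pairwise_differences(a, b) for a, b in loci)
    return total / len(loci)
```

The abstract's point is that adding more loci to this average does not drive its variance to zero, because all loci share one pedigree.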
9
Distortion of genealogical properties when the sample is very large. Proc Natl Acad Sci U S A 2014; 111:2385-90. [PMID: 24469801 DOI: 10.1073/pnas.1322709111] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022]
Abstract
Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands, if not millions, of individuals. In addition to posing computational challenges, such large sample sizes call for carefully reexamining the theoretical foundation underlying commonly used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models, and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For recently inferred demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the trade-off between accuracy and computational efficiency, we propose a hybrid algorithm that uses the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
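The gap between the two models can be seen in a much simpler setting than the one studied in the paper. The sketch below assumes a haploid population of constant size `n_pop` (a hypothetical simplification) and compares the probability that `n_sample` lineages all pick distinct parents in one generation under the discrete-time model with the corresponding coalescent value, exp(−C(n, 2)/N).

```python
from math import comb, exp


def dtwf_no_coalescence(n_sample: int, n_pop: int) -> float:
    """Exact one-generation probability that n_sample lineages choose
    n_sample distinct parents among n_pop haploid individuals."""
    prob = 1.0
    for i in range(1, n_sample):
        prob *= 1.0 - i / n_pop
    return prob


def coalescent_no_coalescence(n_sample: int, n_pop: int) -> float:
    """Coalescent probability of no merger during one generation, i.e. a
    time span of 1/n_pop at pairwise coalescence rate 1."""
    return exp(-comb(n_sample, 2) / n_pop)
```

For two lineages in a population of 100 the values are nearly identical (0.99 versus about 0.99005), but the coalescent value always exceeds the exact one and the gap widens as the sample size approaches the population size.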
10
Malaspinas AS, Slatkin M, Song YS. Match probabilities in a finite, subdivided population. Theor Popul Biol 2011; 79:55-63. [PMID: 21266180 DOI: 10.1016/j.tpb.2011.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/22/2009] [Revised: 01/12/2011] [Accepted: 01/18/2011] [Indexed: 10/18/2022]
Abstract
We generalize a recently introduced graphical framework to compute the probability that haplotypes or genotypes of two individuals drawn from a finite, subdivided population match. As in the previous work, we assume an infinite-alleles model. We focus on the case of a population divided into two subpopulations, but the underlying framework can be applied to a general model of population subdivision. We examine the effect of population subdivision on the match probabilities and the accuracy of the product rule which approximates multi-locus match probabilities as a product of one-locus match probabilities. We quantify the deviation from predictions of the product rule by R, the ratio of the multi-locus match probability to the product of the one-locus match probabilities. We carry out the computation for two loci and find that ignoring subdivision can lead to underestimation of the match probabilities if the population under consideration actually has subdivision structure and the individuals originate from the same subpopulation. On the other hand, under a given model of population subdivision, we find that the ratio R for two loci is only slightly greater than 1 for a large range of symmetric and asymmetric migration rates. Keeping in mind that the infinite-alleles model is not the appropriate mutation model for STR loci, we conclude that, for two loci and biologically reasonable parameter values, population subdivision may lead to results that disfavor innocent suspects because of an increase in identity-by-descent in finite populations. On the other hand, for the same range of parameters, population subdivision does not lead to a substantial increase in linkage disequilibrium between loci. Those results are consistent with established practice.
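The product rule and the ratio R defined above can be illustrated with a short sketch. The one-locus calculation here assumes Hardy-Weinberg proportions in a single randomly mating population, which is a simplification of the finite, subdivided, infinite-alleles setting analysed in the paper; the function names are hypothetical, and the multi-locus match probability is treated as an externally supplied input.

```python
def one_locus_match_probability(allele_freqs: list[float]) -> float:
    """Probability that two unrelated individuals share a genotype at one
    locus, assuming Hardy-Weinberg proportions (an illustrative assumption)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    k = len(allele_freqs)
    prob = 0.0
    for i in range(k):
        for j in range(i, k):
            if i == j:
                genotype = allele_freqs[i] ** 2  # homozygote frequency
            else:
                genotype = 2.0 * allele_freqs[i] * allele_freqs[j]  # heterozygote
            prob += genotype**2  # both individuals carry this genotype
    return prob


def product_rule(per_locus: list[float]) -> float:
    """Multi-locus match probability approximated as a product over loci."""
    result = 1.0
    for p in per_locus:
        result *= p
    return result


def ratio_r(multi_locus_match: float, per_locus: list[float]) -> float:
    """R: the multi-locus match probability divided by the product rule value."""
    return multi_locus_match / product_rule(per_locus)
```

A value of R above 1 means the product rule understates the true multi-locus match probability; the abstract reports that R stays only slightly above 1 for two loci over a wide range of migration rates.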
Affiliation(s)
- Anna-Sapfo Malaspinas
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA