1
Liu X, Ahsan Z, Martheswaran TK, Rosenberg NA. When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself? Stat Appl Genet Mol Biol 2023; 22:sagmb-2023-0004. [PMID: 38073574 PMCID: PMC10711674 DOI: 10.1515/sagmb-2023-0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 11/10/2023] [Indexed: 12/18/2023]
Abstract
Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity - the mean dissimilarity between randomly chosen members of a population - can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data.
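The result summarized in this abstract can be illustrated with a small numerical sketch. It assumes a simplified dissimilarity that is not necessarily either of the paper's two formulations: the probability that one allele drawn at random from each of two individuals differs. Under that assumption, with independent draws, the mean dissimilarity between populations with allele-frequency vectors p and q is 1 - sum_i p_i q_i. The function name and the example frequencies are hypothetical.

```python
def mean_dissimilarity(p, q):
    """Probability that an allele drawn from frequencies p differs from one drawn from q.

    Assumed simplified measure for illustration only; p and q are
    allele-frequency vectors over the same alleles at one locus.
    """
    if len(p) != len(q):
        raise ValueError("p and q must list frequencies for the same alleles")
    return 1.0 - sum(pi * qi for pi, qi in zip(p, q))


# Hypothetical allele frequencies at a single three-allele locus.
pop_a = [0.4, 0.3, 0.3]  # variable population
pop_b = [1.0, 0.0, 0.0]  # population fixed for the first allele

within_a = mean_dissimilarity(pop_a, pop_a)  # population A with itself
within_b = mean_dissimilarity(pop_b, pop_b)  # population B with itself
between = mean_dissimilarity(pop_a, pop_b)   # A versus B

print(f"within A = {within_a:.2f}, within B = {within_b:.2f}, between = {between:.2f}")
```

In this example the within-population value for A is larger than the between-population value, while the within-population value for B is not. For this simplified measure, the smaller of the two within-population values can never exceed the between-population value, because sum_i p_i q_i is at most the larger of sum_i p_i^2 and sum_i q_i^2 (Cauchy-Schwarz inequality).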
Affiliation(s)
- Xiran Liu
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
- Zarif Ahsan
- Department of Biology, Stanford University, Stanford, CA 94305, USA
2
Abstract
It is often claimed that race is a social construct and that scientists studying race differences are disruptive racists. The recent April 2018 “Race Issue” of the widely distributed National Geographic Magazine (NG) provided its millions of readers with a particularly illustrative example of this position. As discussions of race issues often recur in both scientific and lay literature, stir considerable polemics, and have political, societal and human implications, we found it of both scientific and general interest to identify and dissect the following partly overlapping key contentions of the NG race issue magazine: (1) Samuel Morton’s studies of brain size are reprehensible racism, (2) Race does not relate to geographic location, (3) Races do not exist as we are all equals and Africans, (4) Admixture and displacement erase race differences as soon as they appear, and (5) Race is only skin color deep. Also examined is the claim that race does not matter. When analyzed within syllogistic formalism, each of the claims is found theoretically and empirically unsustainable, as Morton’s continuously evolving race position is misrepresented, race relates significantly to geography, we are far from equals, races have definitely not been erased, and race, whether self-reported or defined by ancestry, lineage, ecotype, species, or genes, is much more than skin color deep. Race matters vitally for people and societies. We conclude that important research on existing population differences is hurt when widely respected institutions such as NG mobilize their full authority in a massively circulated attempt to betray their scientific and public readership by systematically misrepresenting historical sources and scientific positions, shaming past scientists, and selectively suppressing unwanted or unacceptable results, acts included as examples of academic fraud by the National Academy of Sciences (US, 1986).
Any unqualified a priori denial of the formative evolutionary aspects of individual and population differences threatens to impede the recent promising research on effects of genome-wide allelic associations, which would lame us in the vital quest to develop rational solutions to associated globally pressing societal problems.
3
Tal O, Tran TD. New perspectives on multilocus ancestry informativeness. Math Biosci 2018; 306:60-81. [PMID: 30385120 DOI: 10.1016/j.mbs.2018.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2018] [Revised: 10/24/2018] [Accepted: 10/25/2018] [Indexed: 10/28/2022]
Abstract
We present an axiomatic approach for multilocus informativeness measures for determining the amount of information that a set of polymorphic genetic markers provides about individual ancestry. We then reveal several surprising properties of a decision-theoretic based measure that is consistent with the set of proposed criteria for multilocus informativeness. In particular, these properties highlight the interplay between information originating from population priors and the information extractable from the population genetic variants. This analysis then reveals a certain deficiency of mutual information based multilocus informativeness measures when such population priors are incorporated. Finally, we analyse and quantify the inevitable inherent decrease in informativeness due to learning from finite population samples.
Affiliation(s)
- Omri Tal
- Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
- Tat Dat Tran
- Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
4
Martin MD, Jay F, Castellano S, Slatkin M. Determination of genetic relatedness from low-coverage human genome sequences using pedigree simulations. Mol Ecol 2017; 26:4145-4157. [PMID: 28543951 DOI: 10.1111/mec.14188] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/10/2015] [Accepted: 05/05/2017] [Indexed: 02/01/2023]
Abstract
We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As the majority of ancient specimens are from animals, this method would be applicable to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available.
Affiliation(s)
- Michael D Martin
- Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway; Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Flora Jay
- Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA; Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris-Sud, Paris-Saclay, France
- Sergi Castellano
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Montgomery Slatkin
- Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
5
Tal O, Tran TD, Portegies J. From typical sequences to typical genotypes. J Theor Biol 2017; 419:159-183. [PMID: 28202283 DOI: 10.1016/j.jtbi.2017.02.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/13/2016] [Accepted: 02/07/2017] [Indexed: 01/05/2023]
Abstract
We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes and population entropy and cross entropy rate. We analyze three perspectives on typical genotypes: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high dimensional space, and a statistical learning perspective on the prospects of constructing typical-set based classifiers. In particular, we show that such classifiers have a surprising resilience to noise originating from small population samples, and highlight the potential for further links between inference and communication.
Affiliation(s)
- Omri Tal
- Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
- Tat Dat Tran
- Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
- Jacobus Portegies
- Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany.
6
Granot Y, Tal O, Rosset S, Skorecki K. On the Apportionment of Population Structure. PLoS One 2016; 11:e0160413. [PMID: 27505172 PMCID: PMC4978449 DOI: 10.1371/journal.pone.0160413] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/02/2015] [Accepted: 07/19/2016] [Indexed: 11/30/2022]
Abstract
Measures of population differentiation, such as FST, are traditionally derived from the partition of diversity within and between populations. However, the emergence of population clusters from multilocus analysis is a function of genetic structure (departures from panmixia) rather than of diversity. If the populations are close to panmixia, slight differences between the mean pairwise distance within and between populations (low FST) can manifest as strong separation between the populations; thus, population clusters are often evident even when the vast majority of diversity is partitioned within populations rather than between them. For any given FST value, clusters can be tighter (more panmictic) or looser (more stratified), and in this respect higher FST does not always imply stronger differentiation. In this study we propose a measure for the partition of structure, denoted EST, which is more consistent with results from clustering schemes. Crucially, our measure is based on a statistic of the data that is a good measure of internal structure, mimicking the information extracted by unsupervised clustering or dimensionality reduction schemes. To assess the utility of our metric, we ranked various human (HGDP) population pairs based on FST and EST and found substantial differences in ranking order. EST ranking seems more consistent with population clustering and classification and possibly with geographic distance between populations. Thus, EST may at times outperform FST in identifying evolutionarily significant differentiation.
Affiliation(s)
- Yaron Granot
- Rappaport Faculty of Medicine and Research Institute, Technion–Israel Institute of Technology, and Rambam Medical Center, Haifa, Israel
- Omri Tal
- Max Planck Institute for Mathematics in the Sciences, Inselstr. 22-26, 04103, Leipzig, Germany
- Saharon Rosset
- School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
- Karl Skorecki
- Rappaport Faculty of Medicine and Research Institute, Technion–Israel Institute of Technology, and Rambam Medical Center, Haifa, Israel
7
Sesardic N. Confusions about race: a new installment. Stud Hist Philos Biol Biomed Sci 2013; 44:287-293. [PMID: 23583351 DOI: 10.1016/j.shpsc.2013.03.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/07/2013] [Accepted: 03/07/2013] [Indexed: 06/02/2023]
Affiliation(s)
- Neven Sesardic
- Department of Philosophy, Lingnan University, Hong Kong.
8
Population structure in a comprehensive genomic data set on human microsatellite variation. G3 (Bethesda) 2013; 3:891-907. [PMID: 23550135 PMCID: PMC3656735 DOI: 10.1534/g3.113.005728] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Indexed: 01/09/2023]
Abstract
Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share in common, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share in common with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.