1. Resutik P, Aeschbacher S, Krützen M, Kratzer A, Haas C, Phillips C, Arora N. Comparative evaluation of the MAPlex, Precision ID Ancestry Panel, and VISAGE Basic Tool for biogeographical ancestry inference. Forensic Sci Int Genet 2023; 64:102850. [PMID: 36924679 DOI: 10.1016/j.fsigen.2023.102850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 02/27/2023]
Abstract
Biogeographical ancestry (BGA) inference from ancestry-informative markers (AIMs) has strong potential to support forensic investigations. Over the past two decades, several forensic panels composed of AIMs have been developed to predict ancestry at a continental scale. These panels typically comprise fewer than 200 AIMs and have been designed and tested with a limited set of populations. How well these panels recover patterns of genetic diversity relative to larger sets of markers, and how accurately they infer ancestry of individuals and populations not included in their design, remain poorly understood. The lack of comparative studies addressing these aspects makes the selection of appropriate panels for forensic laboratories difficult. In this study, the model-based genetic clustering tool STRUCTURE was used to compare three popular forensic BGA panels, MAPlex, Precision ID Ancestry Panel (PIDAP), and VISAGE Basic Tool (VISAGE BT), relative to a genome-wide reference set of 10k SNPs. The genotypes for all these markers were obtained for a comprehensive set of 3957 individuals from 228 worldwide human populations. Our results indicate that at the broad continental scale (K=6) typically examined in forensic studies, all forensic panels produced similar genetic structure patterns compared to the reference set (G'≈90%) and had high classification performance across all regions (average AUC-PR > 97%). However, at K=7 and K=8, the forensic panels displayed some region-specific clustering deviations from the reference set, particularly in Europe and the region of East and South-East Asia, which may be attributed to differences in the design of the respective panels. Overall, the panel with the most consistent performance in all regions was VISAGE BT with an average weighted AUC̅W score of 96.26% across the three scales of geographical resolution investigated.
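The AUC-PR figures quoted in the abstract are areas under a precision-recall curve. As a rough illustration of how such a score can be computed for a single "region versus rest" classification, here is a minimal sketch. It assumes that each individual carries a binary label (1 = belongs to the region) and a score such as a cluster membership coefficient. The function name and the toy numbers are hypothetical, and this is not the authors' evaluation pipeline.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 0/1 values (1 = individual belongs to the region of interest).
    scores: classifier scores, higher meaning "more likely in the region".
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank individuals from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical membership coefficients for four individuals.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A perfect ranking (all true members scored above all non-members) gives 1.0; averaging such per-region scores would give a summary comparable in spirit to the averages reported above.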
Affiliation(s)
- Peter Resutik
  - Zurich Institute of Forensic Medicine, University of Zurich, Switzerland
- Simon Aeschbacher
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland
- Michael Krützen
  - Department of Evolutionary Anthropology, University of Zurich, Switzerland
- Adelgunde Kratzer
  - Zurich Institute of Forensic Medicine, University of Zurich, Switzerland
- Cordula Haas
  - Zurich Institute of Forensic Medicine, University of Zurich, Switzerland
- Christopher Phillips
  - Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- Natasha Arora
  - Zurich Institute of Forensic Medicine, University of Zurich, Switzerland
2. Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference. Sci Rep 2018; 8:10209. [PMID: 29977040 PMCID: PMC6033855 DOI: 10.1038/s41598-018-28539-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/12/2018] [Accepted: 06/20/2018] [Indexed: 11/08/2022]
Abstract
Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics obtained from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We implemented our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than those in previous works which did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from where only SNP array data are available. Our inference framework is applicable to those cases and will help with the demographic inference.
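The idea of inferring ascertainment and demographic parameters jointly can be illustrated with a toy ABC rejection sampler. The sketch below is not the pipeline described in the abstract. It assumes a deliberately simple model in which a single shape parameter (`freq_shape`) stands in for the demography, the discovery panel size (`n_discovery`) stands in for the ascertainment scheme, and mean heterozygosity is the only summary statistic. All names, priors, and tolerances are hypothetical.

```python
import random


def simulate_summary(freq_shape, n_discovery, n_sites=200, n_sample=20, rng=random):
    """Mean heterozygosity at sites that survive a toy ascertainment step."""
    het_values = []
    for _ in range(n_sites):
        # Population allele frequency drawn from a symmetric Beta distribution.
        p = rng.betavariate(freq_shape, freq_shape)
        # Ascertainment: keep the site only if the discovery panel is polymorphic.
        seen = sum(rng.random() < p for _ in range(n_discovery))
        if seen == 0 or seen == n_discovery:
            continue
        # Genotype the surviving site in a separate sample of haploid genomes.
        count = sum(rng.random() < p for _ in range(n_sample))
        x = count / n_sample
        het_values.append(2 * x * (1 - x))
    return sum(het_values) / len(het_values) if het_values else 0.0


def abc_rejection(observed, n_draws=300, tolerance=0.02, seed=1):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        freq_shape = rng.uniform(0.1, 2.0)  # prior on the stand-in demographic parameter
        n_discovery = rng.randint(2, 12)    # prior on the discovery panel size
        simulated = simulate_summary(freq_shape, n_discovery, rng=rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append((freq_shape, n_discovery))
    return accepted


# The accepted draws approximate the joint posterior of both parameters.
posterior = abc_rejection(observed=0.30)
print(len(posterior), "accepted draws")
```

With one summary statistic the two parameters are poorly separated, which is why a realistic analysis would compare many statistics computed on both the ascertained and the genomic data, as the abstract describes.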
3. Hudjashov G, Karafet TM, Lawson DJ, Downey S, Savina O, Sudoyo H, Lansing JS, Hammer MF, Cox MP. Complex Patterns of Admixture across the Indonesian Archipelago. Mol Biol Evol 2017; 34:2439-2452. [PMID: 28957506 PMCID: PMC5850824 DOI: 10.1093/molbev/msx196] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 01/30/2023]
Abstract
Indonesia, an island nation as large as continental Europe, hosts a sizeable proportion of global human diversity, yet remains surprisingly undercharacterized genetically. Here, we substantially expand on existing studies by reporting genome-scale data for nearly 500 individuals from 25 populations in Island Southeast Asia, New Guinea, and Oceania, notably including previously unsampled islands across the Indonesian archipelago. We use high-resolution analyses of haplotype diversity to reveal fine detail of regional admixture patterns, with a particular focus on the Holocene. We find that recent population history within Indonesia is complex, and that populations from the Philippines made important genetic contributions in the early phases of the Austronesian expansion. Different but interrelated processes acted in the east and west. The Austronesian migration took several centuries to spread across the eastern part of the archipelago, where genetic admixture postdates the archeological signal. As with the Neolithic expansion further east in Oceania and in Europe, genetic mixing with local inhabitants in eastern Indonesia lagged behind the arrival of farming populations. In contrast, western Indonesia has a more complicated admixture history shaped by interactions with mainland Asian and Austronesian newcomers, which for some populations occurred more than once. Another layer of complexity in the west was introduced by genetic contact with South Asia and strong demographic events in isolated local groups.
Affiliation(s)
- Georgi Hudjashov
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand; Estonian Biocentre, 51010 Tartu, Estonia
- Daniel J Lawson
  - School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom
- Sean Downey
  - Department of Anthropology, University of Maryland, College Park, MD
- Olga Savina
  - ARL Division of Biotechnology, University of Arizona, Tucson, AZ
- Herawati Sudoyo
  - Genome Diversity and Diseases Laboratory, Eijkman Institute for Molecular Biology, Jakarta, Indonesia; Department of Medical Biology, Faculty of Medicine, University of Indonesia, Jakarta, Indonesia; Sydney Medical School, University of Sydney, Sydney, NSW, Australia
- Murray P Cox
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
4. Calderón R, Hernández CL, García-Varela G, Masciarelli D, Cuesta P. Inbreeding in Southeastern Spain: The Impact of Geography and Demography on Marital Mobility and Marital Distance Patterns (1900-1969). Human Nature (Hawthorne, N.Y.) 2017; 29:45-64. [PMID: 29159722 DOI: 10.1007/s12110-017-9305-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
In this paper, the structure of a southeastern Spanish population was studied for the first time with respect to its inbreeding patterns and its relationship with demographic and geographic factors. Data on consanguineous marriages (up to second cousins) from 1900 to 1969 were taken from ecclesiastic dispensations. Our results confirm that the patterns and trends of inbreeding in the study area are consistent with those previously observed in most non-Cantabrian Spanish populations. The rate of consanguineous marriages was apparently stable between 1900 and 1935 and then decreased sharply from 1940 onward, which coincides with industrialization in Spain. A marked departure from Hardy-Weinberg expectations (0.25) in the ratio of first cousin (M22) to second cousin (M33) marriages in the study population (0.88) was observed. The high levels of endogamy (>80%) and their significant steadiness throughout the twentieth century are noteworthy. Accordingly, our results show not only that exogamous marriages were poorly represented but also that the reduced mobility (<6 km) suggests that the choice of a mate was preferentially local. We found higher mobility in M22 with respect to M33 cousin mating. The relationships between population size and consanguinity rates and inbreeding fit power-law distributions. A significant positive correlation was observed between inbreeding and elevation. Many Spanish populations have experienced a prolonged and considerable isolation across generations, which has led to high proportions of historical and local endogamy that is associated, in general, with high [Formula: see text] values. Thus, assessing genomic inbreeding using runs of homozygosity (ROH) in current Spanish populations could be an additional pertinent strategy for obtaining a more refined perspective regarding the population history inferred from the extent and frequency of ROH regions.
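For readers unfamiliar with how dispensation counts translate into an inbreeding level, a minimal sketch follows. It assumes the standard kinship-based inbreeding coefficients for the offspring of each marriage type (1/16 for first cousins, 1/32 for first cousins once removed, 1/64 for second cousins) and computes the mean inbreeding coefficient as their frequency-weighted sum. The label M23, the function name, and all counts are hypothetical illustrations, not data from the study.

```python
from fractions import Fraction

# Inbreeding coefficient of offspring for each consanguineous marriage type.
# M22 and M33 follow the abstract's notation; M23 (first cousins once removed)
# is an assumed label added for illustration.
OFFSPRING_F = {
    "M22": Fraction(1, 16),  # first cousins
    "M23": Fraction(1, 32),  # first cousins once removed
    "M33": Fraction(1, 64),  # second cousins
}


def mean_inbreeding(counts, total_marriages):
    """Frequency-weighted mean inbreeding coefficient over all marriages."""
    weighted = sum(OFFSPRING_F[kind] * n for kind, n in counts.items())
    return weighted / total_marriages


# Hypothetical parish: 1000 marriages, 16 between first cousins, 64 between second cousins.
counts = {"M22": 16, "M33": 64}
alpha = mean_inbreeding(counts, 1000)
ratio = Fraction(counts["M22"], counts["M33"])
print(float(alpha), float(ratio))
```

The toy counts were chosen so that the M22/M33 ratio equals 0.25, the expectation quoted in the abstract; a ratio such as the observed 0.88 would mean proportionally many more first cousin marriages.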
Affiliation(s)
- R Calderón
  - Departamento de Zoología y Antropología Física, Facultad de Biología, Universidad Complutense, Madrid, Spain
- C L Hernández
  - Departamento de Zoología y Antropología Física, Facultad de Biología, Universidad Complutense, Madrid, Spain
- G García-Varela
  - Departamento de Zoología y Antropología Física, Facultad de Biología, Universidad Complutense, Madrid, Spain
- D Masciarelli
  - Departamento de Zoología y Antropología Física, Facultad de Biología, Universidad Complutense, Madrid, Spain
- P Cuesta
  - Centro de Proceso de Datos, Universidad Complutense, Madrid, Spain
5. Corny J, Galland M, Arzarello M, Bacon AM, Demeter F, Grimaud-Hervé D, Higham C, Matsumura H, Nguyen LC, Nguyen TKT, Nguyen V, Oxenham M, Sayavongkhamdy T, Sémah F, Shackelford LL, Détroit F. Dental phenotypic shape variation supports a multiple dispersal model for anatomically modern humans in Southeast Asia. J Hum Evol 2017; 112:41-56. [PMID: 29037415 DOI: 10.1016/j.jhevol.2017.08.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/26/2016] [Revised: 08/23/2017] [Accepted: 08/24/2017] [Indexed: 01/05/2023]
Abstract
The population history of anatomically modern humans (AMH) in Southeast Asia (SEA) is a highly debated topic. The impact of sea level variations related to the Last Glacial Maximum (LGM) and the Neolithic diffusion on past population dispersals are two key issues. We have investigated competing AMH dispersal hypotheses in SEA through the analysis of dental phenotype shape variation on the basis of very large archaeological samples employing two complementary approaches. We first explored the structure of between- and within-group shape variation of permanent human molar crowns. Second, we undertook a direct test of competing hypotheses through a modeling approach. Our results identify a significant LGM-mediated AMH expansion and a strong biological impact of the spread of Neolithic farmers into SEA during the Holocene. The present work thus favors a "multiple AMH dispersal" hypothesis for the population history of SEA, reconciling phenotypic and recent genomic data.
Affiliation(s)
- Julien Corny
  - Aix Marseille Université, CNRS, EFS, ADES UMR 7268, 13916, Marseille, France
- Manon Galland
  - University College Dublin, School of Archaeology, Belfield, Dublin 4, Ireland; Muséum national d'Histoire naturelle, Musée de l'Homme, Département Homme et environnement, CNRS, UMR 7206, 75116, Paris, France
- Marta Arzarello
  - Università degli Studi di Ferrara, Dipartimento Studi Umanistici, 44121, Ferrara, Italy
- Anne-Marie Bacon
  - Université Paris-Descartes, Faculté de chirurgie dentaire, UMR 5288 CNRS, AMIS, 92120, Montrouge, France
- Fabrice Demeter
  - Muséum national d'Histoire naturelle, Musée de l'Homme, Département Homme et environnement, CNRS, UMR 7206, 75116, Paris, France; Center for GeoGenetics, Copenhagen, Denmark
- Dominique Grimaud-Hervé
  - Muséum national d'Histoire naturelle, Musée de l'Homme, Département Homme et environnement, CNRS, UMR 7194, 75116, Paris, France
- Charles Higham
  - University of Otago, Department of Anthropology and Archaeology, Dunedin 9054, New Zealand
- Hirofumi Matsumura
  - Sapporo Medical University, School of Health Science, Sapporo 060-8556, Japan
- Viet Nguyen
  - Center for Southeast Asian Prehistory, 96/203 Hoang Quoc Viet, Hanoi, Viet Nam
- Marc Oxenham
  - Australian National University, School of Archaeology and Anthropology, Canberra ACT 0200, Australia
- Thongsa Sayavongkhamdy
  - Department of National Heritage, Ministry of Information and Culture, Vientiane, Lao People's Democratic Republic
- François Sémah
  - Muséum national d'Histoire naturelle, Musée de l'Homme, Département Homme et environnement, CNRS, UMR 7194, 75116, Paris, France
- Florent Détroit
  - Muséum national d'Histoire naturelle, Musée de l'Homme, Département Homme et environnement, CNRS, UMR 7194, 75116, Paris, France
6. Cherni L, Pakstis AJ, Boussetta S, Elkamel S, Frigi S, Khodjet-El-Khil H, Barton A, Haigh E, Speed WC, Ben Ammar Elgaaied A, Kidd JR, Kidd KK. Genetic variation in Tunisia in the context of human diversity worldwide. Am J Phys Anthropol 2016; 161:62-71. [PMID: 27192181 PMCID: PMC5084816 DOI: 10.1002/ajpa.23008] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 09/16/2015] [Accepted: 04/22/2016] [Indexed: 11/09/2022]
Abstract
OBJECTIVES: North Africa has a complex demographic history of migrations from within Africa, Europe, and the Middle East. However, population genetic studies, especially for autosomal genetic markers, are few relative to other world regions. We examined autosomal markers for eight Tunisian and Libyan populations in order to place them in a global context.
MATERIALS AND METHODS: Data were collected by TaqMan on 399 autosomal single nucleotide polymorphisms on 331 individuals from Tunisia and Libya. These data were combined with data on the same SNPs previously typed on 2585 individuals from 57 populations from around the world. Where meaningful, nearby SNPs were combined into multiallelic haplotypes. Data were evaluated by clustering, principal components, and population tree analyses. For a subset of 102 SNPs, data from the literature on seven additional North African populations were included in analyses.
RESULTS: Average heterozygosity of the North African populations is high relative to our global samples, consistent with a complex demographic history. The Tunisian and Libyan samples form a discrete cluster in the global and regional views and can be separated from sub-Saharan Africa, the Middle East, and Europe. Within Tunisia the Nebeur and Smar are outlier groups. Across North Africa, pervasive East-West geographical patterns were not found.
DISCUSSION: Known historical migrations and invasions did not displace or homogenize the genetic variation in the region but rather enriched it. Even a small region like Tunisia contains considerable genetic diversity. Future studies across North Africa have the potential to increase our understanding of the historical demographic factors influencing the region.
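The average heterozygosity mentioned in the results can be illustrated with a short sketch. It assumes the usual expected heterozygosity under random mating, H = 1 - sum of squared allele (or haplotype) frequencies, averaged over loci. The function names and frequencies are hypothetical, and the study's exact estimator (for example, any sample-size correction) is not stated in the abstract.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)


def average_heterozygosity(loci):
    """Mean expected heterozygosity across loci (one frequency list per locus)."""
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)


# Hypothetical population: one balanced SNP and one SNP with a rare allele.
loci = [[0.5, 0.5], [0.9, 0.1]]
print(round(average_heterozygosity(loci), 4))
```

Multiallelic haplotypes fit the same formula: a locus with frequencies [0.4, 0.3, 0.3] simply contributes three squared terms.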
Affiliation(s)
- Lotfi Cherni
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia; High Institute of Biotechnology, University of Monastir, Monastir, 5000, Tunisia
- Andrew J Pakstis
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
- Sami Boussetta
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia
- Sarra Elkamel
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia
- Sabeh Frigi
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia
- Houssein Khodjet-El-Khil
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia
- Alison Barton
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
- Eva Haigh
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
- William C Speed
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
- Amel Ben Ammar Elgaaied
  - Laboratory of Genetics, Immunology and Human Pathology, Science Faculty of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia
- Judith R Kidd
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
- Kenneth K Kidd
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520
7. VariantSpark: population scale clustering of genotype information. BMC Genomics 2015; 16:1052. [PMID: 26651996 PMCID: PMC4676146 DOI: 10.1186/s12864-015-2269-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/29/2015] [Accepted: 12/01/2015] [Indexed: 02/04/2023]
Abstract
Background: Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the MapReduce paradigm. We therefore utilise the recently developed Spark engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VariantSpark, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results.
Results: To demonstrate the capabilities of VariantSpark, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach, adam, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python.
Conclusion: The benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.