1
Hünemeier T. Biogeographic Perspectives on Human Genetic Diversification. Mol Biol Evol 2024; 41:msae029. [PMID: 38349332 PMCID: PMC10917211 DOI: 10.1093/molbev/msae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/31/2024] [Accepted: 02/07/2024] [Indexed: 03/08/2024] [Open Access]
Abstract
Modern humans originated in Africa 300,000 yr ago, and before leaving their continent of origin, they underwent a process of intense diversification involving complex demographic dynamics. Upon exiting Africa, different populations emerged on the four other inhabited continents, shaped by the interplay of various evolutionary processes, such as migrations, founder effects, and natural selection. Within each region, continental populations, in turn, diversified and evolved almost independently for millennia. As a backdrop to this diversification, introgressions from archaic species contributed to establishing different patterns of genetic diversity in different geographic regions, reshaping our understanding of our species' variability. With the increasing availability of genomic data, it has become possible to delineate the subcontinental human population structure precisely. However, the bias toward the genomic research focused on populations from the global North has limited our understanding of the real diversity of our species and the processes and events that guided different human groups throughout their evolutionary history. This perspective is part of a series of articles celebrating 40 yr since our journal, Molecular Biology and Evolution, was founded (Russo et al. 2024). The perspective is accompanied by virtual issues, a selection of papers on human diversification published by Genome Biology and Evolution and Molecular Biology and Evolution.
Affiliation(s)
- Tábita Hünemeier
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
  - Population Genetics Department, Institute of Evolutionary Biology (IBE - CSIC/Universitat Pompeu Fabra), 08003 Barcelona, Spain
2
Lan Q, Lin Y, Wang X, Yuan X, Shen C, Zhu B. Targeted sequencing of high-density SNPs provides an enhanced tool for forensic applications and genetic landscape exploration in Chinese Korean ethnic group. Hum Genomics 2023; 17:107. [PMID: 38008719 PMCID: PMC10680316 DOI: 10.1186/s40246-023-00541-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 10/05/2023] [Indexed: 11/28/2023] [Open Access]
Abstract
BACKGROUND: In this study, we present an NGS-based panel designed for sequencing 1993 SNP loci for forensic DNA investigation. This panel addresses unique challenges encountered in forensic practice and allows for a comprehensive population genetic study of the Chinese Korean ethnic group. To achieve this, we combine our results with datasets from the 1000 Genomes Project and the Human Genome Diversity Panel. RESULTS: We demonstrate that this panel is a reliable tool for individual identification and parentage testing, even when dealing with degraded DNA samples featuring exceedingly low SNP detection rates. The performance of this panel for complex kinship determinations, such as half-sibling and grandparent-grandchild scenarios, is also validated by various kinship simulations. Population genetic studies indicate that this panel can uncover population substructures on both global and regional scales. Notably, the Han population can be distinguished from the ethnic minorities in the northern and southern regions of East Asia, suggesting its potential for regional ancestry inference. Furthermore, we highlight that the Chinese Korean ethnic group, along with various Han populations from different regional areas and certain northern ethnic minorities (Daur, Tujia, Japanese, Mongolian, Xibo), exhibit a higher degree of genetic affinities when examined from a genomic perspective. CONCLUSION: This study provides convincing evidence that the NGS-based panel can serve as a reliable tool for various forensic applications. Moreover, it has helped to enhance our knowledge about the genetic landscape of the Chinese Korean ethnic group.
Affiliation(s)
- Qiong Lan
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Yifeng Lin
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Xi Wang
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Xi Yuan
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Chunmei Shen
  - Department of Laboratory Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Bofeng Zhu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
  - Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an, China
3
Anderson-Trocmé L, Nelson D, Zabad S, Diaz-Papkovich A, Kryukov I, Baya N, Touvier M, Jeffery B, Dina C, Vézina H, Kelleher J, Gravel S. On the genes, genealogies, and geographies of Quebec. Science 2023; 380:849-855. [PMID: 37228217 DOI: 10.1126/science.add5300] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2022] [Accepted: 04/24/2023] [Indexed: 05/27/2023]
Abstract
Population genetic models only provide coarse representations of real-world ancestry. We used a pedigree compiled from 4 million parish records and genotype data from 2276 French and 20,451 French Canadian individuals to finely model and trace French Canadian ancestry through space and time. The loss of ancestral French population structure and the appearance of spatial and regional structure highlights a wide range of population expansion models. Geographic features shaped migrations, and we find enrichments for migration, genetic, and genealogical relatedness patterns within river networks across regions of Quebec. Finally, we provide a freely accessible simulated whole-genome sequence dataset with spatiotemporal metadata for 1,426,749 individuals reflecting intricate French Canadian population structure. Such realistic population-scale simulations provide opportunities to investigate population genetics at an unprecedented resolution.
Affiliation(s)
- Luke Anderson-Trocmé
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Dominic Nelson
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Shadi Zabad
  - School of Computer Science, McGill University, Montreal, QC, Canada
- Alex Diaz-Papkovich
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Ivan Kryukov
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
- Nikolas Baya
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Mathilde Touvier
  - Sorbonne Paris Nord University, INSERM U1153, INRAE U1125, CNAM, Nutritional Epidemiology Research Team (EREN), Epidemiology and Statistics Research Center, University Paris Cité (CRESS), Bobigny, France
- Ben Jeffery
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Christian Dina
  - Nantes Université, CNRS, INSERM, l'institut du thorax, Nantes, France
- Hélène Vézina
  - BALSAC Project, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University Genome Centre, Montreal, QC, Canada
4
Relevance of CYP2D6 Gene Variants in Population Genetic Differentiation. Pharmaceutics 2022; 14:2481. [PMID: 36432672 PMCID: PMC9694252 DOI: 10.3390/pharmaceutics14112481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 11/09/2022] [Accepted: 11/10/2022] [Indexed: 11/18/2022] [Open Access]
Abstract
A significant portion of the variability in complex features, such as drug response, is likely caused by human genetic diversity. One of the highly polymorphic pharmacogenes is CYP2D6, encoding an enzyme involved in the metabolism of about 25% of commonly prescribed drugs. In a directed search of the 1000 Genomes Phase III variation data, 86 single nucleotide polymorphisms (SNPs) in the CYP2D6 gene were extracted from the genotypes of 2504 individuals from 26 populations, and then used to reconstruct haplotypes. Analyses were performed using the Haploview, Phase, and Arlequin software packages. Haplotype and nucleotide diversity were high in all populations, but highest in populations of African ancestry. Pairwise FST showed significant results for eleven SNPs, six of which were characteristic of African populations, while four SNPs were most common in East Asian populations. A principal component analysis of CYP2D6 haplotypes showed that African populations form one cluster, Asian populations form another cluster with East and South Asian populations separated, while European populations form the third cluster. Linkage disequilibrium showed that all African populations have three or more haplotype blocks within the CYP2D6 gene, while other world populations have one, except for Chinese Dai and Punjabi in Pakistan populations, which have two.
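The pairwise FST statistic mentioned in this abstract compares heterozygosity within populations to heterozygosity in the pooled population. The following is a minimal sketch of Wright's FST for a single biallelic SNP in two populations, assuming equal population weights. It is not the estimator implemented in Arlequin, and the function name and the allele frequencies in the usage lines are hypothetical, chosen only for illustration.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    This is an illustrative textbook formula, not Arlequin's estimator.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    p_bar = (p1 + p2) / 2.0                       # pooled allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)             # expected heterozygosity, pooled
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population

    if h_t == 0.0:
        # The SNP is monomorphic in both populations: no differentiation to measure.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, for illustration only.
print(wright_fst(0.5, 0.5))  # identical frequencies give no differentiation
print(wright_fst(0.1, 0.9))  # strongly diverged frequencies give a high value
```

Values near 0 indicate similar allele frequencies in the two populations, and values near 1 indicate that the populations are close to fixed for different alleles.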
5
Xu H, Fang Y, Zhao M, Lan Q, Mei S, Liu L, Bai X, Zhu B. Forensic Features and Genetic Structure Analyses of the Beijing Han Nationality Disclosed by a Self-Developed Panel Containing a Series of Ancestry Informative Deletion/Insertion Polymorphism Loci. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.890153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] [Open Access]
Abstract
The utilization of the ancestry informative markers to disclose the ancestral composition of a certain population and explore the genetic affinities between diverse populations is beneficial to inferring the biogeographic ancestry of unknown individuals and assisting in case detection, as well as avoiding the impacts of population stratification during genome-wide association analysis studies. In the present study, we applied an in-house ancestry informative deletion/insertion polymorphic multiplex amplification system to investigate the ancestral compositions of the Beijing Han population and analyze the genetic relationships between the Beijing Han population and 31 global reference populations. The results demonstrated that 32 loci of this self-developed panel containing 39 loci significantly contributed to the inference of genetic information for the Beijing Han population. The results of multiple population genetics statistical analyses indicated that the ancestral component and genetic architecture of the Beijing Han population were analogous to the reference East Asian populations, and that the Beijing Han population was genetically close to the reference East Asian populations.
6
Species delimitation and mitonuclear discordance within a species complex of biting midges. Sci Rep 2022; 12:1730. [PMID: 35110675 PMCID: PMC8810881 DOI: 10.1038/s41598-022-05856-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/26/2021] [Accepted: 01/17/2022] [Indexed: 12/11/2022] [Open Access]
Abstract
The inability to distinguish between species can be a serious problem in groups responsible for pathogen transmission. Culicoides biting midges transmit many pathogenic agents infecting wildlife and livestock. In North America, the C. variipennis species complex contains three currently recognized species, only one of which is a known vector, but limited species-specific characters have hindered vector surveillance. Here, genomic data were used to investigate population structure and genetic differentiation within this species complex. Single nucleotide polymorphism data were generated for 206 individuals originating from 17 locations throughout the United States and Canada. Clustering analyses suggest the occurrence of two additional cryptic species within this complex. All five species were significantly differentiated in both sympatry and allopatry. Evidence of hybridization was detected in three different species pairings indicating incomplete reproductive isolation. Additionally, COI sequences were used to identify the hybrid parentage of these individuals, which illuminated discordance between the divergence of the mitochondrial and nuclear datasets.
7
Mignotte A, Garros C, Dellicour S, Jacquot M, Gilbert M, Gardès L, Balenghien T, Duhayon M, Rakotoarivony I, de Wavrechin M, Huber K. High dispersal capacity of Culicoides obsoletus (Diptera: Ceratopogonidae), vector of bluetongue and Schmallenberg viruses, revealed by landscape genetic analyses. Parasit Vectors 2021; 14:93. [PMID: 33536057 PMCID: PMC7860033 DOI: 10.1186/s13071-020-04522-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/09/2020] [Accepted: 12/04/2020] [Indexed: 12/27/2022] [Open Access]
Abstract
BACKGROUND: In the last two decades, recurrent epizootics of bluetongue virus and Schmallenberg virus have been reported in the western Palearctic region. These viruses affect domestic cattle, sheep, goats and wild ruminants and are transmitted by native hematophagous midges of the genus Culicoides (Diptera: Ceratopogonidae). Culicoides dispersal is known to be stratified, i.e. due to a combination of dispersal processes occurring actively at short distances and passively or semi-actively at long distances, allowing individuals to jump hundreds of kilometers. METHODS: Here, we aim to identify the environmental factors that promote or limit gene flow of Culicoides obsoletus, an abundant and widespread vector species in Europe, using an innovative framework integrating spatial, population genetics and statistical approaches. A total of 348 individuals were sampled in 46 sites in France and were genotyped using 13 newly designed microsatellite markers. RESULTS: We found low genetic differentiation and a weak population structure for C. obsoletus across the country. Using three complementary inter-individual genetic distances, we did not detect any significant isolation by distance, but did detect significant anisotropic isolation by distance on a north-south axis. We employed a multiple regression on distance matrices approach to investigate the correlation between genetic and environmental distances. Among all the environmental factors that were tested, only cattle density seems to have an impact on C. obsoletus gene flow. CONCLUSIONS: The high dispersal capacity of C. obsoletus over land found in the present study calls for a re-evaluation of the impact of Culicoides on virus dispersal, and highlights the urgent need to better integrate molecular, spatial and statistical information to guide vector-borne disease control.
Affiliation(s)
- Antoine Mignotte
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 34398 Montpellier, France
- Claire Garros
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 34398 Montpellier, France
- Simon Dellicour
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, CP160/12, 50, av. FD Roosevelt, 1050 Bruxelles, Belgium
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Maude Jacquot
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, CP160/12, 50, av. FD Roosevelt, 1050 Bruxelles, Belgium
  - UMR EPIA, Université Clermont Auvergne, INRAE, VetAgro Sup, 63122 Saint-Genès-Champanelle, France
- Marius Gilbert
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, CP160/12, 50, av. FD Roosevelt, 1050 Bruxelles, Belgium
- Laetitia Gardès
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 97170 Petit-Bourg, Guadeloupe, France
- Thomas Balenghien
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 10100 Rabat, Morocco
  - Unité Microbiologie, immunologie et maladies contagieuses, Institut Agronomique et Vétérinaire Hassan II, 10100 Rabat-Instituts, Morocco
- Maxime Duhayon
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 34398 Montpellier, France
- Ignace Rakotoarivony
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 34398 Montpellier, France
- Maïa de Wavrechin
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
  - Cirad, UMR ASTRE, 34398 Montpellier, France
- Karine Huber
  - ASTRE, Univ Montpellier, Cirad, INRAE, Montpellier, France
8
Battey CJ, Ralph PL, Kern AD. Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data. Genetics 2020; 215:193-214. [PMID: 32209569 PMCID: PMC7198281 DOI: 10.1534/genetics.120.303143] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 12/06/2019] [Accepted: 03/12/2020] [Indexed: 12/14/2022] [Open Access]
Abstract
Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright's neighborhood size is < 100 and sampling is spatially clustered. "Stepping-stone" models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.
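The neighborhood-size threshold quoted in this abstract can be made concrete with a short calculation. The sketch below assumes the standard two-dimensional definition of Wright's neighborhood size, N_W = 4πρσ², where ρ is the population density and σ is the dispersal distance (the standard deviation of parent-offspring displacement along one axis). The function name and the example density and dispersal values are hypothetical and are not taken from the paper.

```python
import math


def neighborhood_size(density: float, sigma: float) -> float:
    """Wright's neighborhood size in two dimensions: 4 * pi * density * sigma**2.

    density: individuals per unit area.
    sigma:   dispersal distance, in the same length unit as the area.
    """
    if density < 0 or sigma < 0:
        raise ValueError("density and sigma must be non-negative")
    return 4.0 * math.pi * density * sigma ** 2


# Hypothetical parameter values, for illustration only.
for sigma in (0.5, 1.0, 2.0):
    n_w = neighborhood_size(density=5.0, sigma=sigma)
    flag = "below 100" if n_w < 100 else "100 or above"
    print(f"sigma={sigma}: neighborhood size = {n_w:.1f} ({flag})")
```

Because the neighborhood size grows with the square of the dispersal distance, halving σ at a fixed density reduces it by a factor of four.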
Affiliation(s)
- C J Battey
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
- Peter L Ralph
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
- Andrew D Kern
  - Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
9
Irish JD, Morez A, Girdland Flink L, Phillips EL, Scott GR. Do dental nonmetric traits actually work as proxies for neutral genomic data? Some answers from continental- and global-level analyses. Am J Phys Anthropol 2020; 172:347-375. [DOI: 10.1002/ajpa.24052] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/21/2020] [Revised: 02/25/2020] [Accepted: 03/10/2020] [Indexed: 11/07/2022]
Affiliation(s)
- Joel D. Irish
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Liverpool, UK
  - Evolutionary Studies Institute and Centre for Excellence in PaleoSciences, University of the Witwatersrand, South Africa
- Adeline Morez
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Liverpool, UK
- Linus Girdland Flink
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Liverpool, UK
  - Department of Archaeology, School of Geosciences, University of Aberdeen, Aberdeen, UK
- Emma L.W. Phillips
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Liverpool, UK
- G. Richard Scott
  - Anthropology Department, University of Nevada, Reno, Reno, Nevada, USA
10
Next generation sequencing of a set of ancestry-informative SNPs: ancestry assignment of three continental populations and estimating ancestry composition for Mongolians. Mol Genet Genomics 2020; 295:1027-1038. [PMID: 32206883 DOI: 10.1007/s00438-020-01660-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/04/2019] [Accepted: 02/27/2020] [Indexed: 12/31/2022]
Abstract
When traditional short tandem repeat profiling fails to provide information that helps identify a suspect, forensic ancestry inference from the biological samples left at the crime scene may offer investigative leads and facilitate the investigation. For this reason, there have been consistent efforts to develop panels for ancestry inference in forensic science. In this study, a 30-plex next generation sequencing-based assay was developed by assembling well-differentiated single nucleotide polymorphisms for ancestry assignment of unknown individuals from three continental populations (African, European and East Asian). Relatively balanced population-specific differentiation values were maintained to avoid over-estimation or under-estimation of co-ancestry proportions in individuals with admixed ancestry. The principal component analysis and STRUCTURE analysis of reference populations, test populations and the studied Mongolian group indicated that the novel assay was efficient enough to determine the ancestry origin of an unknown individual from the three continental populations. In addition, ancestry membership proportion estimates for the Mongolian group revealed that most of the ancestry was contributed by the East Asian genetic component (approximately 83.9%), followed by the European (approximately 12.6%) and African (approximately 3.5%) genetic components. The next generation sequencing technology applied in this study also makes it possible to incorporate more single nucleotide polymorphisms for individual identification and phenotype prediction into the same assay, providing as many investigative clues as possible in the future.
11
Lorente-Galdos B, Lao O, Serra-Vidal G, Santpere G, Kuderna LFK, Arauna LR, Fadhlaoui-Zid K, Pimenoff VN, Soodyall H, Zalloua P, Marques-Bonet T, Comas D. Whole-genome sequence analysis of a Pan African set of samples reveals archaic gene flow from an extinct basal population of modern humans into sub-Saharan populations. Genome Biol 2019; 20:77. [PMID: 31023378 PMCID: PMC6485163 DOI: 10.1186/s13059-019-1684-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/17/2018] [Accepted: 03/28/2019] [Indexed: 12/30/2022] [Open Access]
Abstract
BACKGROUND: Population demography and gene flow among African groups, as well as the putative archaic introgression of ancient hominins, have been poorly explored at the genome level. RESULTS: Here, we examine 15 African populations covering all major continental linguistic groups, ecosystems, and lifestyles within Africa through analysis of whole-genome sequence data of 21 individuals sequenced at deep coverage. We observe a remarkable correlation between genetic diversity and geographic distance, with the hunter-gatherer groups being more genetically differentiated and having larger effective population sizes throughout most modern-human history. Admixture signals are found between neighboring populations from both hunter-gatherer and agriculturalist groups, whereas North African individuals are closely related to Eurasian populations. Regarding archaic gene flow, we test six complex demographic models that consider recent admixture as well as archaic introgression. We identify the fingerprint of an archaic introgression event in the sub-Saharan populations included in the models (~4.0% in Khoisan, ~4.3% in Mbuti Pygmies, and ~5.8% in Mandenka) from an early divergent and currently extinct ghost modern human lineage. CONCLUSION: The present study represents an in-depth genomic analysis of a Pan African set of individuals, which emphasizes their complex relationships and demographic history at population level.
Affiliation(s)
- Belen Lorente-Galdos
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Oscar Lao
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Gerard Serra-Vidal
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Gabriel Santpere
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Lukas F K Kuderna
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Lara R Arauna
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Karima Fadhlaoui-Zid
  - College of Science, Department of Biology, Taibah University, Al Madinah, Al Monawarah, Saudi Arabia
  - Higher Institute of Biotechnology of Beja, University of Jendouba, Avenue Habib Bourguiba, BP, 382, 9000, Beja, Tunisia
- Ville N Pimenoff
  - Oncology Data Analytics Program, Bellvitge Biomedical Research Institute (ICO-IDIBELL), Consortium for Biomedical Research in Epidemiology and Public Health, Hospitalet de Llobregat, Barcelona, Spain
  - Department of Archaeology, University of Helsinki, Helsinki, Finland
- Himla Soodyall
  - Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
- Pierre Zalloua
  - School of Medicine, The Lebanese American University, Beirut, 1102-2801, Lebanon
- Tomas Marques-Bonet
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats, ICREA, 08003, Barcelona, Spain
- David Comas
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (UPF/CSIC), Universitat Pompeu Fabra, 08003, Barcelona, Spain
12
Geremew A, Woldemariam MG, Kefalew A, Stiers I, Triest L. Isotropic and anisotropic processes influence fine-scale spatial genetic structure of a keystone tropical plant. AoB Plants 2018; 10:plx076. [PMID: 29383234 PMCID: PMC5777495 DOI: 10.1093/aobpla/plx076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/28/2017] [Accepted: 01/02/2018] [Indexed: 06/07/2023]
Abstract
Limited seed or pollen dispersal enhances spatial genetic relatedness between individuals (fine-scale spatial genetic structure, FSGS), which usually decreases as a function of physical distance. However, such isotropic pattern of FSGS may not always occur when spatially asymmetric processes, for instance, wind direction during dispersal, are considered in wind-pollinated and -dispersed plants. This study assessed the pattern of FSGS in the keystone tropical wetland plant Cyperus papyrus (papyrus) as a function of these isotropic and anisotropic processes. We tested the hypothesis that the FSGS would be influenced by predominant wind direction during pollen and seed dispersal, as well as by the physical distance between individuals. We genotyped a total of 510 adults and 407 juveniles from three papyrus swamps (Ethiopia) using 15 microsatellite markers. In addition, the contemporary directional dispersal by wind was evaluated by seed release-recapture experiments and complemented with parentage analysis. Adults and juveniles differed in the strength of isotropic FSGS ranging from 0.09 to 0.13 and 0.12 to 0.16, respectively, and this suggests variation in dispersal distance. Anisotropic FSGS was found to be a function of asymmetric wind direction during dispersal/pollination that varied between sites. Historical gene dispersal distance was astoundingly low (<4 m), possibly due to localized seed rain. According to our contemporary dispersal estimates, mean pollen dispersal distances were longer than those of seed dispersal (101 and <55 m, respectively). More than two-thirds of seeds and half of pollen grains were locally dispersed (≤80 m). The difference in historical and contemporary dispersal distance probably resulted from the asymmetric wind direction due to change in vegetation cover in the surrounding matrix. We further concluded that, in addition to wind direction, post-dispersal processes could influence gene dispersal distance inferred from the FSGS.
Affiliation(s)
- Addisie Geremew
  - Department of Biology, Vrije Universiteit Brussel (VUB), Pleinlaan, Brussels, Belgium
- Alemayehu Kefalew
  - Department of Plant Biology and Biodiversity Management, College of Natural Sciences, Addis Ababa University, Addis Ababa, Ethiopia
- Iris Stiers
  - Department of Biology, Vrije Universiteit Brussel (VUB), Pleinlaan, Brussels, Belgium
- Ludwig Triest
  - Department of Biology, Vrije Universiteit Brussel (VUB), Pleinlaan, Brussels, Belgium
|
13
|
Bhaskar A, Javanmard A, Courtade TA, Tse D. Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies. Bioinformatics 2017; 33:879-885. [PMID: 28025204 PMCID: PMC5860619 DOI: 10.1093/bioinformatics/btw720] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/04/2016] [Revised: 10/18/2016] [Accepted: 11/10/2016] [Indexed: 11/12/2022] Open
Abstract
Motivation: Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or using explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait.
Results: Our algorithm Geographic Ancestry Positioning (GAP) relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches.
Availability and Implementation: Our software is available at https://github.com/anand-bhaskar/gap.
Contacts: abhaskar@stanford.edu or ajavanma@usc.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anand Bhaskar
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
- Adel Javanmard
  - Marshall School of Business, University of Southern California, Los Angeles, CA, USA
- Thomas A Courtade
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- David Tse
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
|
14
|
Ringbauer H, Coop G, Barton NH. Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks. Genetics 2017; 205:1335-1351. [PMID: 28108588 PMCID: PMC5340342 DOI: 10.1534/genetics.116.196220] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 09/24/2016] [Accepted: 01/13/2017] [Indexed: 12/12/2022] Open
Abstract
Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of [Formula: see text] during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance.
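
The exponential decay of block sharing with distance mentioned at the end of this abstract can be illustrated with a much simpler stand-in than the authors' composite-likelihood fit of diffusion formulas. The sketch below assumes a plain model, sharing ≈ a · exp(−b · distance), and estimates a and b by least squares on the log scale. The function name, the model form, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math


def fit_exponential_decay(distances, sharing):
    """Fit sharing ≈ a * exp(-b * distance) by least squares on log(sharing).

    Returns (a, b). All sharing values must be strictly positive.
    This is a simplified stand-in, not the composite-likelihood
    scheme described in the abstract.
    """
    xs = list(distances)
    ys = [math.log(s) for s in sharing]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Hypothetical, noise-free toy data: pairwise distances and mean blocks shared.
toy_distances = [50.0, 100.0, 200.0, 400.0]
toy_sharing = [0.8 * math.exp(-0.005 * d) for d in toy_distances]

a_hat, b_hat = fit_exponential_decay(toy_distances, toy_sharing)
print(f"a = {a_hat:.3f}, b = {b_hat:.4f}")
```

With real data the log transform weights noisy, low-sharing pairs heavily, which is one reason a likelihood-based fit such as the one the abstract describes is preferable.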
Affiliation(s)
- Harald Ringbauer
  - Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
- Graham Coop
  - Department of Evolution and Ecology, University of California, Davis, California 95616
  - Center for Population Biology, University of California, Davis, California 95616
- Nicholas H Barton
  - Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
|
15
|
Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset. Sci Rep 2017; 7:41614. [PMID: 28145502 PMCID: PMC5286425 DOI: 10.1038/srep41614] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/04/2016] [Accepted: 12/22/2016] [Indexed: 01/01/2023] Open
Abstract
Human populations are often dichotomized into “isolated” and “open” categories using cultural and/or geographical barriers to gene flow as differential criteria. Although widespread, the use of these alternative categories could obscure further heterogeneity due to inter-population differences in effective size, growth rate, and timing or amount of gene flow. We compared intra- and inter-population variation measures, combining novel and literature data on 87,818 autosomal SNPs in 14 open populations and 10 geographic and/or linguistic European isolates. Patterns of intra-population diversity were found to vary considerably more among isolates, probably due to differential levels of drift and inbreeding. The relatively large effective size estimated for some population isolates challenges the generalized view that they originate from small founding groups. Principal component scores based on measures of intra-population variation of isolated and open populations were found to be distributed along a continuum, with an area of intersection between the two groups. Patterns of inter-population diversity were even closer, as we were able to detect some differences between population groups only for a few multidimensional scaling dimensions. Therefore, different lines of evidence suggest that dichotomizing human populations into open and isolated groups fails to capture the actual relations among their genomic features.
|
16
|
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060 DOI: 10.1016/j.gde.2016.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/06/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/17/2023]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it only explains a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our work addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
  - Department of Human Genetics, University of Chicago, IL 60636, United States; Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
- Benjamin M Peter
  - Department of Human Genetics, University of Chicago, IL 60636, United States
|
17
|
Saeb ATM, Al-Naqeb D. The Impact of Evolutionary Driving Forces on Human Complex Diseases: A Population Genetics Approach. Scientifica 2016; 2016:2079704. [PMID: 27313952 PMCID: PMC4904122 DOI: 10.1155/2016/2079704] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/02/2015] [Accepted: 03/22/2016] [Indexed: 06/06/2023]
Abstract
Investigating the molecular evolution of the human genome has paved the way to understanding the genetic adaptation of humans to environmental changes and the corresponding complex diseases. In this review, we discuss the historical origin of genetic diversity among human populations, the evolutionary driving forces that can affect genetic diversity among populations, and the effects of human movement into new environments and of gene flow on population genetic diversity. Furthermore, we present the role of natural selection in genetic diversity and complex diseases. We then review the disadvantageous consequences of historical selection events in modern times and their relation to the development of complex diseases. In addition, we discuss the effect of consanguinity on the incidence of complex diseases in human populations. Finally, we present the latest information about the role of ancient genes acquired from interbreeding with ancient hominids in the development of complex diseases.
Affiliation(s)
- Amr T. M. Saeb
  - Strategic Center for Diabetes Research, College of Medicine, King Saud University, P.O. Box 18397, Riyadh 11415, Saudi Arabia
- Dhekra Al-Naqeb
  - Strategic Center for Diabetes Research, College of Medicine, King Saud University, P.O. Box 18397, Riyadh 11415, Saudi Arabia
|
18
|
Duforet-Frebourg N, Slatkin M. Isolation-by-distance-and-time in a stepping-stone model. Theor Popul Biol 2016; 108:24-35. [PMID: 26592162 PMCID: PMC4779737 DOI: 10.1016/j.tpb.2015.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/07/2015] [Revised: 10/26/2015] [Accepted: 11/03/2015] [Indexed: 01/30/2023]
Abstract
With the great advances in ancient DNA extraction, genetic data are now obtained from geographically separated individuals from both present and past. However, population genetics theory about the joint effect of space and time has not been thoroughly studied. Based on the classical stepping-stone model, we develop the theory of Isolation by distance and time. We derive the correlation of allele frequencies between demes in the case where ancient samples are present, and investigate the impact of edge effects with forward-in-time simulations. We also derive results about coalescent times in circular and toroidal models. As one of the most common ways to investigate population structure is principal components analysis (PCA), we evaluate the impact of our theory on PCA plots. Our results demonstrate that time between samples is an important factor. Ancient samples tend to be drawn to the center of a PCA plot.
Affiliation(s)
- Nicolas Duforet-Frebourg
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, United States
- Montgomery Slatkin
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, United States
|
19
|
Duforet-Frebourg N, Luu K, Laval G, Bazin E, Blum MGB. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data. Mol Biol Evol 2016; 33:1082-93. [PMID: 26715629 PMCID: PMC4776707 DOI: 10.1093/molbev/msv334] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Indexed: 01/01/2023] Open
Abstract
To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.
Affiliation(s)
- Nicolas Duforet-Frebourg
  - TIMC-IMAG UMR 5525, Univ. Grenoble Alpes, Grenoble, France; CNRS, TIMC-IMAG, Grenoble, France; Department of Integrative Biology, University of California, Berkeley
- Keurcien Luu
  - TIMC-IMAG UMR 5525, Univ. Grenoble Alpes, Grenoble, France; CNRS, TIMC-IMAG, Grenoble, France
- Guillaume Laval
  - Department of Genomes and Genetics, Institut Pasteur, Human Evolutionary Genetics, Paris, France; Centre National De La Recherche Scientifique, URA3012, Paris, France
- Eric Bazin
  - CNRS, Laboratoire D'ecologie Alpine UMR 5553, Univ. Grenoble Alpes, Grenoble, France
- Michael G B Blum
  - TIMC-IMAG UMR 5525, Univ. Grenoble Alpes, Grenoble, France; CNRS, TIMC-IMAG, Grenoble, France
|
20
|
Messina F, Scano G, Contini I, Martínez-Labarga C, De Stefano GF, Rickards O. Linking between genetic structure and geographical distance: Study of the maternal gene pool in the Ethiopian population. Ann Hum Biol 2016; 44:53-69. [PMID: 26883569 DOI: 10.3109/03014460.2016.1155646] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/23/2023]
Abstract
Background: The correlation between genetics and geographical distance has already been examined through the study of the dispersion of human populations, especially in terms of uniparental genetic markers.
Aim: The present work characterises, at the level of the mitochondrial DNA (mtDNA), two new samples of Amhara and Oromo populations from Ethiopia to evaluate the possible pattern of distribution for mtDNA variation and to test the hypothesis of the Isolation-by-Distance (IBD) model among African, European and Middle-Eastern populations.
Subjects and methods: This study analysed 173 individuals belonging to two ethnic groups of Ethiopia, Amhara and Oromo, by assaying HVS-I and HVS-II of mtDNA D-loop and informative coding region SNPs of mtDNA.
Results: The analysis suggests a relationship between genetic and geographic distances, affirming that the mtDNA pool of Africa, Europe and the Middle East might be coherent with the IBD model. Moreover, the mtDNA gene pools of the Sub-Saharan African and Mediterranean populations were very different.
Conclusion: In this study the pattern of mtDNA distribution, beginning with the Ethiopian plateau, was tested in the IBD model. It could be affirmed that, on a continent scale, the mtDNA pool of Africa, Europe and the Middle East might fall under the IBD model.
Affiliation(s)
- Francesco Messina
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
- Giuseppina Scano
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
- Irene Contini
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
- Cristina Martínez-Labarga
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
- Gian Franco De Stefano
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
- Olga Rickards
  - Center of Molecular Anthropology for Ancient DNA Study, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica n. 1, 00133 Rome, Italy
|
21
|
Wollstein A, Lao O. Detecting individual ancestry in the human genome. Investigative Genetics 2015; 6:7. [PMID: 25937887 PMCID: PMC4416275 DOI: 10.1186/s13323-015-0019-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/13/2014] [Accepted: 01/12/2015] [Indexed: 01/26/2023]
Abstract
Detecting and quantifying the population substructure present in a sample of individuals is of major interest in genetic epidemiology, population genetics, and forensics, among other fields. To date, several algorithms have been proposed for estimating the amount of genetic ancestry within an individual. In the present review, we introduce the most widely used methods in population genetics for detecting individual genetic ancestry. We further show, by means of simulations, the performance of popular algorithms for detecting individual ancestry in various controlled demographic scenarios. Finally, we provide some hints on how to interpret the results from these algorithms.
Affiliation(s)
- Andreas Wollstein
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, 3000 CA Rotterdam, The Netherlands; Section of Evolutionary Biology, Department of Biology II, University of Munich, 82152 Planegg-Martinsried, Germany
- Oscar Lao
  - Department of Forensic Molecular Biology, Erasmus MC University Medical Center Rotterdam, 3000 CA Rotterdam, The Netherlands; Current address: Centro Nacional de Análisis Genómico, Baldiri Reixac, 4, Barcelona Science Park - Tower I, 08028 Barcelona, Spain
|
22
|
Fine-scale human genetic structure in Western France. Eur J Hum Genet 2014; 23:831-6. [PMID: 25182131 DOI: 10.1038/ejhg.2014.175] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 10/08/2013] [Revised: 07/21/2014] [Accepted: 07/30/2014] [Indexed: 11/08/2022] Open
Abstract
The difficulties arising from association analysis with rare variants underline the importance of suitable reference population cohorts, which integrate detailed spatial information. We analyzed a sample of 1684 individuals from Western France, who were genotyped at genome-wide level, from two cohorts D.E.S.I.R and CavsGen. We found that fine-scale population structure occurs at the scale of Western France, with distinct admixture proportions for individuals originating from the Brittany Region and the Vendée Department. Genetic differentiation increases with distance at a high rate in these two parts of Northwestern France and linkage disequilibrium is higher in Brittany suggesting a lower effective population size. When looking for genomic regions informative about Breton origin, we found two prominent associated regions that include the lactase region and the HLA complex. For both the lactase and the HLA regions, there is a low differentiation between Bretons and Irish, and this is also found at the genome-wide level. At a more refined scale, and within the Pays de la Loire Region, we also found evidence of fine-scale population structure, although principal component analysis showed that individuals from different departments cannot be confidently discriminated. Because of the evidence for fine-scale genetic structure in Western France, we anticipate that rare and geographically localized variants will be identified in future full-sequence analyses.
|
23
|
Fumagalli M, Sironi M. Human genome variability, natural selection and infectious diseases. Curr Opin Immunol 2014; 30:9-16. [PMID: 24880709 DOI: 10.1016/j.coi.2014.05.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 04/12/2014] [Revised: 04/29/2014] [Accepted: 05/02/2014] [Indexed: 01/04/2023]
Abstract
The recent availability of large-scale DNA sequencing data has allowed researchers to investigate how genomic variation is distributed among populations. While demographic factors explain genome-wide population genetic diversity levels, scans for signatures of natural selection have pinpointed several regions under non-neutral evolution. Recent studies found an enrichment of immune-related genes subjected to natural selection, suggesting that pathogens and infectious diseases have imposed a strong selective pressure throughout human history. Pathogen-mediated selection often targeted regulatory sites of genes belonging to the same biological pathway. Results from these studies have the potential to identify mutations that modulate infection susceptibility by integrating a population genomic approach with molecular immunology data and large-scale functional annotations.
Affiliation(s)
- Matteo Fumagalli
  - UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Manuela Sironi
  - Bioinformatics - Scientific Institute IRCCS E.MEDEA, 23842 Bosisio Parini, Italy
|
24
|
Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe. PLoS Genet 2014; 10:e1004353. [PMID: 24809476 PMCID: PMC4014435 DOI: 10.1371/journal.pgen.1004353] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 12/14/2013] [Accepted: 03/19/2014] [Indexed: 11/24/2022] Open
Abstract
Genome sequencing of the 5,300-year-old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has yielded new insights into his origin and relationship to modern European populations. A key finding of that study was an apparent recent common ancestry with individuals from Sardinia, based largely on the Y chromosome haplogroup and common autosomal SNP variation. Here, we compiled and analyzed genomic datasets from both modern and ancient Europeans, including genome sequence data from over 400 Sardinians and two ancient Thracians from Bulgaria, to investigate this result in greater detail and determine its implications for the genetic structure of Neolithic Europe. Using whole-genome sequencing data, we confirm that the Iceman is, indeed, most closely related to Sardinians. Furthermore, we show that this relationship extends to other individuals from cultural contexts associated with the spread of agriculture during the Neolithic transition, in contrast to individuals from a hunter-gatherer context. We hypothesize that this genetic affinity of ancient samples from different parts of Europe with Sardinians represents a common genetic component that was geographically widespread across Europe during the Neolithic, likely related to migrations and population expansions associated with the spread of agriculture.
The analysis of the genome of the Tyrolean Iceman, a 5,300-year-old mummy from Central Europe, revealed a surprising recent common ancestry with modern Sardinians for this ancient genome. However, this study was limited both by the availability of data from Sardinians and by a lack of genomic data from other ancient European samples. Here, we use genomic data from modern Sardinians and from ancient European individuals from different geographic regions and cultural contexts, to demonstrate that this ancestry component is shared among individuals associated with the onset of agriculture in Europe. Our results thus suggest that the Iceman's Sardinian ancestry actually reflects a more widespread genetic component related to the migration of people during the Neolithic transition in Central Europe.
|
25
|
Duforet-Frebourg N, Blum MGB. Nonstationary patterns of isolation-by-distance: inferring measures of local genetic differentiation with Bayesian kriging. Evolution 2014; 68:1110-23. [PMID: 24372175 PMCID: PMC4285919 DOI: 10.1111/evo.12342] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/22/2013] [Accepted: 12/13/2013] [Indexed: 11/27/2022]
Abstract
Patterns of isolation-by-distance (IBD) arise when population differentiation increases with increasing geographic distances. Patterns of IBD are usually caused by local spatial dispersal, which explains why differences of allele frequencies between populations accumulate with distance. However, spatial variations of demographic parameters such as migration rate or population density can generate nonstationary patterns of IBD where the rate at which genetic differentiation accumulates varies across space. To characterize nonstationary patterns of IBD, we infer local genetic differentiation based on Bayesian kriging. Local genetic differentiation for a sampled population is defined as the average genetic differentiation between the sampled population and fictive neighboring populations. To avoid defining populations in advance, the method can also be applied at the scale of individuals making it relevant for landscape genetics. Inference of local genetic differentiation relies on a matrix of pairwise similarity or dissimilarity between populations or individuals such as matrices of FST between pairs of populations. Simulation studies show that maps of local genetic differentiation can reveal barriers to gene flow but also other patterns such as continuous variations of gene flow across habitat. The potential of the method is illustrated with two datasets: single nucleotide polymorphisms from human Swedish populations and dominant markers for alpine plant species.
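
As a rough illustration of the stationary baseline this abstract starts from (differentiation increasing with geographic distance), the sketch below regresses pairwise genetic differentiation on pairwise geographic distance and reports the slope and Pearson correlation. It is not the Bayesian kriging method of the paper. The function name and the toy FST-like matrix are hypothetical.

```python
import math
from itertools import combinations


def ibd_slope(coords, gen_dist):
    """Regress pairwise genetic distance on pairwise geographic distance.

    coords   : list of (x, y) sampling locations
    gen_dist : symmetric matrix (list of lists) of pairwise differentiation
    Returns (slope, pearson_r) over all unordered pairs.
    """
    pairs = list(combinations(range(len(coords)), 2))
    geo = [math.dist(coords[i], coords[j]) for i, j in pairs]
    gen = [gen_dist[i][j] for i, j in pairs]
    n = len(pairs)
    mean_geo = sum(geo) / n
    mean_gen = sum(gen) / n
    sxx = sum((g - mean_geo) ** 2 for g in geo)
    syy = sum((d - mean_gen) ** 2 for d in gen)
    sxy = sum((g - mean_geo) * (d - mean_gen) for g, d in zip(geo, gen))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: four populations on a line, with differentiation
# growing by 0.01 per unit of distance.
toy_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
toy_fst = [[0.01 * abs(i - j) for j in range(4)] for i in range(4)]

slope, r = ibd_slope(toy_coords, toy_fst)
print(f"slope = {slope:.4f}, r = {r:.3f}")
```

Because pairwise distances are not independent observations, the significance of such a correlation is usually assessed with a permutation procedure such as a Mantel test rather than a standard regression p-value. A single global slope also cannot capture the nonstationary patterns the abstract describes.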
Affiliation(s)
- Nicolas Duforet-Frebourg
  - Laboratoire TIMC-IMAG, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France
|
26
|
Fumagalli M. Assessing the effect of sequencing depth and sample size in population genetics inferences. PLoS One 2013; 8:e79667. [PMID: 24260275 PMCID: PMC3832539 DOI: 10.1371/journal.pone.0079667] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 05/30/2013] [Accepted: 09/23/2013] [Indexed: 12/31/2022] Open
Abstract
Next-Generation Sequencing (NGS) technologies have dramatically revolutionised research in many fields of genetics. The ability to sequence many individuals from one or multiple populations at a genomic scale has greatly enhanced population genetics studies and made it a data-driven discipline. Recently, researchers have proposed statistical modelling to address genotyping uncertainty associated with NGS data. However, an ongoing debate is whether it is more beneficial to increase the number of sequenced individuals or the per-sample sequencing depth for estimating genetic variation. Through extensive simulations, I assessed the accuracy of estimating nucleotide diversity, detecting polymorphic sites, and predicting population structure under different experimental scenarios. Results show that the greatest accuracy for estimating population genetics parameters is achieved by employing a large sample size, despite single individuals being sequenced at low depth. Under some circumstances, the minimum sequencing depth for obtaining accurate estimates of allele frequencies and identifying polymorphic sites is [Formula: see text], where both alleles are more likely to have been sequenced. On the other hand, inferences of population structure are more accurate at very large sample sizes, even with extremely low sequencing depth. This all points to the conclusion that under various experimental scenarios, in cost-limited population genetics studies, large sample sizes at low sequencing depth are desirable to achieve high accuracy. These findings will help researchers design their experimental set-ups and guide further investigation on the effect of protocol design for genetic research.
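
The remark that both alleles are more likely to have been sequenced at higher depth can be made concrete with a small calculation. The sketch below assumes an idealized case that is not taken from the paper: a truly heterozygous diploid site, a fixed number of reads, each read drawn from either allele with probability 1/2, and no sequencing error. Under those assumptions, the probability of seeing both alleles at depth d is 1 − 2 · (1/2)^d. The function name is hypothetical, and the depth threshold elided in the abstract is not reconstructed here.

```python
def prob_both_alleles_sampled(depth):
    """Probability that reads at a heterozygous site cover both alleles.

    Idealized assumptions: exactly `depth` reads, each sampling one of the
    two alleles with probability 1/2, and no sequencing errors.
    """
    if depth < 1:
        return 0.0
    # All reads come from the same allele with probability 2 * (1/2)**depth.
    return 1.0 - 2.0 * 0.5 ** depth


for d in (1, 2, 4, 8):
    print(d, prob_both_alleles_sampled(d))
```

In real data, depth varies across sites and reads carry errors, which is why the abstract refers to statistical modelling of genotype uncertainty rather than a fixed cutoff.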
Affiliation(s)
- Matteo Fumagalli
  - Department of Integrative Biology, University of California, Berkeley, California, United States of America
|