1
Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. Theor Popul Biol 2024; 159:108-124. [PMID: 38492811 DOI: 10.1016/j.tpb.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 03/04/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
  - INRAE - BioSP, Centre INRAE PACA, 228 route de l'aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
  - Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States
2
Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. bioRxiv 2024:2024.03.07.583855. [PMID: 38559261 PMCID: PMC10979897 DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4 000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- Jerome Kelleher
  - Oxford Big Data Institute, University of Oxford, United Kingdom
- Yao-ban Chan
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- David J. Balding
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
3
Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. bioRxiv 2024:2024.01.09.574911. [PMID: 38260273 PMCID: PMC10802400 DOI: 10.1101/2024.01.09.574911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
Affiliation(s)
- Annika Freudiger
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
  - Department of Biological and Biomedical Sciences, North Carolina Central University, North Carolina, Durham, USA
  - Research and Collections Section, North Carolina Museum of Natural Sciences, North Carolina, Raleigh, USA
  - Department of Biological Sciences, North Carolina State University, North Carolina, Raleigh, USA
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
  - Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
  - Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Department of Biology, Duke University, Durham, North Carolina, USA
  - Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
4
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
  - Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
5
Dinh BL, Tang E, Taparra K, Nakatsuka N, Chen F, Chiang CWK. Recombination map tailored to Native Hawaiians may improve robustness of genomic scans for positive selection. Hum Genet 2024; 143:85-99. [PMID: 38157018 PMCID: PMC10794367 DOI: 10.1007/s00439-023-02625-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/25/2023] [Indexed: 01/03/2024]
Abstract
Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map) and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score|> 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.
Affiliation(s)
- Bryan L Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Echo Tang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Kekoa Taparra
  - Department of Radiation Oncology, Stanford University, Palo Alto, CA, USA
- Fei Chen
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
6
Dinh BL, Tang E, Taparra K, Nakatsuka N, Chen F, Chiang CWK. Recombination map tailored to Native Hawaiians improves robustness of genomic scans for positive selection. bioRxiv 2023:2023.07.12.548735. [PMID: 37503129 PMCID: PMC10370006 DOI: 10.1101/2023.07.12.548735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map), and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score| > 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.
Affiliation(s)
- Bryan L Dinh
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Echo Tang
  - Department of Quantitative and Computational Biology, University of Southern California
- Kekoa Taparra
  - Department of Radiation Oncology, Stanford University, Palo Alto, California
- Fei Chen
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
7
Moorjani P, Hellenthal G. Methods for Assessing Population Relationships and History Using Genomic Data. Annu Rev Genomics Hum Genet 2023; 24:305-332. [PMID: 37220313 PMCID: PMC11040641 DOI: 10.1146/annurev-genom-111422-025117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Genetic data contain a record of our evolutionary history. The availability of large-scale datasets of human populations from various geographic areas and timescales, coupled with advances in the computational methods to analyze these data, has transformed our ability to use genetic data to learn about our evolutionary past. Here, we review some of the widely used statistical methods to explore and characterize population relationships and history using genomic data. We describe the intuition behind commonly used approaches, their interpretation, and important limitations. For illustration, we apply some of these techniques to genome-wide autosomal data from 929 individuals representing 53 worldwide populations that are part of the Human Genome Diversity Project. Finally, we discuss the new frontiers in genomic methods to learn about population history. In sum, this review highlights the power (and limitations) of DNA to infer features of human evolutionary history, complementing the knowledge gleaned from other disciplines, such as archaeology, anthropology, and linguistics.
Affiliation(s)
- Priya Moorjani
  - Department of Molecular and Cell Biology and Center for Computational Biology, University of California, Berkeley, California, USA
- Garrett Hellenthal
  - UCL Genetics Institute and Research Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
8
Forien R, Ringbauer H, Coop G. Demographic inference for spatially heterogeneous populations using long shared haplotypes. bioRxiv 2023:2023.06.13.544589. [PMID: 37398501 PMCID: PMC10312651 DOI: 10.1101/2023.06.13.544589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
We introduce a modified spatial Λ-Fleming-Viot process to model the ancestry of individuals in a population occupying a continuous spatial habitat divided into two areas by a sharp discontinuity of the dispersal rate and effective population density. We derive an analytical formula for the expected number of shared haplotype segments between two individuals depending on their sampling locations. This formula involves the transition density of a skew diffusion which appears as a scaling limit of the ancestral lineages of individuals in this model. We then show that this formula can be used to infer the dispersal parameters and the effective population density of both regions, using a composite likelihood approach, and we demonstrate the efficiency of this method on a range of simulated data sets.
Affiliation(s)
- Raphaël Forien
  - INRAE - BioSP, Centre INRAE PACA, 228 route de l'aérodrome, Domaine St-Paul - Site Agroparc, 84914, Avignon Cedex 9, France
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Graham Coop
  - Center for Population Biology, Department of Evolution and Ecology, University of California, 2320 Storer Hall, CA 95616, Davis, United States
9
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. bioRxiv 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as Identical by descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA data (aDNA) due to typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments for human aDNA data implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgan for aDNA data with at least either 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives between 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of two approximately fifth-degree relatives who were buried 1410km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago. 
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe-ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe, signaling a strong bottleneck and a recent biological connection on the order of only few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
10
Fortes-Lima C, Tříska P, Čížková M, Podgorná E, Diallo MY, Schlebusch CM, Černý V. Demographic and Selection Histories of Populations Across the Sahel/Savannah Belt. Mol Biol Evol 2022; 39:6731090. [PMID: 36173804 PMCID: PMC9582163 DOI: 10.1093/molbev/msac209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
The Sahel/Savannah belt harbors diverse populations with different demographic histories and different subsistence patterns. However, populations from this large African region are notably under-represented in genomic research. To investigate the population structure and adaptation history of populations from the Sahel/Savannah space, we generated dense genome-wide genotype data of 327 individuals comprising 14 ethnolinguistic groups, including 10 previously unsampled populations. Our results highlight fine-scale population structure and complex patterns of admixture, particularly in Fulani groups and Arabic-speaking populations. Among all studied Sahelian populations, only the Rashaayda Arabic-speaking population from eastern Sudan shows a lack of gene flow from African groups, which is consistent with the short history of this population in the African continent. They are recent migrants from Saudi Arabia with evidence of strong genetic isolation during the last few generations and a strong demographic bottleneck. This population also presents a strong selection signal in a genomic region around the CNR1 gene associated with substance dependence and chronic stress. In Western Sahelian populations, signatures of selection were detected in several other genetic regions, including pathways associated with lactase persistence, immune response, and malaria resistance. Taken together, these findings refine our current knowledge of genetic diversity, population structure, migration, admixture and adaptation of human populations in the Sahel/Savannah belt and contribute to our understanding of human history and health.
Collapse
Affiliation(s)
- Cesar Fortes-Lima
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Petr Tříska
- Archaeogenetics Laboratory, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czech Republic
| | - Martina Čížková
- Archaeogenetics Laboratory, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czech Republic
| | - Eliška Podgorná
- Archaeogenetics Laboratory, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czech Republic
| | - Mame Yoro Diallo
- Archaeogenetics Laboratory, Institute of Archaeology of the Czech Academy of Sciences, Prague, Czech Republic,Department of Anthropology and Human Genetics, Faculty of Science, Charles University, Prague, Czech Republic
| | | | | |
|
11
|
Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet 2022; 109:812-824. [PMID: 35417677 PMCID: PMC9118131 DOI: 10.1016/j.ajhg.2022.03.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/24/2021] [Accepted: 03/25/2022] [Indexed: 12/23/2022]
Abstract
The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard to calculate the GRM treats linked markers as independent and does not explicitly model the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework, namely the expected GRM (eGRM), to infer the expected relatedness between pairs of individuals given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed with ARG inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimate the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to SNP array genotypes from a population sample from Northern and Eastern Finland, we find that clustering analysis with the eGRM reveals population structure driven by subpopulations that would not be apparent via the canonical GRM and that temporally the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust tree-centric estimate of relatedness with wide application to genetic studies.
Affiliation(s)
- Caoqi Fan
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
| | - Nicholas Mancuso
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
| |
|
12
|
Sticca EL, Belbin GM, Gignoux CR. Current Developments in Detection of Identity-by-Descent Methods and Applications. Front Genet 2021; 12:722602. [PMID: 34567074 PMCID: PMC8461052 DOI: 10.3389/fgene.2021.722602] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Accepted: 08/24/2021] [Indexed: 01/23/2023]
Abstract
Identity-by-descent (IBD), the detection of shared segments inherited from a common ancestor, is a fundamental concept in genomics with broad applications in the characterization and analysis of genomes. While historically the concept of IBD was extensively utilized through linkage analyses and in studies of founder populations, applications of IBD-based methods subsided during the genome-wide association study era. This was primarily due to the computational expense of IBD detection, which becomes increasingly relevant as the field moves toward the analysis of biobank-scale datasets that encompass individuals from highly diverse backgrounds. To address these computational barriers, the past several years have seen new methodological advances enabling IBD detection for datasets in the hundreds of thousands to millions of individuals, enabling novel analyses at an unprecedented scale. Here, we describe the latest innovations in IBD detection and describe opportunities for the application of IBD-based methods across a broad range of questions in the field of genomics.
Affiliation(s)
- Evan L Sticca
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Gillian M Belbin
- Institute for Genomic Health, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Christopher R Gignoux
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
|
13
|
Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat Commun 2021; 12:5425. [PMID: 34521843 PMCID: PMC8440622 DOI: 10.1038/s41467-021-25289-w] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 09/22/2020] [Accepted: 07/21/2021] [Indexed: 02/08/2023]
Abstract
Parental relatedness of present-day humans varies substantially across the globe, but little is known about the past. Here we analyze ancient DNA, leveraging that parental relatedness leaves genomic traces in the form of runs of homozygosity. We present an approach to identify such runs in low-coverage ancient DNA data aided by haplotype information from a modern phased reference panel. Simulation and experiments show that this method robustly detects runs of homozygosity longer than 4 centimorgans for ancient individuals with at least 0.3× coverage. Analyzing genomic data from 1,785 ancient humans who lived in the last 45,000 years, we detect low rates of first cousin or closer unions across most ancient populations. Moreover, we find a marked decay in background parental relatedness co-occurring with or shortly after the advent of sedentary agriculture. We observe this signal, likely linked to increasing local population sizes, across several geographic transects worldwide.
|
14
|
Freyman WA, McManus KF, Shringarpure SS, Jewett EM, Bryc K, Auton A. Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows-Wheeler Transform. Mol Biol Evol 2021; 38:2131-2151. [PMID: 33355662 PMCID: PMC8097300 DOI: 10.1093/molbev/msaa328] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/21/2022]
Abstract
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
|
15
|
Sengupta D, Choudhury A, Fortes-Lima C, Aron S, Whitelaw G, Bostoen K, Gunnink H, Chousou-Polydouri N, Delius P, Tollman S, Gómez-Olivé FX, Norris S, Mashinya F, Alberts M, Hazelhurst S, Schlebusch CM, Ramsay M. Genetic substructure and complex demographic history of South African Bantu speakers. Nat Commun 2021; 12:2080. [PMID: 33828095 PMCID: PMC8027885 DOI: 10.1038/s41467-021-22207-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/10/2020] [Accepted: 02/10/2021] [Indexed: 02/01/2023]
Abstract
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Affiliation(s)
- Dhriti Sengupta
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Ananyo Choudhury
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Cesar Fortes-Lima
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Shaun Aron
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Gavin Whitelaw
- KwaZulu-Natal Museum, Pietermaritzburg, South Africa
- School of Geography, Archaeology & Environmental Studies, University of the Witwatersrand, Johannesburg, South Africa
| | - Koen Bostoen
- UGent Centre for Bantu Studies, Department of Languages and Cultures, Ghent University, Ghent, Belgium
| | - Hilde Gunnink
- UGent Centre for Bantu Studies, Department of Languages and Cultures, Ghent University, Ghent, Belgium
| | - Natalia Chousou-Polydouri
- Department of Comparative Linguistic Science and Center for the Interdisciplinary Study of Language Evolution, University of Zürich, Zürich, Switzerland
| | - Peter Delius
- Department of History, University of the Witwatersrand, Johannesburg, South Africa
| | - Stephen Tollman
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - F Xavier Gómez-Olivé
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Shane Norris
- MRC/Wits Developmental Pathways for Health Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Felistas Mashinya
- Department of Pathology and Medical Sciences; School of Health Care Sciences, Faculty of Health Sciences, University of Limpopo, Polokwane, South Africa
| | - Marianne Alberts
- Department of Pathology and Medical Sciences; School of Health Care Sciences, Faculty of Health Sciences, University of Limpopo, Polokwane, South Africa
| | - Scott Hazelhurst
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
| | - Carina M Schlebusch
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Johannesburg, South Africa
| | - Michèle Ramsay
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
| |
|
16
|
Seidensticker D, Hubau W, Verschuren D, Fortes-Lima C, de Maret P, Schlebusch CM, Bostoen K. Population collapse in Congo rainforest from 400 CE urges reassessment of the Bantu Expansion. Sci Adv 2021; 7:eabd8352. [PMID: 33579711 PMCID: PMC7880602 DOI: 10.1126/sciadv.abd8352] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/15/2020] [Accepted: 12/28/2020] [Indexed: 06/12/2023]
Abstract
The present-day distribution of Bantu languages is commonly thought to reflect the early stages of the Bantu Expansion, the greatest migration event in African prehistory. Using 1149 radiocarbon dates linked to 115 pottery styles recovered from 726 sites throughout the Congo rainforest and adjacent areas, we show that this is not the case. Two periods of more intense human activity, each consisting of an expansion phase with widespread pottery styles and a regionalization phase with many more local pottery styles, are separated by a widespread population collapse between 400 and 600 CE followed by major resettlement centuries later. Coinciding with wetter climatic conditions, the collapse was possibly promoted by a prolonged epidemic. Comparison of our data with genetic and linguistic evidence further supports a spread-over-spread model for the dispersal of Bantu speakers and their languages.
Affiliation(s)
- Dirk Seidensticker
- Department of Languages and Cultures, BantUGent-UGent Centre for Bantu Studies, Ghent University, Ghent, Belgium.
| | - Wannes Hubau
- Department of Languages and Cultures, BantUGent-UGent Centre for Bantu Studies, Ghent University, Ghent, Belgium.
- Royal Museum for Central Africa, Service of Wood Biology, Tervuren, Belgium
| | - Dirk Verschuren
- Department of Biology, Limnology Unit, Ghent University, Ghent, Belgium
| | - Cesar Fortes-Lima
- Department of Organismal Biology, Human Evolution, Uppsala University, Uppsala, Sweden
| | - Pierre de Maret
- Faculté de Philosophie et Sciences sociales, Université libre de Bruxelles, Brussels, Belgium
| | - Carina M Schlebusch
- Department of Organismal Biology, Human Evolution, Uppsala University, Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Auckland Park, South Africa
- SciLifeLab, Uppsala, Sweden
| | - Koen Bostoen
- Department of Languages and Cultures, BantUGent-UGent Centre for Bantu Studies, Ghent University, Ghent, Belgium
| |
|
17
|
Kling D, Phillips C, Kennett D, Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci Int Genet 2021; 52:102474. [PMID: 33592389 DOI: 10.1016/j.fsigen.2021.102474] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 10/02/2020] [Revised: 01/12/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Investigative genetic genealogy (IGG) has emerged as a new, rapidly growing field of forensic science. We describe the process whereby dense SNP data, commonly comprising more than half a million markers, are employed to infer distant relationships. By distant we refer to degrees of relatedness exceeding that of first cousins. We review how methods of relationship matching and SNP analysis on an enlarged scale are used in a forensic setting to identify a suspect in a criminal investigation or a missing person. There is currently a strong need in forensic genetics not only to understand the underlying models to infer relatedness but also to fully explore the DNA technologies and data used in IGG. This review brings together many of the topics and examines their effectiveness and operational limits, while suggesting future directions for their forensic validation. We further investigated the methods used by the major direct-to-consumer (DTC) genetic ancestry testing companies as well as submitting a questionnaire where providers of forensic genetic genealogy summarized their operation/services. Although most of the DTC market, and genetic genealogy in general, has undisclosed, proprietary algorithms we review the current knowledge where information has been discussed and published more openly.
Affiliation(s)
- Daniel Kling
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway.
| | - Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain.
| | - Debbie Kennett
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
| | - Andreas Tillmar
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden
| |
|
18
|
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022]
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| | - Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
| |
|
19
|
Zhou Y, Browning BL, Browning SR. Population-Specific Recombination Maps from Segments of Identity by Descent. Am J Hum Genet 2020; 107:137-148. [PMID: 32533945 PMCID: PMC7332656 DOI: 10.1016/j.ajhg.2020.05.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 12/26/2022]
Abstract
Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| | - Brian L Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
| |
|
20
|
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
| |
|
21
|
Leitwein M, Duranton M, Rougemont Q, Gagnaire PA, Bernatchez L. Using Haplotype Information for Conservation Genomics. Trends Ecol Evol 2020; 35:245-258. [DOI: 10.1016/j.tree.2019.10.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/01/2019] [Revised: 10/18/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022]
|
22
|
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annual Review of Ecology, Evolution, and Systematics 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
| | - Emilia Huerta-Sanchez
- Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
- Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| |
|
23
|
Fortes-Lima C, Bybjerg-Grauholm J, Marin-Padrón LC, Gomez-Cabezas EJ, Bækvad-Hansen M, Hansen CS, Le P, Hougaard DM, Verdu P, Mors O, Parra EJ, Marcheco-Teruel B. Exploring Cuba's population structure and demographic history using genome-wide data. Sci Rep 2018; 8:11422. [PMID: 30061702 PMCID: PMC6065444 DOI: 10.1038/s41598-018-29851-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/09/2018] [Accepted: 07/10/2018] [Indexed: 12/31/2022]
Abstract
Cuba is the most populated country in the Caribbean and has a rich and heterogeneous genetic heritage. Here, we take advantage of dense genomic data from 860 Cuban individuals to reconstruct the genetic structure and ancestral origins of this population. We found distinct admixture patterns between and within the Cuban provinces. Eastern provinces have higher African and Native American ancestry contributions (average 26% and 10%, respectively) than the rest of the Cuban provinces (average 17% and 5%, respectively). Furthermore, in the Eastern Cuban region, we identified more intense sex-specific admixture patterns, strongly biased towards European male and African/Native American female ancestries. Our subcontinental ancestry analyses in Cuba highlight the Iberian population as the best proxy European source population, South American and Mesoamerican populations as the closest Native American ancestral component, and populations from West Central and Central Africa as the best proxy sources of the African ancestral component. Finally, we found complex admixture processes involving two migration pulses from both Native American and African sources. Most of the inferred Native American admixture events happened early during the Cuban colonial period, whereas the African admixture took place during the slave trade and more recently as a probable result of large-scale migrations from Haiti.
Affiliation(s)
- Cesar Fortes-Lima
- UMR7206 Eco-Anthropology and Ethno-Biology, CNRS-MNHN-University Paris Diderot, Musée de l'Homme, Paris, 75016, France
| | - Jonas Bybjerg-Grauholm
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, 2300, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus University, Aarhus, 8000, Denmark
| | | | | | - Marie Bækvad-Hansen
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, 2300, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus University, Aarhus, 8000, Denmark
| | - Christine Søholm Hansen
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, 2300, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus University, Aarhus, 8000, Denmark
| | - Phuong Le
- Department of Anthropology, University of Toronto, Mississauga, ON L5L 1C6, Canada
| | - David Michael Hougaard
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, 2300, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus University, Aarhus, 8000, Denmark
| | - Paul Verdu
- UMR7206 Eco-Anthropology and Ethno-Biology, CNRS-MNHN-University Paris Diderot, Musée de l'Homme, Paris, 75016, France
| | - Ole Mors
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus University, Aarhus, 8000, Denmark; Psychosis Research Unit, Aarhus University Hospital, Risskov, Aarhus, 8240, Denmark
| | - Esteban J Parra
- Department of Anthropology, University of Toronto, Mississauga, ON L5L 1C6, Canada.
| | | |
Collapse
|
24
Belbin GM, Odgis J, Sorokin EP, Yee MC, Kohli S, Glicksberg BS, Gignoux CR, Wojcik GL, Van Vleck T, Jeff JM, Linderman M, Schurmann C, Ruderfer D, Cai X, Merkelson A, Justice AE, Young KL, Graff M, North KE, Peters U, James R, Hindorff L, Kornreich R, Edelmann L, Gottesman O, Stahl EE, Cho JH, Loos RJ, Bottinger EP, Nadkarni GN, Abul-Husn NS, Kenny EE. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife 2017; 6:25060. [PMID: 28895531 PMCID: PMC5595434 DOI: 10.7554/elife.25060] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/11/2017] [Accepted: 08/09/2017] [Indexed: 11/16/2022] Open
Abstract
Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.
Diseases often run in families. These diseases are frequently linked to changes in DNA that are passed down through generations. Close family members may share these disease-causing mutations; so may distant relatives who inherited the same mutation from a common ancestor long ago. Geneticists use a method called linkage mapping to trace a disease found in multiple members of a family over generations to genetic changes in a shared ancestor. This allows scientists to pinpoint the exact place in the genome where the disease-causing mutation occurred. Using computer algorithms, scientists can apply the same technique to identify mutations that distant relatives inherited from a common ancestor. Belbin et al. used this computational technique to identify a mutation that may cause unusually short stature or bone and joint problems in up to 2% of people of Puerto Rican descent.
In the experiments, the genomes of about 32,000 New Yorkers who have volunteered to participate in the BioMe Biobank and their health records were used to search for genetic changes linked to extremely short stature. The search revealed that people who inherited two copies of this mutation from their parents were likely to be extremely short or to have bone and joint problems. People who inherited one copy had an increased likelihood of joint or bone problems. This mutation affects a gene responsible for making a form of protein called collagen that is important for bone growth. The analysis suggests the mutation first arose in a Native American ancestor living in Puerto Rico around the time that European colonization began. The mutation had previously been linked to a disorder called Steel syndrome that was thought to be rare. Belbin et al. showed this condition is actually fairly common in people whose ancestors recently came from Puerto Rico, but may often go undiagnosed by their physicians. The experiments emphasize the importance of including diverse populations in genetic studies, as studies of people of predominantly European descent would likely have missed the link between this disease and mutation.
Affiliation(s)
- Gillian Morven Belbin
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Jacqueline Odgis
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Elena P Sorokin
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Muh-Ching Yee
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, United States
- Sumita Kohli
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Benjamin S Glicksberg
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States
- Christopher R Gignoux
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Genevieve L Wojcik
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Tielman Van Vleck
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Janina M Jeff
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Michael Linderman
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Claudia Schurmann
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Douglas Ruderfer
  - Broad Institute, Cambridge, United States
  - Division of Psychiatric Genomics, Icahn School of Medicine at Mt Sinai, New York, United States
  - Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
- Xiaoqiang Cai
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Amanda Merkelson
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Anne E Justice
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Kristin L Young
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Misa Graff
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Kari E North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
  - Department of Epidemiology, University of Washington School of Public Health, Seattle, United States
- Regina James
  - National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, United States
- Lucia Hindorff
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, United States
- Ruth Kornreich
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Lisa Edelmann
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Omri Gottesman
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Eli Ea Stahl
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States
  - Broad Institute, Cambridge, United States
- Judy H Cho
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, United States
- Ruth JF Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, United States
- Erwin P Bottinger
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Girish N Nadkarni
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Noura S Abul-Husn
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Eimear E Kenny
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
  - The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
  - Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
25
Bjelland DW, Lingala U, Patel PS, Jones M, Keller MC. A fast and accurate method for detection of IBD shared haplotypes in genome-wide SNP data. Eur J Hum Genet 2017; 25:617-624. [PMID: 28176766 PMCID: PMC5437913 DOI: 10.1038/ejhg.2017.6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/05/2016] [Revised: 11/22/2016] [Accepted: 12/24/2016] [Indexed: 11/08/2022] Open
Abstract
Identical by descent (IBD) segments are used to understand a number of fundamental issues in genetics. IBD segments are typically detected using long stretches of identical alleles between haplotypes in phased, whole-genome SNP data. Phase or SNP call errors in genomic data can degrade accuracy of IBD detection and lead to false-positive/negative calls and to under/overextension of true IBD segments. Furthermore, the number of comparisons increases quadratically with sample size, requiring high computational efficiency. We developed a new IBD segment detection program, FISHR (Find IBD Shared Haplotypes Rapidly), in an attempt to accurately detect IBD segments and to better estimate their endpoints using an algorithm that is fast enough to be deployed on very large whole-genome SNP data sets. We compared the performance of FISHR to three leading IBD segment detection programs: GERMLINE, refined IBD, and HaploScore. Using simulated and real genomic sequence data, we show that FISHR is slightly more accurate than all programs at detecting long (>3 cM) IBD segments but slightly less accurate than refined IBD at detecting short (~1 cM) IBD segments. More centrally, FISHR outperforms all programs in determining the true endpoints of IBD segments, which is crucial for several applications of IBD information. FISHR takes two to three times longer than GERMLINE to run, whereas both GERMLINE and FISHR were orders of magnitude faster than refined IBD and HaploScore. Overall, FISHR provides accurate IBD detection in unrelated individuals and is computationally efficient enough to be utilized on large SNP data sets (>60 000 individuals).
Affiliation(s)
- Douglas W Bjelland
  - Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
- Uday Lingala
  - Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
- Piyush S Patel
  - Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
- Matt Jones
  - Department of Psychology & Neuroscience, University of Colorado at Boulder, Boulder, CO, USA
- Matthew C Keller
  - Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
  - Department of Psychology & Neuroscience, University of Colorado at Boulder, Boulder, CO, USA
26
Ringbauer H, Coop G, Barton NH. Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks. Genetics 2017; 205:1335-1351. [PMID: 28108588 PMCID: PMC5340342 DOI: 10.1534/genetics.116.196220] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 09/24/2016] [Accepted: 01/13/2017] [Indexed: 12/12/2022] Open
Abstract
Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of [Formula: see text] during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance.
Affiliation(s)
- Harald Ringbauer
  - Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
- Graham Coop
  - Department of Evolution and Ecology, University of California, Davis, California 95616
  - Center for Population Biology, University of California, Davis, California 95616
- Nicholas H Barton
  - Institute of Science and Technology Austria, A-3400 Klosterneuburg, Austria
27
Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016; 41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]
Abstract
The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals has been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that led to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk.
Affiliation(s)
- Feng Gao
  - Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
- Alon Keinan
  - Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
28
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060 DOI: 10.1016/j.gde.2016.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/06/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/17/2023]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it only explains a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our work addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
  - Department of Human Genetics, University of Chicago, IL 60636, United States
  - Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
- Benjamin M Peter
  - Department of Human Genetics, University of Chicago, IL 60636, United States