1
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of both frequency and effect size, but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Affiliation(s)
- Roshni A. Patel
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Clemens L. Weiß
  - Stanford Cancer Institute Core, Stanford University School of Medicine, Stanford, CA
- Huisheng Zhu
  - Department of Biology, Stanford University, Stanford, CA
- Hakhamanesh Mostafavi
  - Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY
- Jeffrey P. Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
  - Department of Biology, Stanford University, Stanford, CA
2
Lyulina AS, Liu Z, Good BH. Linkage equilibrium between rare mutations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.28.587282. [PMID: 38617331 PMCID: PMC11014483 DOI: 10.1101/2024.03.28.587282]
Abstract
Recombination breaks down genetic linkage by reshuffling existing variants onto new genetic backgrounds. These dynamics are traditionally quantified by examining the correlations between alleles, and how they decay as a function of the recombination rate. However, the magnitudes of these correlations are strongly influenced by other evolutionary forces like natural selection and genetic drift, making it difficult to tease out the effects of recombination. Here we introduce a theoretical framework for analyzing an alternative family of statistics that measure the homoplasy produced by recombination. We derive analytical expressions that predict how these statistics depend on the rates of recombination and recurrent mutation, the strength of negative selection and genetic drift, and the present-day frequencies of the mutant alleles. We find that the degree of homoplasy can strongly depend on this frequency scale, which reflects the underlying timescales over which these mutations occurred. We show how these scaling properties can be used to isolate the effects of recombination, and discuss their implications for the rates of horizontal gene transfer in bacteria.
Affiliation(s)
- Anastasia S Lyulina
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Zhiru Liu
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Benjamin H Good
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub - San Francisco, San Francisco, CA 94158, USA
3
Lopez Fang L, Peede D, Ortega-Del Vecchyo D, McTavish EJ, Huerta-Sánchez E. Leveraging shared ancestral variation to detect local introgression. PLoS Genet 2024; 20:e1010155. [PMID: 38190420 PMCID: PMC10798638 DOI: 10.1371/journal.pgen.1010155] [Received: 03/29/2022] [Revised: 01/19/2024] [Accepted: 12/04/2023]
Abstract
Introgression is a common evolutionary phenomenon that results in shared genetic material across non-sister taxa. Existing statistical methods such as Patterson's D statistic can detect introgression by measuring an excess of shared derived alleles between populations. The D statistic is effective at detecting genome-wide patterns of introgression but can give spurious inferences of introgression when applied to local regions. We propose a new statistic, D+, that leverages both shared ancestral and derived alleles to infer local introgressed regions. Incorporating both shared derived and ancestral alleles increases the number of informative sites per region, improving our ability to identify local introgression. We use a coalescent framework to derive the expected value of this statistic as a function of different demographic parameters under an instantaneous admixture model and use coalescent simulations to compute the power and precision of D+. While the power of D and D+ is comparable, D+ has better precision than D. We apply D+ to empirical data from the 1000 Genomes Project and Heliconius butterflies to infer local targets of introgression in humans and in butterflies.
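As a rough illustration of the two statistics contrasted in this abstract, here is a minimal sketch computed from per-site derived-allele frequencies in P1, P2, P3 and an outgroup (0/1 when each population is a single haploid sample). It assumes the commonly cited form D+ = sum[(ABBA - BABA) + (BAAA - ABAA)] / sum[(ABBA + BABA) + (BAAA + ABAA)], where BAAA and ABAA are the shared-ancestral patterns. That formula, the frequency weighting of site patterns, and the function names are assumptions for illustration, not the authors' implementation.

```python
from typing import Sequence, Tuple


def site_pattern_sums(
    p1: Sequence[float], p2: Sequence[float], p3: Sequence[float], out: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Sum expected ABBA, BABA, BAAA and ABAA weights over sites.

    Inputs are per-site derived-allele frequencies for P1, P2, P3 and the
    outgroup. Pattern letters are ordered (P1, P2, P3, outgroup), with
    A = ancestral and B = derived.
    """
    abba = baba = baaa = abaa = 0.0
    for a, b, c, o in zip(p1, p2, p3, out):
        anc_o = 1.0 - o  # probability the outgroup carries the ancestral allele
        abba += (1 - a) * b * c * anc_o        # P2 and P3 share the derived allele
        baba += a * (1 - b) * c * anc_o        # P1 and P3 share the derived allele
        baaa += a * (1 - b) * (1 - c) * anc_o  # P2 and P3 share the ancestral allele
        abaa += (1 - a) * b * (1 - c) * anc_o  # P1 and P3 share the ancestral allele
    return abba, baba, baaa, abaa


def patterson_d(p1, p2, p3, out) -> float:
    """Patterson's D: excess of ABBA over BABA (shared derived alleles only)."""
    abba, baba, _, _ = site_pattern_sums(p1, p2, p3, out)
    if abba + baba == 0:
        raise ValueError("no ABBA/BABA sites in this window")
    return (abba - baba) / (abba + baba)


def d_plus(p1, p2, p3, out) -> float:
    """D+-style statistic (assumed form): also counts shared ancestral alleles."""
    abba, baba, baaa, abaa = site_pattern_sums(p1, p2, p3, out)
    den = (abba + baba) + (baaa + abaa)
    if den == 0:
        raise ValueError("no informative sites in this window")
    return ((abba - baba) + (baaa - abaa)) / den


# Toy window with one haploid sample per population (0 = ancestral, 1 = derived).
p1 = [0, 0, 1, 1, 0, 1, 1]
p2 = [1, 1, 0, 0, 1, 0, 0]
p3 = [1, 1, 1, 0, 0, 0, 0]
out = [0] * 7
print(patterson_d(p1, p2, p3, out))  # 2 ABBA, 1 BABA -> 1/3
print(d_plus(p1, p2, p3, out))       # adds 3 BAAA, 1 ABAA -> 3/7
```

In this toy window only three sites inform D, while D+ also uses the four shared-ancestral sites, which is the sense in which the abstract says D+ increases the number of informative sites per region.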
Affiliation(s)
- Lesly Lopez Fang
  - Department of Life & Environmental Sciences, University of California, Merced, Merced, California, United States of America
  - Quantitative & Systems Biology Graduate Group, University of California, Merced, Merced, California, United States of America
- David Peede
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Biology, Brown University, Providence, Rhode Island, United States of America
  - Institute at Brown for Environment and Society, Brown University, Providence, Rhode Island, United States of America
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Santiago de Querétaro, Querétaro, México
- Emily Jane McTavish
  - Department of Life & Environmental Sciences, University of California, Merced, Merced, California, United States of America
  - Quantitative & Systems Biology Graduate Group, University of California, Merced, Merced, California, United States of America
- Emilia Huerta-Sánchez
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Biology, Brown University, Providence, Rhode Island, United States of America
4
Fournier R, Tsangalidou Z, Reich D, Palamara PF. Haplotype-based inference of recent effective population size in modern and ancient DNA samples. Nat Commun 2023; 14:7945. [PMID: 38040695 PMCID: PMC10692198 DOI: 10.1038/s41467-023-43522-6] [Received: 07/03/2022] [Accepted: 11/10/2023]
Abstract
Individuals sharing recent ancestors are likely to co-inherit large identical-by-descent (IBD) genomic regions. The distribution of these IBD segments in a population may be used to reconstruct past demographic events such as effective population size variation, but accurate IBD detection is difficult in ancient DNA data and in underrepresented populations with limited reference data. In this work, we introduce an accurate method for inferring effective population size variation during the past ~2000 years in both modern and ancient DNA data, called HapNe. HapNe infers recent population size fluctuations using either IBD sharing (HapNe-IBD) or linkage disequilibrium (HapNe-LD), which does not require phasing and can be computed in low-coverage data, including data sets with heterogeneous sampling times. HapNe shows improved accuracy in a range of simulated demographic scenarios compared to currently available methods for IBD-based and LD-based inference of recent effective population size, while requiring fewer computational resources. We apply HapNe to several modern populations from the 1,000 Genomes Project, the UK Biobank, the Allen Ancient DNA Resource, and recently published samples from Iron Age Britain, detecting multiple instances of recent effective population size variation across these groups.
Affiliation(s)
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Pier Francesco Palamara
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
5
Medina-Muñoz SG, Ortega-Del Vecchyo D, Cruz-Hervert LP, Ferreyra-Reyes L, García-García L, Moreno-Estrada A, Ragsdale AP. Demographic modeling of admixed Latin American populations from whole genomes. Am J Hum Genet 2023; 110:1804-1816. [PMID: 37725976 PMCID: PMC10577084 DOI: 10.1016/j.ajhg.2023.08.015] [Received: 03/07/2023] [Revised: 08/17/2023] [Accepted: 08/23/2023]
Abstract
Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ∼30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.
Affiliation(s)
- Santiago G Medina-Muñoz
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de Mexico, Juriquilla, Querétaro 76230, Mexico
- Andrés Moreno-Estrada
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico
- Aaron P Ragsdale
  - National Laboratory of Genomics for Biodiversity (LANGEBIO), Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato 36824, Mexico
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
6
Laetsch DR, Bisschop G, Martin SH, Aeschbacher S, Setter D, Lohse K. Demographically explicit scans for barriers to gene flow using gIMble. PLoS Genet 2023; 19:e1010999. [PMID: 37816069 PMCID: PMC10610087 DOI: 10.1371/journal.pgen.1010999] [Received: 06/06/2023] [Revised: 10/27/2023] [Accepted: 09/25/2023]
Abstract
Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles, i.e., genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for quantifying barriers that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.
Affiliation(s)
- Dominik R. Laetsch
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Gertjan Bisschop
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon H. Martin
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon Aeschbacher
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Derek Setter
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Konrad Lohse
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
7
Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes. PLoS Genet 2023; 19:e1010931. [PMID: 37676865 PMCID: PMC10508636 DOI: 10.1371/journal.pgen.1010931] [Received: 01/22/2023] [Revised: 09/19/2023] [Accepted: 08/21/2023]
Abstract
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data (that is, all single nucleotide polymorphisms [SNPs] in the individuals being analyzed), but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human group (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution to the ascertainment problem, has limited statistical power and retains important biases.
However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
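For readers new to f-statistics, a minimal sketch of the two basic quantities may help. It uses the standard plug-in definitions f2(A, B) = mean((a - b)^2) and f4(A, B; C, D) = mean((a - b)(c - d)) over per-SNP population allele frequencies. Real software additionally corrects for finite sample sizes and uses block-jackknife standard errors; this sketch does neither, and the function names and example frequencies are hypothetical. Because the average runs only over the SNPs supplied, the panel used to ascertain them directly shapes the estimate, which is the dependence this study examines.

```python
from typing import Sequence


def f2(a: Sequence[float], b: Sequence[float]) -> float:
    """f2(A, B): mean squared allele-frequency difference across SNPs."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean of (a - b)(c - d) across SNPs.

    Its expectation is zero when A and B form a clade relative to C and D
    with no gene flow between the two pairs, so a significantly nonzero
    value argues against that tree.
    """
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


# Hypothetical allele frequencies for four populations at the same three SNPs.
A = [0.1, 0.5, 0.9]
B = [0.2, 0.4, 0.7]
C = [0.3, 0.6, 0.2]
D = [0.0, 0.8, 0.5]

# f4 can equivalently be written as a combination of f2 values:
# f4(A, B; C, D) = [f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)] / 2
print(f4(A, B, C, D))
print((f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)) / 2)
```

The two printed values agree up to floating-point rounding, which is why model-fitting tools can work with a table of f2 values alone.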
Affiliation(s)
- Pavel Flegontov
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
  - Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia
- Ulaş Işıldak
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Robert Maier
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eren Yüncü
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- Piya Changmai
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
  - Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
8
Ragsdale AP, Thornton KR. Multiple Sources of Uncertainty Confound Inference of Historical Human Generation Times. Mol Biol Evol 2023; 40:msad160. [PMID: 37450583 PMCID: PMC10404577 DOI: 10.1093/molbev/msad160] [Received: 02/27/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023]
Abstract
Wang et al. (2023) recently proposed an approach to infer the history of human generation intervals from changes in mutation profiles over time. As the relative proportions of different mutation types depend on the ages of parents, binning variants by the time they arose allows for the inference of changes in average paternal and maternal generation intervals. Applying this approach to published allele age estimates, Wang et al. (2023) inferred long-lasting sex differences in average generation times and surprisingly found that ancestral generation times of West African populations remained substantially higher than those of Eurasian populations extending tens of thousands of generations into the past. Here, we argue that the results and interpretations in Wang et al. (2023) are primarily driven by noise and biases in input data and a lack of validation using independent approaches for estimating allele ages. With the recent development of methods to reconstruct genome-wide gene genealogies, coalescence times, and allele ages, we caution that downstream analyses may be strongly influenced by uncharacterized biases in their output.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Kevin R Thornton
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
9
Ragsdale AP, Weaver TD, Atkinson EG, Hoal EG, Möller M, Henn BM, Gravel S. A weakly structured stem for human origins in Africa. Nature 2023; 617:755-763. [PMID: 37198480 PMCID: PMC10208968 DOI: 10.1038/s41586-023-06055-y] [Received: 03/15/2022] [Accepted: 04/05/2023]
Abstract
Despite broad agreement that Homo sapiens originated in Africa, considerable uncertainty surrounds specific models of divergence and migration across the continent [1]. Progress is hampered by a shortage of fossil and genomic data, as well as variability in previous estimates of divergence times [1]. Here we seek to discriminate among such models by considering linkage disequilibrium and diversity-based statistics, optimized for rapid, complex demographic inference [2]. We infer detailed demographic models for populations across Africa, including eastern and western representatives, and newly sequenced whole genomes from 44 Nama (Khoe-San) individuals from southern Africa. We infer a reticulated African population history in which present-day population structure dates back to Marine Isotope Stage 5. The earliest population divergence among contemporary populations occurred 120,000 to 135,000 years ago and was preceded by links between two or more weakly differentiated ancestral Homo populations connected by gene flow over hundreds of thousands of years. Such weakly structured stem models explain patterns of polymorphism that had previously been attributed to contributions from archaic hominins in Africa [2-7]. In contrast to models with archaic introgression, we predict that fossil remains from coexisting ancestral populations should be genetically and morphologically similar, and that only an inferred 1-4% of genetic differentiation among contemporary human populations can be attributed to genetic drift between stem populations. We show that model misspecification explains the variation in previous estimates of divergence times, and argue that studying a range of models is key to making robust inferences about deep history.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Timothy D Weaver
  - Department of Anthropology, University of California, Davis, CA, USA
- Elizabeth G Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Eileen G Hoal
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
  - Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Marlo Möller
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
  - South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
  - Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Brenna M Henn
  - Department of Anthropology, University of California, Davis, CA, USA
  - Genome Center, University of California, Davis, CA, USA
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, Quebec, Canada
10
Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas N, Nyambo T, Mpoloka SW, Mokone GG, Njamnshi A, Folkunang C, Meskel DW, Belay G, Song YS, Tishkoff SA. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell 2023; 186:923-939.e14. [PMID: 36868214 PMCID: PMC10568978 DOI: 10.1016/j.cell.2023.01.042] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 01/30/2023]
Abstract
We conduct high-coverage (>30×) whole-genome sequencing of 180 individuals from 12 indigenous African populations. We identify millions of unreported variants, many predicted to be functionally important. We observe that the ancestors of southern African San and central African rainforest hunter-gatherers (RHG) diverged from other populations >200 kya and maintained a large effective population size. We observe evidence for ancient population structure in Africa and for multiple introgression events from "ghost" populations with highly diverged genetic lineages. Although currently geographically isolated, we observe evidence for gene flow between eastern and southern Khoesan-speaking hunter-gatherer populations lasting until ∼12 kya. We identify signatures of local adaptation for traits related to skin color, immune response, height, and metabolic processes. We identify a positively selected variant in the lightly pigmented San that influences pigmentation in vitro by regulating the enhancer activity and gene expression of PDPK1.
Affiliation(s)
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jeffrey P Spence
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Yuanqing Feng
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew E B Hansen
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Marcia H Beltrame
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Alessia Ranciaro
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jibril Hirbo
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- William Beggs
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Neil Thomas
  - Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
- Thomas Nyambo
  - Department of Biochemistry, Kampala International University in Tanzania, P.O. Box 9790, Dar es Salaam, Tanzania
- Sununguko Wata Mpoloka
  - Department of Biological Sciences, Faculty of Science, University of Botswana Gaborone, Private Bag UB 0022, Gaborone, Botswana
- Gaonyadiwe George Mokone
  - Department of Biomedical Sciences, Faculty of Medicine, University of Botswana Gaborone, Private Bag UB 0022, Gaborone, Botswana
- Alfred Njamnshi
  - Department of Neurology, Central Hospital Yaoundé
  - Brain Research Africa Initiative (BRAIN), Neuroscience Lab, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, P.O. Box 337, Yaoundé, Cameroon
- Charles Folkunang
  - Department of Pharmacotoxicology and Pharmacokinetics, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, P.O. Box 337, Yaoundé, Cameroon
- Dawit Wolde Meskel
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, P.O. Box 1176, Addis Ababa, Ethiopia
- Gurja Belay
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, P.O. Box 1176, Addis Ababa, Ethiopia
- Yun S Song
  - Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Sarah A Tishkoff
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
11
Noskova E, Abramov N, Iliutkin S, Sidorin A, Dobrynin P, Ulyantsev VI. GADMA2: more efficient and flexible demographic inference from genetic data. Gigascience 2022; 12:giad059. [PMID: 37609916 PMCID: PMC10445054 DOI: 10.1093/gigascience/giad059] [Received: 10/12/2022] [Revised: 01/31/2023] [Accepted: 07/05/2023]
Abstract
BACKGROUND: Inference of complex demographic histories is a source of information about events that happened in the past of the studied populations. Existing methods for demographic inference typically require input from the researcher in the form of a parameterized model. With an increased variety of methods and tools, each with its own interface, model specification becomes tedious and error-prone. Moreover, optimization algorithms used to find model parameters sometimes turn out to be inefficient, for instance, by not being properly tuned or by being highly dependent on a user-provided initialization. The open-source software GADMA addresses these problems, providing automatic demographic inference. It proposes a common interface for several likelihood engines and provides global parameter optimization based on a genetic algorithm.
RESULTS: Here, we introduce the new GADMA2 software and provide a detailed description of the added and expanded features. It has a renovated core code base, new likelihood engines, an updated optimization algorithm, and a flexible setup for automatic model construction. We provide a full overview of GADMA2 enhancements, compare the performance of supported likelihood engines on simulated data, and demonstrate an example of GADMA2 usage on 2 empirical datasets.
CONCLUSIONS: We demonstrate the better performance of a genetic algorithm in GADMA2 by comparing it to the initial version and other existing optimization approaches. Our experiments on simulated data indicate that GADMA2's likelihood engines are able to provide accurate estimations of demographic parameters even for misspecified models. We improve model parameters for 2 empirical datasets of inbred species.
Affiliation(s)
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Stanislav Iliutkin
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Anton Sidorin
- Laboratory of Biochemical Genetics, St. Petersburg State University, St. Petersburg 199034, Russia
- Pavel Dobrynin
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Human Genetics Laboratory, Vavilov Institute of General Genetics RAS, Moscow 119991, Russia
- Vladimir I Ulyantsev
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
12
Reilly PF, Tjahjadi A, Miller SL, Akey JM, Tucci S. The contribution of Neanderthal introgression to modern human traits. Curr Biol 2022; 32:R970-R983. [PMID: 36167050 PMCID: PMC9741939 DOI: 10.1016/j.cub.2022.08.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022]
Abstract
Neanderthals, our closest extinct relatives, lived in western Eurasia from 400,000 years ago until they went extinct around 40,000 years ago. DNA retrieved from ancient specimens revealed that Neanderthals mated with modern human contemporaries. As a consequence, introgressed Neanderthal DNA survives scattered across the human genome such that 1-4% of the genome of present-day people outside Africa is inherited from Neanderthal ancestors. Patterns of Neanderthal introgressed genomic sequences suggest that Neanderthal alleles had distinct fates in the modern human genetic background. Some Neanderthal alleles facilitated human adaptation to new environments such as novel climate conditions, UV exposure levels and pathogens, while others had deleterious consequences. Here, we review the body of work on Neanderthal introgression over the past decade. We describe how evolutionary forces shaped the genomic landscape of Neanderthal introgression and highlight the impact of introgressed alleles on human biology and phenotypic variation.
Affiliation(s)
- Audrey Tjahjadi
- Department of Anthropology, Yale University, New Haven, CT, USA
- Joshua M Akey
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Serena Tucci
- Department of Anthropology, Yale University, New Haven, CT, USA; Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
13
Churchill SE, Keys K, Ross AH. Midfacial Morphology and Neandertal-Modern Human Interbreeding. BIOLOGY 2022; 11:1163. [PMID: 36009790 PMCID: PMC9404802 DOI: 10.3390/biology11081163] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/04/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 06/15/2023]
Abstract
Ancient DNA from Neandertal and modern human fossils, and comparative morphological analyses of them, reveal a complex history of interbreeding between these lineages and the introgression of Neandertal genes into modern human genomes. Despite substantial increases in our knowledge of these events, the timing and geographic location of hybridization events remain unclear. Six measures of facial size and shape, from regional samples of Neandertals and early modern humans, were used in a multivariate exploratory analysis to try to identify regions in which early modern human facial morphology was more similar to that of Neandertals, which might thus represent regions of greater introgression of Neandertal genes. The results of canonical variates analysis and hierarchical cluster analysis suggest important affinities in facial morphology between both Middle and Upper Paleolithic early modern humans of the Near East and Neandertals, highlighting the importance of this region for interbreeding between the two lineages.
Affiliation(s)
- Steven E. Churchill
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Centre for the Exploration of the Deep Human Journey, University of the Witwatersrand, Johannesburg 2050, South Africa
- Kamryn Keys
- Human Identification & Forensic Analysis Laboratory, Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Ann H. Ross
- Human Identification & Forensic Analysis Laboratory, Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
14
Chen DS, Clark AG, Wolfner MF. Octopaminergic/tyraminergic Tdc2 neurons regulate biased sperm usage in female Drosophila melanogaster. Genetics 2022; 221:6613932. [PMID: 35736370 DOI: 10.1093/genetics/iyac097] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/25/2021] [Accepted: 06/04/2022] [Indexed: 11/14/2022]
Abstract
In polyandrous internally fertilizing species, a multiply-mated female can use stored sperm from different males in a biased manner to fertilize her eggs. The female's ability to assess sperm quality and compatibility is essential for her reproductive success, and represents an important aspect of postcopulatory sexual selection. In Drosophila melanogaster, previous studies demonstrated that the female nervous system plays an active role in influencing progeny paternity proportion, and suggested a role for octopaminergic/tyraminergic Tdc2 neurons in this process. Here, we report that inhibiting Tdc2 neuronal activity causes females to produce a higher-than-normal proportion of first-male progeny. This difference is not due to differences in sperm storage or release, but instead is attributable to the suppression of second-male sperm usage bias that normally occurs in control females. We further show that a subset of Tdc2 neurons innervating the female reproductive tract is largely responsible for the progeny proportion phenotype that is observed when Tdc2 neurons are inhibited globally. In contrast, overactivation of Tdc2 neurons does not further affect sperm storage and release or progeny proportion. These results suggest that octopaminergic/tyraminergic signaling allows a multiply-mated female to bias sperm usage, and identify a new role for the female nervous system in postcopulatory sexual selection.
Affiliation(s)
- Dawn S Chen
- Department of Molecular Biology and Genetics, Cornell University, Ithaca NY 14853, USA
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca NY 14853, USA
- Mariana F Wolfner
- Department of Molecular Biology and Genetics, Cornell University, Ithaca NY 14853, USA
15
Biddanda A, Steinrücken M, Novembre J. Properties of Two-Locus Genealogies and Linkage Disequilibrium in Temporally Structured Samples. Genetics 2022; 221:6549526. [PMID: 35294015 PMCID: PMC9245597 DOI: 10.1093/genetics/iyac038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/10/2022] [Accepted: 02/06/2022] [Indexed: 11/13/2022]
Abstract
Archaeogenetics has been revolutionary, revealing insights into demographic history and recent positive selection. However, most studies to date have ignored the non-random association of genetic variants at different loci (i.e., linkage disequilibrium, LD). This may be in part because basic properties of LD in samples from different times are still not well understood. Here, we derive several results for summary statistics of haplotypic variation under a model with time-stratified sampling: 1) the correlation between the number of pairwise differences observed between time-staggered samples (πΔt) in models with and without strict population continuity; 2) the product of the LD coefficient, D, between ancient and modern samples, which is a measure of haplotypic similarity between modern and ancient samples; and 3) the expected switch rate in the Li and Stephens haplotype copying model. The latter has implications for genotype imputation and phasing in ancient samples with modern reference panels. Overall, these results characterize how haplotype patterns are affected by sample age, recombination rates, and population sizes. We expect these results will help guide the interpretation and analysis of haplotype data from ancient and modern samples.
Affiliation(s)
- Arjun Biddanda
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Matthias Steinrücken
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
16
Friedlander E, Steinrücken M. A numerical framework for genetic hitchhiking in populations of variable size. Genetics 2022; 220:6526396. [PMID: 35143667 PMCID: PMC8893261 DOI: 10.1093/genetics/iyac012] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/31/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022]
Abstract
Natural selection on beneficial or deleterious alleles results in an increase or decrease, respectively, of their frequency within the population. Due to chromosomal linkage, the dynamics of the selected site affect the genetic variation at nearby neutral loci in a process commonly referred to as genetic hitchhiking. Changes in population size, however, can yield patterns in genomic data that mimic the effects of selection. Accurately modeling these dynamics is thus crucial to understanding how selection and past population size changes impact observed patterns of genetic variation. Here, we model the evolution of haplotype frequencies with the Wright-Fisher diffusion to study the impact of selection on linked neutral variation. Explicit solutions are not known for the dynamics of this diffusion when selection and recombination act simultaneously. Thus, we present a method for numerically evaluating the Wright-Fisher diffusion dynamics of 2 linked loci separated by a certain recombination distance when selection is acting. Using this approach, we can account for arbitrary population size histories explicitly. A key step in the method is to express the moments of the associated transition density, or sampling probabilities, as solutions to ordinary differential equations. Numerically solving these differential equations relies on a novel, accurate, and numerically efficient technique to estimate higher order moments from lower order moments. We demonstrate how this numerical framework can be used to quantify the reduction and recovery of genetic diversity around a selected locus over time and elucidate distortions in the site frequency spectra of neutral variation linked to loci under selection in various demographic settings. The method can be readily extended to more general modes of selection and applied in likelihood frameworks to detect loci under selection and infer the strength of the selective pressure.
Affiliation(s)
- Eric Friedlander
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA; Department of Mathematics, Saint Norbert College, Green Bay, WI 54115, USA
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Corresponding author: Department of Ecology & Evolution, The University of Chicago, 1101 E. 57th Street, Chicago, IL 60637, USA
17
Good BH. Linkage disequilibrium between rare mutations. Genetics 2022; 220:6503502. [PMID: 35100407 PMCID: PMC8982034 DOI: 10.1093/genetics/iyac004] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/07/2021] [Accepted: 12/21/2021] [Indexed: 01/13/2023]
Abstract
The statistical associations between mutations, collectively known as linkage disequilibrium, encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of linkage disequilibrium, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted linkage disequilibrium statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting linkage disequilibrium curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious linkage disequilibrium are not purely driven by differences in their mutation frequencies and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent linkage disequilibrium measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.
Affiliation(s)
- Benjamin H Good
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA; Corresponding author: Department of Applied Physics, Stanford University, Clark Center, 318 Campus Drive, Stanford, CA 94305, USA
18
Montinaro F, Pankratov V, Yelmen B, Pagani L, Mondal M. Revisiting the out of Africa event with a deep-learning approach. Am J Hum Genet 2021; 108:2037-2051. [PMID: 34626535 PMCID: PMC8595897 DOI: 10.1016/j.ajhg.2021.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
Anatomically modern humans evolved around 300 thousand years ago in Africa. They started to appear in the fossil record outside of Africa as early as 100 thousand years ago, although other hominins existed throughout Eurasia much earlier. Recently, several studies argued in favor of a single out of Africa event for modern humans on the basis of whole-genome sequence analyses. However, the single out of Africa model is in contrast with some of the findings from fossil records, which support two out of Africa events, and from uniparental data, which propose a back to Africa movement. Here, we used a deep-learning approach coupled with approximate Bayesian computation and sequential Monte Carlo to revisit these hypotheses from the whole-genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there were two sequential separations between African and out of Africa populations, happening around 60-90 thousand years ago and separated by 13-15 thousand years. One of the populations resulting from the more recent split has replaced the older West African population to a large extent, while the other one has founded the out of Africa populations.
Affiliation(s)
- Francesco Montinaro
- Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology-Genetics, University of Bari, Bari 70124, Italy
- Vasili Pankratov
- Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Burak Yelmen
- Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia; Université Paris-Saclay, CNRS UMR 9015, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Luca Pagani
- Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology, University of Padova, Padova 35121, Italy
- Mayukh Mondal
- Institute of Genomics, University of Tartu, Tartu 51010, Estonia
19
Gutenkunst RN. dadi.CUDA: Accelerating Population Genetics Inference with Graphics Processing Units. Mol Biol Evol 2021; 38:2177-2178. [PMID: 33480999 PMCID: PMC8097298 DOI: 10.1093/molbev/msaa305] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023]
Abstract
dadi is a popular but computationally intensive program for inferring models of demographic history and natural selection from population genetic data. I show that running dadi on a Graphics Processing Unit can dramatically speed computation compared with the CPU implementation, with minimal user burden. Motivated by this speed increase, I also extended dadi to four- and five-population models. This functionality is available in dadi version 2.1.0, https://bitbucket.org/gutenkunstlab/dadi/.
Affiliation(s)
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
20
Ahlquist KD, Bañuelos MM, Funk A, Lai J, Rong S, Villanea FA, Witt KE. Our Tangled Family Tree: New Genomic Methods Offer Insight into the Legacy of Archaic Admixture. Genome Biol Evol 2021; 13:evab115. [PMID: 34028527 PMCID: PMC8480178 DOI: 10.1093/gbe/evab115] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/02/2021] [Revised: 05/07/2021] [Accepted: 05/22/2021] [Indexed: 11/30/2022]
Abstract
The archaic ancestry present in the human genome has captured the imagination of both scientists and the wider public in recent years. This excitement is the result of new studies pushing the envelope of what we can learn from the archaic genetic information that has survived for over 50,000 years in the human genome. Here, we review the most recent ten years of literature on the topic of archaic introgression, including the current state of knowledge on Neanderthal and Denisovan introgression, as well as introgression from other as-yet unidentified archaic populations. We focus this review on four topics: 1) a reimagining of human demographic history, including evidence for multiple admixture events between modern humans, Neanderthals, Denisovans, and other archaic populations; 2) state-of-the-art methods for detecting archaic ancestry in population-level genomic data; 3) how these novel methods can detect archaic introgression in modern African populations; and 4) the functional consequences of archaic gene variants, including how those variants were co-opted into novel function in modern human populations. The goal of this review is to provide a simple-to-access reference for the relevant methods and novel data, which have changed our understanding of the relationship between our species and its siblings. This body of literature reveals the large degree to which the genetic legacy of these extinct hominins has been integrated into the human populations of today.
Affiliation(s)
- K D Ahlquist
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Mayra M Bañuelos
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Alyssa Funk
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Jiaying Lai
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Brown Center for Biomedical Informatics, Brown University, Providence, Rhode Island, USA
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Fernando A Villanea
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Anthropology, University of Colorado Boulder, Colorado, USA
- Kelsey E Witt
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, USA
21
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022]
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
Affiliation(s)
- Jesse A. Garcia
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
22
Abstract
Recent studies suggest that admixture with archaic hominins played an important role in facilitating biological adaptations to new environments. For example, interbreeding with Denisovans facilitated the adaptation to high-altitude environments on the Tibetan Plateau. Specifically, the EPAS1 gene, which encodes a transcription factor that regulates the response to hypoxia, exhibits strong signatures of both positive selection and introgression from Denisovans in Tibetan individuals. Interestingly, despite being geographically closer to the Denisova Cave, East Asian populations do not harbor as much Denisovan ancestry as populations from Melanesia. Recently, two studies have suggested two independent waves of Denisovan admixture into East Asians, one of which is shared with South Asians and Oceanians. Here, we leverage data from EPAS1 in 78 Tibetan individuals to interrogate which of these two introgression events introduced the EPAS1 beneficial sequence into the ancestral population of Tibetans, and we use the distribution of introgressed segment lengths at this locus to infer the timing of the introgression and selection event. We find that the introgression event unique to East Asians most likely introduced the beneficial haplotype into the ancestral population of Tibetans around 48,700 (16,000-59,500) y ago, and selection started around 9,000 (2,500-42,000) y ago. Our estimates suggest that one of the most convincing examples of adaptive introgression is in fact selection acting on standing archaic variation.
23
Gower G, Picazo PI, Fumagalli M, Racimo F. Detecting adaptive introgression in human evolution using convolutional neural networks. eLife 2021; 10:64669. [PMID: 34032215 PMCID: PMC8192126 DOI: 10.7554/elife.64669] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/06/2020] [Accepted: 05/24/2021] [Indexed: 01/10/2023]
Abstract
Studies in a variety of species have shown evidence for positively selected variants introduced into a population via introgression from another, distantly related population—a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets—both phased and unphased—to detect candidates for adaptive introgression that shaped our evolutionary history.
Affiliation(s)
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Pablo Iáñez Picazo
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, United Kingdom
- Fernando Racimo
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
24
Gopalan S, Atkinson EG, Buck LT, Weaver TD, Henn BM. Inferring archaic introgression from hominin genetic data. Evol Anthropol 2021; 30:199-220. [PMID: 33951239 PMCID: PMC8360192 DOI: 10.1002/evan.21895] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2019] [Revised: 08/03/2020] [Accepted: 03/29/2021] [Indexed: 01/05/2023]
Abstract
Questions surrounding the timing, extent, and evolutionary consequences of archaic admixture into human populations have a long history in evolutionary anthropology. More recently, advances in human genetics, particularly in the field of ancient DNA, have shed new light on the question of whether or not Homo sapiens interbred with other hominin groups. By the late 1990s, published genetic work had largely concluded that archaic groups made no lasting genetic contribution to modern humans; less than a decade later, this conclusion was reversed following the successful DNA sequencing of an ancient Neanderthal. This reversal of consensus is noteworthy, but the reasoning behind it is not widely understood across all academic communities. There remains a communication gap between population geneticists and paleoanthropologists. In this review, we endeavor to bridge this gap by outlining how technological advancements, new statistical methods, and notable controversies ultimately led to the current consensus.
Affiliation(s)
- Shyamalika Gopalan
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, USA
- Elizabeth G Atkinson
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital and Stanley Center for Psychiatric Research, Broad Institute, Boston, Massachusetts, USA
- Laura T Buck
- Research Centre in Evolutionary Anthropology and Palaeoecology, Liverpool John Moores University, Liverpool, UK
- Timothy D Weaver
- Department of Anthropology, University of California, Davis, California, USA
- Brenna M Henn
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA; Department of Anthropology, University of California, Davis, California, USA; UC Davis Genome Center, University of California, Davis, California, USA
25
Hollfelder N, Breton G, Sjödin P, Jakobsson M. The deep population history in Africa. Hum Mol Genet 2021; 30:R2-R10. [PMID: 33438014 PMCID: PMC8117439 DOI: 10.1093/hmg/ddab005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/27/2020] [Revised: 12/22/2020] [Accepted: 01/05/2021] [Indexed: 12/28/2022]
Abstract
Africa is the continent with the greatest genetic diversity among humans, and the level of diversity is further enhanced by incorporating non-majority groups, which are often understudied. Many of today's minority populations historically practiced foraging lifestyles, which were the only subsistence strategies prior to the rise of agriculture and pastoralism, but only a few groups practicing these strategies remain today. Genomic investigations of Holocene human remains excavated across the African continent show that the genetic landscape was vastly different from today's and that many groups that are population isolates today inhabited larger regions in the past. It is becoming clear that there have been periods of isolation among groups and geographic areas, but also genetic contact over large distances throughout human history in Africa. Genomic information from minority populations and from prehistoric remains provides an invaluable source of information on the human past, in particular on deep human population history, as large-scale Holocene population movements obscure past patterns of population structure. Here we revisit questions on the nature and timing of the radiation of early humans in Africa, the extent of gene flow among human populations, and introgression from archaic and extinct lineages on the continent.
Affiliation(s)
- Nina Hollfelder
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Gwenna Breton
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Per Sjödin
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Mattias Jakobsson
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Palaeo-Research Institute, University of Johannesburg, Physical, Cnr Kingsway & University Roads, Auckland Park, Johannesburg 2092, South Africa
- SciLifeLab, Stockholm and Uppsala, Entrance C11, BMC, Husargatan 3, 752 37 Uppsala, Sweden
26
Tennessen JA, Duraisingh MT. Three Signatures of Adaptive Polymorphism Exemplified by Malaria-Associated Genes. Mol Biol Evol 2021; 38:1356-1371. [PMID: 33185667 PMCID: PMC8042748 DOI: 10.1093/molbev/msaa294] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
Malaria has been one of the strongest selective pressures on our species. Many of the best-characterized cases of adaptive evolution in humans are in genes tied to malaria resistance. However, the complex evolutionary patterns at these genes are poorly captured by standard scans for nonneutral evolution. Here, we present three new statistical tests for selection based on population genetic patterns that are observed more than once among key malaria resistance loci. We assess these tests using forward-time evolutionary simulations and apply them to global whole-genome sequencing data from humans, and thus we show that they are effective at distinguishing selection from neutrality. Each test captures a distinct evolutionary pattern, here called Divergent Haplotypes, Repeated Shifts, and Arrested Sweeps, associated with a particular period of human prehistory. We clarify the selective signatures at known malaria-relevant genes and identify additional genes showing similar adaptive evolutionary patterns. Among our top outliers, we see a particular enrichment for genes involved in erythropoiesis and for genes previously associated with malaria resistance, consistent with a major role for malaria in shaping these patterns of genetic diversity. Polymorphisms at these genes are likely to impact resistance to malaria infection and contribute to ongoing host-parasite coevolutionary dynamics.
27
Abstract
Throughout human history, large-scale migrations have facilitated the formation of populations with ancestry from multiple previously separated populations. This process leads to subsequent shuffling of genetic ancestry through recombination, producing variation in ancestry between populations, among individuals in a population, and along the genome within an individual. Recent methodological and empirical developments have elucidated the genomic signatures of this admixture process, bringing previously understudied admixed populations to the forefront of population and medical genetics. Under this theme, we present a collection of recent PLOS Genetics publications that exemplify recent progress in human genetic admixture studies, and we discuss potential areas for future work.
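The way recombination breaks source ancestry into ever-shorter tracts can be illustrated with a deliberately simplified model. The sketch below assumes a single admixture pulse t generations ago in which one source contributes a proportion m, and treats ancestry along a chromosome as a Markov process: breakpoints occur at rate t per Morgan, and each new segment draws its ancestry independently. Under these assumptions, tracts from that source end at rate t(1 - m), so their mean length is 1/(t(1 - m)) Morgans. The function names are hypothetical, and the model ignores drift, continuous gene flow, and other complications.

```python
import random


def mean_tract_length(m: float, t: float) -> float:
    """Expected tract length (Morgans) from a source contributing proportion m,
    t generations after a single admixture pulse (Markov approximation)."""
    if not 0.0 <= m < 1.0 or t <= 0:
        raise ValueError("require 0 <= m < 1 and t > 0")
    return 1.0 / (t * (1.0 - m))


def simulate_tracts(m: float, t: float, length: float = 1e4, seed: int = 1) -> list:
    """Simulate tract lengths (Morgans) from the focal source along a chromosome
    of `length` Morgans under the same Markov approximation."""
    rng = random.Random(seed)
    is_focal = rng.random() < m  # ancestry of the first segment
    start = pos = 0.0
    tracts = []
    while True:
        pos += rng.expovariate(t)  # distance to next breakpoint, rate t per Morgan
        if pos >= length:
            break  # drop the final tract, which is truncated by the chromosome end
        new_is_focal = rng.random() < m  # ancestry redrawn independently at each breakpoint
        if new_is_focal != is_focal:
            if is_focal:
                tracts.append(pos - start)  # a focal tract just ended
            start, is_focal = pos, new_is_focal
    return tracts


if __name__ == "__main__":
    m, t = 0.2, 10
    tracts = simulate_tracts(m, t)
    print(f"analytic mean: {mean_tract_length(m, t) * 100:.1f} cM")
    print(f"simulated mean: {sum(tracts) / len(tracts) * 100:.1f} cM over {len(tracts)} tracts")
```

For m = 0.2 and t = 10 the analytic mean is 12.5 cM, and the simulated mean should land close to it; older pulses (larger t) give proportionally shorter tracts.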
Affiliation(s)
- Katharine L. Korunes
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
- Amy Goldberg
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
28
Bergström A, Stringer C, Hajdinjak M, Scerri EML, Skoglund P. Origins of modern human ancestry. Nature 2021; 590:229-237. [PMID: 33568824 DOI: 10.1038/s41586-021-03244-5] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 07/08/2020] [Accepted: 12/14/2020] [Indexed: 01/30/2023]
Abstract
New finds in the palaeoanthropological and genomic records have changed our view of the origins of modern human ancestry. Here we review our current understanding of how the ancestry of modern humans around the globe can be traced into the deep past, and which ancestors it passes through during our journey back in time. We identify three key phases that are surrounded by major questions, and which will be at the frontiers of future research. The most recent phase comprises the worldwide expansion of modern humans between 40 and 60 thousand years ago (ka) and their last known contacts with archaic groups such as Neanderthals and Denisovans. The second phase is associated with a broadly construed African origin of modern human diversity between 60 and 300 ka. The oldest phase comprises the complex separation of modern human ancestors from archaic human groups from 0.3 to 1 million years ago. We argue that no specific point in time can currently be identified at which modern human ancestry was confined to a limited birthplace, and that patterns of the first appearance of anatomical or behavioural traits that are used to define Homo sapiens are consistent with a range of evolutionary histories.
Affiliation(s)
- Anders Bergström
- Ancient Genomics Laboratory, Francis Crick Institute, London, UK
- Chris Stringer
- Department of Earth Sciences, Natural History Museum, London, UK
- Mateja Hajdinjak
- Ancient Genomics Laboratory, Francis Crick Institute, London, UK
- Eleanor M L Scerri
- Pan-African Evolution Research Group, Max Planck Institute for the Science of Human History, Jena, Germany
- Department of Classics and Archaeology, University of Malta, Msida, Malta
- Institute of Prehistoric Archaeology, University of Cologne, Cologne, Germany
- Pontus Skoglund
- Ancient Genomics Laboratory, Francis Crick Institute, London, UK
29
Ragsdale AP, Gravel S. Unbiased Estimation of Linkage Disequilibrium from Unphased Data. Mol Biol Evol 2020; 37:923-932. [PMID: 31697386 DOI: 10.1093/molbev/msz265] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
Abstract
Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.
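The source of bias in naive D² estimates is easiest to see in the simpler phased case, which is not the unphased setting that is the focus of this abstract. Assume n haplotypes with counts n_AB, n_Ab, n_aB, n_ab drawn multinomially. Plugging sample frequencies into D = p_AB p_ab - p_Ab p_aB and squaring gives a biased estimate of D², whereas replacing each product of frequencies with the matching falling-factorial moment of the counts gives an unbiased one. The sketch below (function names are illustrative) checks this by exact enumeration over every sample of size n. It is not the authors' unphased estimator.

```python
from itertools import product
from math import factorial


def d2_plugin(n_AB, n_Ab, n_aB, n_ab):
    """Naive estimate: square of D computed from sample haplotype frequencies."""
    n = n_AB + n_Ab + n_aB + n_ab
    d = (n_AB * n_ab - n_Ab * n_aB) / n**2
    return d * d


def d2_unbiased(n_AB, n_Ab, n_aB, n_ab):
    """Unbiased estimate of D^2 from phased haplotype counts (needs n >= 4).

    Uses multinomial falling-factorial moments, e.g.
    E[n_AB (n_AB - 1) n_ab (n_ab - 1)] = n(n-1)(n-2)(n-3) p_AB^2 p_ab^2.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    numerator = (n_AB * (n_AB - 1) * n_ab * (n_ab - 1)
                 + n_Ab * (n_Ab - 1) * n_aB * (n_aB - 1)
                 - 2 * n_AB * n_Ab * n_aB * n_ab)
    return numerator / (n * (n - 1) * (n - 2) * (n - 3))


def exact_expectation(estimator, probs, n):
    """Average an estimator over every multinomial sample of n haplotypes."""
    total = 0.0
    for counts in product(range(n + 1), repeat=4):
        if sum(counts) != n:
            continue
        weight = factorial(n)
        for c in counts:
            weight //= factorial(c)  # build the multinomial coefficient
        for c, p in zip(counts, probs):
            weight *= p ** c  # times the probability of this configuration
        total += weight * estimator(*counts)
    return total


if __name__ == "__main__":
    probs = (0.4, 0.1, 0.2, 0.3)  # p_AB, p_Ab, p_aB, p_ab
    true_d2 = (probs[0] * probs[3] - probs[1] * probs[2]) ** 2
    print("true D^2:          ", true_d2)
    print("E[plug-in], n = 8: ", exact_expectation(d2_plugin, probs, 8))
    print("E[unbiased], n = 8:", exact_expectation(d2_unbiased, probs, 8))
```

With these frequencies the true D² is 0.01; the plug-in estimator averages about 0.0134 at n = 8, while the falling-factorial version averages 0.01 up to floating-point rounding.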
Affiliation(s)
- Aaron P Ragsdale
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada
30
Abstract
Simulation plays a central role in population genomics studies. Recent years have seen rapid improvements in software efficiency that make it possible to simulate large genomic regions for many individuals sampled from large numbers of populations. As the complexity of the demographic models we study grows, however, there is an ever-increasing opportunity to introduce bugs in their implementation. Here, we describe two errors made in defining population genetic models using the msprime coalescent simulator that have found their way into the published record. We discuss how these errors have affected downstream analyses and give recommendations for software developers and users to reduce the risk of such errors.
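The specific errors the authors describe are not reproduced here. As a hypothetical illustration of one plausible class of mistake, the sketch below checks a toy epoch-based model for migration rates that stay non-zero after the populations involved have merged (looking backwards in time). `Epoch` and `lingering_migration` are invented names, not part of msprime, which has its own demography debugger for inspecting epochs.

```python
from dataclasses import dataclass, field


@dataclass
class Epoch:
    """One interval of a toy demographic model; times are generations ago."""
    start: float
    populations: set
    migration: dict = field(default_factory=dict)  # (source, dest) -> rate


def lingering_migration(epochs):
    """List non-zero migration rates that involve a population absent from the epoch."""
    problems = []
    for epoch in epochs:
        for (source, dest), rate in epoch.migration.items():
            if rate > 0 and not {source, dest} <= epoch.populations:
                problems.append((epoch.start, source, dest, rate))
    return problems


if __name__ == "__main__":
    m = 1e-4
    present = Epoch(0, {"A", "B", "C"},
                    {("A", "B"): m, ("B", "A"): m, ("B", "C"): m, ("C", "B"): m})
    # Backwards in time, B and C merge into BC at 1,000 generations. The bug:
    # the previous epoch's migration rates were copied over unchanged.
    buggy = [present, Epoch(1000, {"A", "BC"}, dict(present.migration))]
    fixed = [present, Epoch(1000, {"A", "BC"}, {("A", "BC"): m, ("BC", "A"): m})]
    for problem in lingering_migration(buggy):
        print("non-zero migration involving an absent population:", problem)
    print("fixed model problems:", lingering_migration(fixed))
```

A check like this only catches internal inconsistencies; it cannot tell whether a model matches the published one, which still requires comparing epochs against the source paper.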
31
Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet 2020; 16:e1008895. [PMID: 32760067 PMCID: PMC7410169 DOI: 10.1371/journal.pgen.1008895] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/08/2019] [Accepted: 05/29/2020] [Indexed: 01/09/2023]
Abstract
The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that not only specify tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including from ancient humans into Neanderthals and from unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200 and 300 kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these “super-archaic” regions (comprising at least about 4 Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.
We present ARGweaver-D, an extension of the ARGweaver algorithm which can be applied under a user-defined demographic model including population splits and migration events. Given genome sequence data from a collection of individuals across multiple closely related populations or subspecies, ARGweaver-D can infer trees describing the genetic relationships among these individuals at every location along the genome, conditional on the demographic model. Like ARGweaver, ARGweaver-D is a Bayesian method, sampling trees from the posterior distribution in order to account for uncertainty. Using simulations, we show that ARGweaver-D can successfully identify regions introgressed from Neanderthals and Denisovans into modern humans. It is also well-powered to detect introgressed regions stemming from older gene-flow events. We apply ARGweaver-D to the genomes of two Neanderthals, a Denisovan, and two African humans. We identify 3% of the Neanderthal genome which is likely derived from gene flow from ancient humans. We also identify about 1% of the Denisovan genome that may be traced to an unsequenced archaic hominin; 15% of these regions were subsequently passed to modern humans. We find no convincing evidence that selection acted against any of these introgressed regions.
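The step of parsing sampled ARGs into probabilities of introgression can be sketched as a simple Monte Carlo average: the posterior probability that a region is introgressed is estimated by the fraction of MCMC samples in which it carries a migrant lineage. The sketch assumes each sampled ARG has already been reduced to a list of half-open (start, end) intervals where a focal haplotype is introgressed; that input format and the function name are hypothetical, not ARGweaver-D's actual output.

```python
def introgression_posterior(samples, sequence_length, window):
    """Per-window fraction of sampled ARGs in which a focal haplotype is introgressed.

    samples: one list per MCMC sample of half-open (start, end) intervals, in bp,
    where the focal haplotype carries a migrant lineage (hypothetical format).
    A window counts as introgressed in a sample if any interval overlaps it.
    """
    n_windows = -(-sequence_length // window)  # ceiling division
    counts = [0] * n_windows
    for intervals in samples:
        hit = set()  # windows marked in this sample, counted at most once
        for start, end in intervals:
            first = start // window
            last = min((end - 1) // window, n_windows - 1)
            hit.update(range(first, last + 1))
        for w in hit:
            counts[w] += 1
    return [c / len(samples) for c in counts]


if __name__ == "__main__":
    sampled = [[(0, 30)], [(10, 20)], [(60, 70)], []]  # four toy MCMC samples
    print(introgression_posterior(sampled, sequence_length=100, window=25))
    # [0.5, 0.25, 0.25, 0.0]
```

Averaging over samples, rather than using a single best ARG, is what lets the estimate reflect uncertainty in the inferred genealogies.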
32
Adrion JR, Cole CB, Dukler N, Galloway JG, Gladstein AL, Gower G, Kyriazis CC, Ragsdale AP, Tsambos G, Baumdicker F, Carlson J, Cartwright RA, Durvasula A, Gronau I, Kim BY, McKenzie P, Messer PW, Noskova E, Ortega-Del Vecchyo D, Racimo F, Struck TJ, Gravel S, Gutenkunst RN, Lohmueller KE, Ralph PL, Schrider DR, Siepel A, Kelleher J, Kern AD. A community-maintained standard library of population genetic models. eLife 2020; 9:e54967. [PMID: 32573438 PMCID: PMC7438115 DOI: 10.7554/elife.54967] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 01/07/2020] [Accepted: 06/15/2020] [Indexed: 12/18/2022]
Abstract
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open-source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Affiliation(s)
- Jeffrey R Adrion
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Christopher B Cole
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jared G Galloway
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Christopher C Kyriazis
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Franz Baumdicker
- Department of Mathematical Stochastics, University of Freiburg, Freiburg, Germany
- Jedidiah Carlson
- Department of Genome Sciences, University of Washington, Seattle, United States
- Reed A Cartwright
- The Biodesign Institute and The School of Life Sciences, Arizona State University, Tempe, United States
- Arun Durvasula
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Ilan Gronau
- The Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Herzliya, Israel
- Bernard Y Kim
- Department of Biology, Stanford University, Stanford, United States
- Patrick McKenzie
- Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, United States
- Philipp W Messer
- Department of Computational Biology, Cornell University, Ithaca, United States
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russian Federation
- Diego Ortega-Del Vecchyo
- International Laboratory for Human Genome Research, National Autonomous University of Mexico, Juriquilla, Mexico
- Fernando Racimo
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Travis J Struck
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Canada
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Peter L Ralph
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
- Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
33
Sankararaman S. Methods for detecting introgressed archaic sequences. Curr Opin Genet Dev 2020; 62:85-90. [PMID: 32717667 PMCID: PMC7484293 DOI: 10.1016/j.gde.2020.05.026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/04/2020] [Revised: 05/12/2020] [Accepted: 05/22/2020] [Indexed: 11/16/2022]
Abstract
Analysis of genome sequences from archaic and modern humans has revealed multiple episodes of admixture between highly diverged population groups. Statistical methods that attempt to localize DNA segments introduced by these events offer a powerful tool to investigate recent human evolution. We review recent advances in methods for detecting introgressed sequences.
Affiliation(s)
- Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, CA 90095, United States
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, United States
- Department of Computational Medicine, University of California, Los Angeles, CA 90095, United States
34
Durvasula A, Sankararaman S. Recovering signals of ghost archaic introgression in African populations. Sci Adv 2020; 6:eaax5097. [PMID: 32095519 PMCID: PMC7015685 DOI: 10.1126/sciadv.aax5097] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 03/29/2019] [Accepted: 12/03/2019] [Indexed: 05/18/2023]
Abstract
While introgression from Neanderthals and Denisovans has been documented in modern humans outside Africa, the contribution of archaic hominins to the genetic variation of present-day Africans remains poorly understood. We provide complementary lines of evidence for archaic introgression into four West African populations. Our analyses of site frequency spectra indicate that these populations derive 2 to 19% of their genetic ancestry from an archaic population that diverged before the split of Neanderthals and modern humans. Using a method that can identify segments of archaic ancestry without the need for reference archaic genomes, we built genome-wide maps of archaic ancestry in the Yoruba and the Mende populations. Analyses of these maps reveal segments of archaic ancestry at high frequency in these populations that represent potential targets of adaptive introgression. Our results reveal the substantial contribution of archaic ancestry in shaping the gene pool of present-day West African populations.
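The site-frequency-spectrum analyses mentioned above start from a simple tally. As a minimal sketch, assuming alleles have already been polarized as ancestral or derived, that each site has a derived-allele count in a sample of n modern haplotypes, and that a flag records whether a sequenced archaic genome carries the derived allele, the unfolded SFS and a spectrum conditioned on the archaic allele can be computed as below. The input format and function names are hypothetical; this illustrates the general idea of a conditional spectrum, not the authors' pipeline.

```python
def sfs(derived_counts, n):
    """Unfolded site frequency spectrum: entry i counts sites with i derived alleles."""
    spectrum = [0] * (n + 1)
    for count in derived_counts:
        if not 0 <= count <= n:
            raise ValueError(f"derived count {count} outside 0..{n}")
        spectrum[count] += 1
    return spectrum


def conditional_sfs(derived_counts, archaic_derived, n):
    """SFS restricted to sites where the archaic genome carries the derived allele."""
    kept = [c for c, is_derived in zip(derived_counts, archaic_derived) if is_derived]
    return sfs(kept, n)


if __name__ == "__main__":
    n = 4  # haplotypes sampled from the modern population
    derived = [1, 4, 0, 2, 1, 3, 4]
    archaic = [True, True, False, False, True, False, True]
    print(sfs(derived, n))                       # [1, 2, 1, 1, 2]
    print(conditional_sfs(derived, archaic, n))  # [0, 2, 0, 0, 2]
```

Comparing the shape of such a conditional spectrum with what demographic models predict is one way to ask whether an unsampled archaic population is needed to explain the data.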
Affiliation(s)
- Arun Durvasula
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
35
Affiliation(s)
- Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
36
Lipson M, Ribot I, Mallick S, Rohland N, Olalde I, Adamski N, Broomandkhoshbacht N, Lawson AM, López S, Oppenheimer J, Stewardson K, Asombang RN, Bocherens H, Bradman N, Culleton BJ, Cornelissen E, Crevecoeur I, de Maret P, Fomine FLM, Lavachery P, Mindzie CM, Orban R, Sawchuk E, Semal P, Thomas MG, Van Neer W, Veeramah KR, Kennett DJ, Patterson N, Hellenthal G, Lalueza-Fox C, MacEachern S, Prendergast ME, Reich D. Ancient West African foragers in the context of African population history. Nature 2020; 577:665-670. [PMID: 31969706 PMCID: PMC8386425 DOI: 10.1038/s41586-020-1929-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 11/27/2018] [Accepted: 11/29/2019] [Indexed: 12/31/2022]
Abstract
Our knowledge of ancient human population structure in sub-Saharan Africa, particularly prior to the advent of food production, remains limited. Here we report genome-wide DNA data from four children (two of whom were buried approximately 8,000 years ago and two 3,000 years ago) from Shum Laka (Cameroon), one of the earliest known archaeological sites within the probable homeland of the Bantu language group [1-11]. One individual carried the deeply divergent Y chromosome haplogroup A00, which today is found almost exclusively in the same region [12,13]. However, the genome-wide ancestry profiles of all four individuals are most similar to those of present-day hunter-gatherers from western Central Africa, which implies that populations in western Cameroon today, as well as speakers of Bantu languages from across the continent, are not descended substantially from the population represented by these four people. We infer an Africa-wide phylogeny that features widespread admixture and three prominent radiations, including one that gave rise to at least four major lineages deep in the history of modern humans.
Affiliation(s)
- Mark Lipson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Isabelle Ribot
- Département d'Anthropologie, Université de Montréal, Montreal, Quebec, Canada
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nadin Rohland
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain
- Nicole Adamski
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nasreen Broomandkhoshbacht
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Department of Anthropology, University of California, Santa Cruz, CA, USA
- Ann Marie Lawson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Saioa López
- UCL Genetics Institute, University College London, London, UK
- Jonas Oppenheimer
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Kristin Stewardson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Hervé Bocherens
- Department of Geosciences, Biogeology, University of Tübingen, Tübingen, Germany
- Senckenberg Research Centre for Human Evolution and Palaeoenvironment, University of Tübingen, Tübingen, Germany
- Neil Bradman
- UCL Genetics Institute, University College London, London, UK
- The Henry Stewart Group, London, UK
- Brendan J Culleton
- Institutes of Energy and the Environment, Pennsylvania State University, University Park, PA, USA
- Els Cornelissen
- Department of Cultural Anthropology and History, Royal Museum for Central Africa, Tervuren, Belgium
- Pierre de Maret
- Faculté de Philosophie et Sciences Sociales, Université Libre de Bruxelles, Brussels, Belgium
- Philippe Lavachery
- Agence Wallonne du Patrimoine, Service Public de Wallonie, Namur, Belgium
- Rosine Orban
- Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Elizabeth Sawchuk
- Department of Anthropology, Stony Brook University, Stony Brook, NY, USA
- Patrick Semal
- Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Mark G Thomas
- UCL Genetics Institute, University College London, London, UK
- Department of Genetics, Evolution and Environment, University College London, London, UK
- Wim Van Neer
- Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Department of Biology, University of Leuven, Leuven, Belgium
- Krishna R Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
- Douglas J Kennett
- Department of Anthropology, University of California, Santa Barbara, CA, USA
- Nick Patterson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Garrett Hellenthal
- UCL Genetics Institute, University College London, London, UK
- Department of Genetics, Evolution and Environment, University College London, London, UK
- Scott MacEachern
- Division of Social Science, Duke Kunshan University, Kunshan, China
- Mary E Prendergast
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Sociology and Anthropology, Saint Louis University, Madrid, Spain
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
37
A method for genome-wide genealogy estimation for thousands of samples. Nat Genet 2019; 51:1321-1329. [PMID: 31477933 DOI: 10.1038/s41588-019-0484-x] [Citation(s) in RCA: 207] [Impact Index Per Article: 41.4] [Received: 02/20/2019] [Accepted: 07/15/2019] [Indexed: 01/29/2023]
Abstract
Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We have developed a method, Relate, scaling to >10,000 sequences while simultaneously estimating branch lengths, mutational ages and variable historical population sizes, as well as allowing for data errors. Application to 1,000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events unique to that continent. Our approach allows more powerful inferences of natural selection than has previously been possible. We identify multiple regions under strong positive selection, and multi-allelic traits including hair color, body mass index and blood pressure, showing strong evidence of directional selection, varying among human groups.