1
Caduff M, Eckel R, Leuenberger C, Wegmann D. Accurate Bayesian inference of sex chromosome karyotypes and sex-linked scaffolds from low-depth sequencing data. Mol Ecol Resour 2024; 24:e13913. [PMID: 38173222 DOI: 10.1111/1755-0998.13913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 11/27/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
The identification of sex-linked scaffolds and the genetic sex of individuals, i.e. their sex karyotype, is a fundamental step in population genomic studies. If sex-linked scaffolds are known, single individuals may be sexed based on read counts of next-generation sequencing data. If both sex-linked scaffolds as well as sex karyotypes are unknown, as is often the case for non-model organisms, they have to be jointly inferred. For both cases, current methods rely on arbitrary thresholds, which limits their power for low-depth data. In addition, most current methods are limited to euploid sex karyotypes (XX and XY). Here we develop BeXY, a fully Bayesian method to jointly infer the posterior probabilities for each scaffold to be autosomal, X- or Y-linked and for each individual to be any of the sex karyotypes XX, XY, X0, XXX, XXY, XYY and XXYY. If the sex-linked scaffolds are known, it also identifies autosomal trisomies and estimates the sex karyotype posterior probabilities for single individuals. As we show with downsampling experiments, BeXY has higher power than all existing methods. It accurately infers the sex karyotype of ancient human samples with as few as 20,000 reads and accurately infers sex-linked scaffolds from data sets of just a handful of samples or with highly imbalanced sex ratios, also in the case of low-quality reference assemblies. We illustrate the power of BeXY by applying it to both whole-genome shotgun and target enrichment sequencing data of ancient and modern humans, as well as several non-model organisms.
Affiliation(s)
- Madleina Caduff
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Raphael Eckel
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Christoph Leuenberger
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Daniel Wegmann
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
2
Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I, Patterson N, Reich D. The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci Data 2024; 11:182. [PMID: 38341426 PMCID: PMC10858950 DOI: 10.1038/s41597-024-03031-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/31/2024] [Indexed: 02/12/2024] [Open access]
Abstract
More than two hundred papers have reported genome-wide data from ancient humans. While the raw data for the vast majority are fully publicly available, testifying to the commitment of the paleogenomics community to open data, formats for both raw data and meta-data differ. There is thus a need for uniform curation and a centralized, version-controlled compendium that researchers can download, analyze, and reference. Since 2019, we have been maintaining the Allen Ancient DNA Resource (AADR), which aims to provide an up-to-date, curated version of the world's published ancient human DNA data, represented at more than a million single nucleotide polymorphisms (SNPs) at which almost all ancient individuals have been assayed. The AADR has gone through six public releases at the time of writing and review of this manuscript, and crossed the threshold of >10,000 individuals with published genome-wide ancient DNA data at the end of 2022. This note is intended as a citable descriptor of the AADR.
Affiliation(s)
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Adam Micco
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Matthew Mah
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Harald Ringbauer
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
- Iosif Lazaridis
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Iñigo Olalde
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - BIOMICs Research Group, University of the Basque Country, 01006, Vitoria-Gasteiz, Spain
- Nick Patterson
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
3
Fournier R, Tsangalidou Z, Reich D, Palamara PF. Haplotype-based inference of recent effective population size in modern and ancient DNA samples. Nat Commun 2023; 14:7945. [PMID: 38040695 PMCID: PMC10692198 DOI: 10.1038/s41467-023-43522-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 11/10/2023] [Indexed: 12/03/2023] [Open access]
Abstract
Individuals sharing recent ancestors are likely to co-inherit large identical-by-descent (IBD) genomic regions. The distribution of these IBD segments in a population may be used to reconstruct past demographic events such as effective population size variation, but accurate IBD detection is difficult in ancient DNA data and in underrepresented populations with limited reference data. In this work, we introduce an accurate method for inferring effective population size variation during the past ~2000 years in both modern and ancient DNA data, called HapNe. HapNe infers recent population size fluctuations using either IBD sharing (HapNe-IBD) or linkage disequilibrium (HapNe-LD), which does not require phasing and can be computed in low coverage data, including data sets with heterogeneous sampling times. HapNe shows improved accuracy in a range of simulated demographic scenarios compared to currently available methods for IBD-based and LD-based inference of recent effective population size, while requiring fewer computational resources. We apply HapNe to several modern populations from the 1,000 Genomes Project, the UK Biobank, the Allen Ancient DNA Resource, and recently published samples from Iron Age Britain, detecting multiple instances of recent effective population size variation across these groups.
Affiliation(s)
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Pier Francesco Palamara
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
4
Pandey D, Harris M, Garud NR, Narasimhan VM. Understanding natural selection in Holocene Europe using multi-locus genotype identity scans. bioRxiv 2023:2023.04.24.538113. [PMID: 37163039 PMCID: PMC10168228 DOI: 10.1101/2023.04.24.538113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Ancient DNA (aDNA) has been a revolutionary technology in understanding human history but has not been used extensively to study natural selection as large sample sizes to study allele frequency changes over time have thus far not been available. Here, we examined a time transect of 708 published samples over the past 7,000 years of European history using multi-locus genotype-based selection scans. As aDNA data is affected by high missingness, ascertainment bias, DNA damage, random allele calling, and is unphased, we first validated our selection scan, $G12_{\text{ancient}}$, on simulated data resembling aDNA under a demographic model that captures broad features of the allele frequency spectrum of European genomes as well as positive controls that have been previously identified and functionally validated in modern European datasets on data from ancient individuals from time periods very close to the present time. We then applied our statistic to the aDNA time transect to detect and resolve the timing of natural selection occurring genome wide and found several candidates of selection across the different time periods that had not been picked up by selection scans using single SNP allele frequency approaches. In addition, enrichment analysis discovered multiple categories of complex traits that might be under adaptation across these periods. Our results demonstrate the utility of applying different types of selection scans to aDNA to uncover putative selection signals at loci in the ancient past that might have been masked in modern samples.
Affiliation(s)
- Devansh Pandey
  - Department of Integrative Biology, The University of Texas at Austin
- Mariana Harris
  - Department of Computational Medicine, University of California, Los Angeles
- Nandita R Garud
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles
  - Department of Human Genetics, University of California, Los Angeles
- Vagheesh M Narasimhan
  - Department of Integrative Biology, The University of Texas at Austin
  - Department of Statistics and Data Science, The University of Texas at Austin
5
Davy T, Ju D, Mathieson I, Skoglund P. Hunter-gatherer admixture facilitated natural selection in Neolithic European farmers. Curr Biol 2023; 33:1365-1371.e3. [PMID: 36963383 PMCID: PMC10153476 DOI: 10.1016/j.cub.2023.02.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/02/2022] [Revised: 11/17/2022] [Accepted: 02/15/2023] [Indexed: 03/26/2023]
Abstract
Ancient DNA has revealed multiple episodes of admixture in human prehistory during geographic expansions associated with cultural innovations. One important example is the expansion of Neolithic agricultural groups out of the Near East into Europe and their consequent admixture with Mesolithic hunter-gatherers [1,2,3,4]. Ancient genomes from this period provide an opportunity to study the role of admixture in providing new genetic variation for selection to act upon, and also to identify genomic regions that resisted hunter-gatherer introgression and may thus have contributed to agricultural adaptations. We used genome-wide DNA from 677 individuals spanning Mesolithic and Neolithic Europe to infer ancestry deviations in the genomes of admixed individuals and to test for natural selection after admixture by testing for deviations from a genome-wide null distribution. We find that the region around the pigmentation-associated gene SLC24A5 shows the greatest overrepresentation of Neolithic local ancestry in the genome (|Z| = 3.46). In contrast, we find the greatest overrepresentation of Mesolithic ancestry across the major histocompatibility complex (MHC; |Z| = 4.21), a major immunity locus, which also shows allele frequency deviations indicative of selection following admixture (p = 1 × 10⁻⁵⁶). This could reflect negative frequency-dependent selection on MHC alleles common in Neolithic populations or that Mesolithic alleles were positively selected for and facilitated adaptation in Neolithic populations to pathogens or other environmental factors. Our study extends previous results that highlight immune function and pigmentation as targets of adaptation in more recent populations to selection processes in the Stone Age.
Affiliation(s)
- Tom Davy
  - Ancient Genomics Laboratory, Francis Crick Institute, 1 Midland Road, NW1 1AT London, UK
- Dan Ju
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 415 Curie Blvd, Philadelphia, PA 19104, USA
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 415 Curie Blvd, Philadelphia, PA 19104, USA
- Pontus Skoglund
  - Ancient Genomics Laboratory, Francis Crick Institute, 1 Midland Road, NW1 1AT London, UK
6
Janković I, Balen J, Potrebica H, Ahern JCM, Novak M. Mass violence in Copper Age Europe: The massacre burial site from Potočani, Croatia. Am J Phys Anthropol 2021; 176:474-485. [PMID: 34418068 DOI: 10.1002/ajpa.24396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2020] [Revised: 07/19/2021] [Accepted: 08/02/2021] [Indexed: 01/04/2023]
Abstract
OBJECTIVES: To provide a comprehensive analysis of perimortem cranial injuries found on human remains from the Eneolithic (ca. 4200 BCE) mass grave discovered at Potočani, Croatia, to test if the assemblage is a result of a deliberate violent episode on a massive scale. MATERIALS AND METHODS: Standard bioarchaeological analysis, including inventory of the preserved elements, minimum number of individuals, sex determination, age at death, as well as pattern and distribution of trauma, was recorded. RESULTS: A minimum of 41 people are present in the sample. Both sexes and almost all age groups are represented, with a prevalence of children and young adults. Four blunt force antemortem injuries are registered in three adult males and one subadult while perimortem injuries are recorded on 13 crania with a total of 28 injuries. The distribution of perimortem injuries is not patterned with age, sex, or siding, and their location is on lateral, posterior, or superior parts of the crania. No "defensive wounds" or other type of injuries are observed on postcranial elements. DISCUSSION: The injuries, manner of disposal of the bodies, radiocarbon dates, and other available data strongly suggest that the Potočani sample represents a single episode of execution during which the Potočani people were unable to defend themselves. The Potočani massacre is the oldest such example in southeastern Europe and provides additional evidence that indiscriminate violence on a massive scale is not a product of modern societies.
Affiliation(s)
- Ivor Janković
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, Zagreb, Croatia
  - Department of Anthropology, University of Wyoming, Laramie, Wyoming, USA
- Jacqueline Balen
  - Prehistoric Department, Archaeological Museum in Zagreb, Zagreb, Croatia
- Hrvoje Potrebica
  - Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
- James C M Ahern
  - Department of Anthropology, University of Wyoming, Laramie, Wyoming, USA
- Mario Novak
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, Zagreb, Croatia