1
Alaçamlı E, Naidoo T, Güler MN, Sağlıcan E, Aktürk Ş, Mapelli I, Vural KB, Somel M, Malmström H, Günther T. READv2: advanced and user-friendly detection of biological relatedness in archaeogenomics. Genome Biol 2024; 25:216. [PMID: 39135108; PMCID: PMC11318251; DOI: 10.1186/s13059-024-03350-3] [Received: 01/25/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024] Open access
Abstract
The advent of genome-wide ancient DNA analysis has revolutionized our understanding of prehistoric societies. However, studying biological relatedness in these groups requires tailored approaches due to the challenges of analyzing ancient DNA. READv2, an optimized Python3 implementation of the most widely used tool for this purpose, addresses these challenges while surpassing its predecessor in speed and accuracy. For sufficient amounts of data, it can classify up to third-degree relatedness and differentiate between the two types of first-degree relatedness, full siblings and parent-offspring. READv2 enables user-friendly, efficient, and nuanced analysis of biological relatedness, facilitating a deeper understanding of past social structures.
Affiliation(s)
- Erkin Alaçamlı
  - Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
  - Present Address: Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Thijessen Naidoo
  - Ancient DNA Unit, Science for Life Laboratory, Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
  - Centre for Palaeogenetics, Stockholm, Sweden
- Merve N Güler
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Ekin Sağlıcan
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Şevval Aktürk
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Igor Mapelli
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Kıvılcım Başak Vural
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Mehmet Somel
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Helena Malmström
  - Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Torsten Günther
  - Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
  - Ancient DNA Unit, Science for Life Laboratory, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
2
Sankaranarayanan G, Kodiveri Muthukaliannan G. Exploring antimicrobial resistance determinants in the Neanderthal microbiome. Microbiol Spectr 2024; 12:e0266223. [PMID: 38916350; PMCID: PMC11302244; DOI: 10.1128/spectrum.02662-23] [Received: 01/23/2024] [Accepted: 05/24/2024] [Indexed: 06/26/2024] Open access
Abstract
This study aimed to investigate the presence of antimicrobial resistance determinants (ARDs) in the Neanderthal microbiome through meticulous analysis of metagenomic data derived directly from dental calculus and fecal sediments across diverse Neanderthal sites in Europe. Employing a targeted locus mapping approach followed by a consensus strategy instead of an assembly-first approach, we aimed to identify and characterize ARDs within these ancient microbial communities. A comprehensive and redundant ARD database was constructed by amalgamating data from various antibiotic resistance gene repositories. Our results highlighted the efficacy of the KMA tool in providing a robust alignment of ancient metagenomic reads to the antibiotic resistance gene database. Notably, the KMA tool identified a limited number of ARDs, with only the 23S ribosomal gene from the dental calculus sample of Neanderthal remains at Goyet Troisieme Caverne exhibiting ancient DNA (aDNA) characteristics. Despite not identifying ARDs with typical ancient DNA damage patterns or negative distance proportions, our findings suggest a nuanced identification of putative antimicrobial resistance determinants in the Neanderthal microbiome's genetic repertoire based on the taxonomy-habitat correlation. Nevertheless, our findings are limited by factors such as environmental DNA contamination, DNA fragmentation, and cytosine deamination of aDNA. The study underscores the necessity for refined methodologies to unlock the genomic assets of prehistoric populations, fostering a comprehensive understanding of the intricate dynamics shaping the microbial landscape across history.
IMPORTANCE The results of our analysis demonstrate the challenges in identifying determinants of antibiotic resistance within the endogenous microbiome of Neanderthals. Despite the comprehensive investigation of multiple studies and the utilization of advanced analytical techniques, the detection of antibiotic resistance determinants in the ancient microbial communities proved to be particularly difficult. However, our analysis did reveal the presence of some authentic ancient conservative genes, indicating the preservation of certain genetic elements over time. These findings raise intriguing questions about the factors influencing the presence or absence of antibiotic resistance in ancient microbial communities. It could be speculated that the spread of current antibiotic resistance, which has reached alarming levels in modern times, is primarily driven by anthropogenic factors such as the widespread use and misuse of antibiotics in medical and agricultural practices.
3
Dolenz S, van der Valk T, Jin C, Oppenheimer J, Sharif MB, Orlando L, Shapiro B, Dalén L, Heintzman PD. Unravelling reference bias in ancient DNA datasets. Bioinformatics 2024; 40:btae436. [PMID: 38960861; PMCID: PMC11254355; DOI: 10.1093/bioinformatics/btae436] [Received: 07/24/2023] [Revised: 03/22/2024] [Accepted: 07/02/2024] [Indexed: 07/05/2024] Open access
Abstract
MOTIVATION The alignment of sequencing reads is a critical step in the characterization of ancient genomes. However, reference bias and spurious mappings pose a significant challenge, particularly as cutting-edge wet lab methods generate datasets that push the boundaries of alignment tools. Reference bias occurs when reference alleles are favoured over alternative alleles during mapping, whereas spurious mappings stem from either contamination or when endogenous reads fail to align to their correct position. Previous work has shown that these phenomena are correlated with read length but a more thorough investigation of reference bias and spurious mappings for ancient DNA has been lacking. Here, we use a range of empirical and simulated palaeogenomic datasets to investigate the impacts of mapping tools, quality thresholds, and reference genome on mismatch rates across read lengths. RESULTS For these analyses, we introduce AMBER, a new bioinformatics tool for assessing the quality of ancient DNA mapping directly from BAM-files and informing on reference bias, read length cut-offs and reference selection. AMBER rapidly and simultaneously computes the sequence read mapping bias in the form of the mismatch rates per read length, cytosine deamination profiles at both CpG and non-CpG sites, fragment length distributions, and genomic breadth and depth of coverage. Using AMBER, we find that mapping algorithms and quality threshold choices dictate reference bias and rates of spurious alignment at different read lengths in a predictable manner, suggesting that optimized mapping parameters for each read length will be a key step in alleviating reference bias and spurious mappings. AVAILABILITY AND IMPLEMENTATION AMBER is available for noncommercial use on GitHub (https://github.com/tvandervalk/AMBER.git). Scripts used to generate and analyse simulated datasets are available on Github (https://github.com/sdolenz/refbias_scripts).
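The per-read-length mismatch rate mentioned in this abstract can be illustrated with a minimal sketch. It is not AMBER's implementation: the function name `mismatch_rate_by_length` and the input format (one `(read_length, n_mismatches)` pair per aligned read) are assumptions made for illustration, and how those counts are extracted from a BAM file is left out.

```python
from collections import defaultdict


def mismatch_rate_by_length(alignments):
    """Return {read_length: mismatches per aligned base}.

    `alignments` is an iterable of (read_length, n_mismatches) pairs,
    a hypothetical pre-parsed summary of each aligned read.
    """
    bases = defaultdict(int)       # total aligned bases per read length
    mismatches = defaultdict(int)  # total mismatches per read length
    for read_length, n_mismatches in alignments:
        bases[read_length] += read_length
        mismatches[read_length] += n_mismatches
    return {length: mismatches[length] / bases[length] for length in sorted(bases)}


# Toy data: two reads of 30 bp and two reads of 60 bp.
toy_reads = [(30, 3), (30, 0), (60, 1), (60, 2)]
rates = mismatch_rate_by_length(toy_reads)
for length, rate in rates.items():
    print(f"{length} bp: {rate:.3f} mismatches per base")
```

In this toy input the shorter reads carry the higher mismatch rate, the kind of length dependence the abstract links to spurious mappings.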
Affiliation(s)
- Stephanie Dolenz
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Geological Sciences, Stockholm University, Stockholm, SE-106 91, Sweden
- Tom van der Valk
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Science for Life Laboratory, Stockholm, SE-171 65, Sweden
- Chenyu Jin
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Jonas Oppenheimer
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
- Muhammad Bilal Sharif
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Ludovic Orlando
  - Centre for Anthropobiology and Genomics of Toulouse (CAGT, CNRS UMR5288), University Paul Sabatier, Faculté de Santé, Toulouse, 31000, France
- Beth Shapiro
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
  - Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
- Love Dalén
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Peter D Heintzman
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Geological Sciences, Stockholm University, Stockholm, SE-106 91, Sweden
4
Aktürk Ş, Mapelli I, Güler MN, Gürün K, Katırcıoğlu B, Vural KB, Sağlıcan E, Çetin M, Yaka R, Sürer E, Atağ G, Çokoğlu SS, Sevkar A, Altınışık NE, Koptekin D, Somel M. Benchmarking kinship estimation tools for ancient genomes using pedigree simulations. Mol Ecol Resour 2024; 24:e13960. [PMID: 38676702; DOI: 10.1111/1755-0998.13960] [Received: 11/25/2023] [Revised: 03/19/2024] [Accepted: 03/28/2024] [Indexed: 04/29/2024]
Abstract
There is growing interest in uncovering genetic kinship patterns in past societies using low-coverage palaeogenomes. Here, we benchmark four tools for kinship estimation with such data: lcMLkin, NgsRelate, KIN, and READ, which differ in their input, IBD estimation methods, and statistical approaches. We used pedigree and ancient genome sequence simulations to evaluate these tools when only a limited number (1 to 50 K, with minor allele frequency ≥0.01) of shared SNPs are available. The performance of all four tools was comparable using ≥20 K SNPs. We found that first-degree related pairs can be accurately classified even with 1 K SNPs, with 85% F1 scores using READ and 96% using NgsRelate or lcMLkin. Distinguishing third-degree relatives from unrelated pairs or second-degree relatives was also possible with high accuracy (F1 > 90%) with 5 K SNPs using NgsRelate and lcMLkin, while READ and KIN showed lower success (69 and 79% respectively). Meanwhile, noise in population allele frequencies and inbreeding (first-cousin mating) led to deviations in kinship coefficients, with different sensitivities across tools. We conclude that using multiple tools in parallel might be an effective approach to achieve robust estimates on ultra-low-coverage genomes.
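The F1 scores quoted in this abstract combine precision and recall for one relatedness class. A minimal sketch of that calculation follows; the labels and toy data are invented for illustration and are not taken from the benchmark.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct positive calls, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical simulated pairs: true degree versus a tool's call.
truth = ["first", "first", "first", "unrelated", "unrelated", "second"]
calls = ["first", "first", "second", "first", "unrelated", "second"]
f1_first = f1_score(truth, calls, positive="first")
print(f"F1 for first-degree pairs: {f1_first:.3f}")
```

Here two of three true first-degree pairs are recovered and one unrelated pair is miscalled as first-degree, so precision and recall are both 2/3 and the F1 score is also 2/3.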
Affiliation(s)
- Şevval Aktürk
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Igor Mapelli
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Merve N Güler
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Kanat Gürün
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Büşra Katırcıoğlu
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Kıvılcım Başak Vural
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Ekin Sağlıcan
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Mehmet Çetin
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Reyhan Yaka
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
  - Centre for Palaeogenetics, Stockholm, Sweden
  - Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
- Elif Sürer
  - Department of Modeling and Simulation, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Gözde Atağ
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Sevim Seda Çokoğlu
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Arda Sevkar
  - Department of Anthropology, Hacettepe University, Ankara, Turkey
- N Ezgi Altınışık
  - Department of Anthropology, Hacettepe University, Ankara, Turkey
- Dilek Koptekin
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Mehmet Somel
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
5
Austin RM, Honap TP, Mann AE, Hübner A, DeGaglia CMS, Warinner C, Zuckerman MK, Hofman CA. Metagenomic and paleopathological analyses of a historic documented collection explore ancient dental calculus as a diagnostic tool. Sci Rep 2024; 14:14720. [PMID: 38926415; PMCID: PMC11208530; DOI: 10.1038/s41598-024-64818-7] [Received: 01/27/2024] [Accepted: 06/13/2024] [Indexed: 06/28/2024] Open access
Abstract
Dental calculus is a microbial biofilm that contains biomolecules from oral commensals and pathogens, including those potentially related to cause of death (CoD). To assess the utility of calculus as a diagnostically informative substrate, in conjunction with paleopathological analysis, calculus samples from 39 individuals in the Smithsonian Institution's Robert J. Terry Collection with CoDs of either syphilis or tuberculosis were assessed via shotgun metagenomic sequencing for the presence of Treponema pallidum subsp. pallidum and Mycobacterium tuberculosis complex (MTBC) DNA. Paleopathological analysis revealed that frequencies of skeletal lesions associated with these diseases were partially inconsistent with diagnostic criteria. Although recovery of T. p. pallidum DNA from individuals with a syphilis CoD was elusive, MTBC DNA was identified in at least one individual with a tuberculosis CoD. The authenticity of MTBC DNA was confirmed using targeted quantitative PCR assays, MTBC genome enrichment, and in silico bioinformatic analyses; however, the lineage of the MTBC strain present could not be determined. Overall, our study highlights the utility of dental calculus for molecular detection of tuberculosis in the archaeological record and underscores the effect of museum preparation techniques and extensive handling on pathogen DNA preservation in skeletal collections.
Affiliation(s)
- Rita M Austin
  - Frontiers in Evolutionary Zoology Research Group, Natural History Museum of Oslo, University of Oslo, Oslo, 0562, Norway
  - Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
  - Department of Anthropology, University of Oklahoma, Norman, OK, 73019, USA
  - Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma, Norman, OK, 73019, USA
- Tanvi P Honap
  - Department of Anthropology, University of Oklahoma, Norman, OK, 73019, USA
  - Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma, Norman, OK, 73019, USA
- Allison E Mann
  - Department of Biological Sciences, Clemson University, Clemson, SC, 29634, USA
- Alexander Hübner
  - Department Archaeogenetics, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
- Christina Warinner
  - Department of Anthropology, Harvard University, Cambridge, MA, 02138, USA
- Molly K Zuckerman
  - Department of Anthropology and Middle Eastern Cultures, Mississippi State University, Mississippi State, MS, 39762, USA
- Courtney A Hofman
  - Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
  - Department of Anthropology, University of Oklahoma, Norman, OK, 73019, USA
  - Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma, Norman, OK, 73019, USA
6
Hempel E, Faith JT, Preick M, de Jager D, Barish S, Hartmann S, Grau JH, Moodley Y, Gedman G, Pirovich KM, Bibi F, Kalthoff DC, Bocklandt S, Lamm B, Dalén L, Westbury MV, Hofreiter M. Colonial-driven extinction of the blue antelope despite genomic adaptation to low population size. Curr Biol 2024; 34:2020-2029.e6. [PMID: 38614080; DOI: 10.1016/j.cub.2024.03.051] [Received: 12/20/2023] [Revised: 02/09/2024] [Accepted: 03/25/2024] [Indexed: 04/15/2024]
Abstract
Low genomic diversity is generally indicative of small population size and is considered detrimental by decreasing long-term adaptability.1,2,3,4,5,6 Moreover, small population size may promote gene flow with congeners and outbreeding depression.7,8,9,10,11,12,13 Here, we examine the connection between habitat availability, effective population size (Ne), and extinction by generating a 40× nuclear genome from the extinct blue antelope (Hippotragus leucophaeus). Historically endemic to the relatively small Cape Floristic Region in southernmost Africa,14,15 populations were thought to have expanded and contracted across glacial-interglacial cycles, tracking suitable habitat.16,17,18 However, we found long-term low Ne, unaffected by glacial cycles, suggesting persistence with low genomic diversity for many millennia prior to extinction in ∼AD 1800. A lack of inbreeding, alongside high levels of genetic purging, suggests adaptation to this long-term low Ne and that human impacts during the colonial era (e.g., hunting and landscape transformation), rather than longer-term ecological processes, were central to its extinction. Phylogenomic analyses uncovered gene flow between roan (H. equinus) and blue antelope, as well as between roan and sable antelope (H. niger), approximately at the time of divergence of blue and sable antelope (∼1.9 Ma). Finally, we identified the LYST and ASIP genes as candidates for the eponymous bluish pelt color of the blue antelope. Our results revise numerous aspects of our understanding of the interplay between genomic diversity and evolutionary history and provide the resources for uncovering the genetic basis of this extinct species' unique traits.
Affiliation(s)
- Elisabeth Hempel
  - Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany; Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany
- J Tyler Faith
  - Natural History Museum of Utah, University of Utah, 301 Wakara Way, Salt Lake City, UT 84108, USA; Department of Anthropology, University of Utah, 260 South Central Campus Drive, Salt Lake City, UT 84112, USA; Origins Centre, University of the Witwatersrand, 2000 Johannesburg, Republic of South Africa
- Michaela Preick
  - Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Deon de Jager
  - Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Stefanie Hartmann
  - Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- José H Grau
  - Center for Species Survival, Smithsonian Conservation Biology Institute, Washington, DC 20008, USA; Amedes Genetics, Amedes Medizinische Dienstleistungen GmbH, 10117 Berlin, Germany
- Yoshan Moodley
  - Department of Biological Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Republic of South Africa
- Faysal Bibi
  - Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany
- Daniela C Kalthoff
  - Swedish Museum of Natural History, Department of Zoology, Box 50007, 10405 Stockholm, Sweden
- Ben Lamm
  - Colossal Biosciences, Dallas, TX 75247, USA
- Love Dalén
  - Swedish Museum of Natural History, Department of Bioinformatics and Genetics, Box 50007, 10405 Stockholm, Sweden; Centre for Palaeogenetics, Svante Arrhenius väg 20c, 10691 Stockholm, Sweden; Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Michael V Westbury
  - Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Michael Hofreiter
  - Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
7
Garrido Marques A, Rubinacci S, Malaspinas AS, Delaneau O, Sousa da Mota B. Assessing the impact of post-mortem damage and contamination on imputation performance in ancient DNA. Sci Rep 2024; 14:6227. [PMID: 38486065; PMCID: PMC10940295; DOI: 10.1038/s41598-024-56584-3] [Received: 12/19/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] Open access
Abstract
Low-coverage imputation is becoming ever more present in ancient DNA (aDNA) studies. Imputation pipelines commonly used for present-day genomes have been shown to yield accurate results when applied to ancient genomes. However, post-mortem damage (PMD), in the form of C-to-T substitutions at the reads termini, and contamination with DNA from closely related species can potentially affect imputation performance in aDNA. In this study, we evaluated imputation performance (i) when using a genotype caller designed for aDNA, ATLAS, compared to bcftools, and (ii) when contamination is present. We evaluated imputation performance with principal component analyses and by calculating imputation error rates. With a particular focus on differently imputed sites, we found that using ATLAS prior to imputation substantially improved imputed genotypes for a very damaged ancient genome (42% PMD). Trimming the ends of the sequencing reads led to similar improvements in imputation accuracy. For the remaining genomes, ATLAS brought limited gains. Finally, to examine the effect of contamination on imputation, we added various amounts of reads from two present-day genomes to a previously downsampled high-coverage ancient genome. We observed that imputation accuracy drastically decreased for contamination rates above 5%. In conclusion, we recommend (i) accounting for PMD by either trimming sequencing reads or using a genotype caller such as ATLAS before imputing highly damaged genomes and (ii) only imputing genomes containing up to 5% of contamination.
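The contamination experiment described in this abstract mixes present-day reads into an ancient read set. As a worked illustration of the arithmetic only, the sketch below computes how many contaminant reads give a target contamination rate, assuming the rate is defined as the contaminant fraction of all reads after mixing. That definition, the function name, and the read counts are assumptions and are not taken from the study.

```python
def contaminant_reads_needed(n_endogenous, target_rate):
    """Reads to add so that contaminants make up `target_rate` of the mixture.

    Solves target_rate = n_contam / (n_contam + n_endogenous) for n_contam.
    """
    if not 0 <= target_rate < 1:
        raise ValueError("target_rate must be in [0, 1)")
    return round(target_rate * n_endogenous / (1 - target_rate))


# Hypothetical downsampled ancient genome with 950,000 endogenous reads.
n_endogenous = 950_000
n_contam = contaminant_reads_needed(n_endogenous, target_rate=0.05)
achieved = n_contam / (n_contam + n_endogenous)
print(f"add {n_contam} reads for a contamination rate of {achieved:.2%}")
```

Under this definition, reaching the 5% level that the abstract gives as the recommended upper limit means adding one contaminant read for every 19 endogenous reads.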
Affiliation(s)
- Simone Rubinacci
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anna-Sapfo Malaspinas
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Bárbara Sousa da Mota
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
8
Eisenhofer R, Wright S, Weyrich L. Benchmarking a targeted 16S ribosomal RNA gene enrichment approach to reconstruct ancient microbial communities. PeerJ 2024; 12:e16770. [PMID: 38440408; PMCID: PMC10911074; DOI: 10.7717/peerj.16770] [Received: 08/31/2023] [Accepted: 12/16/2023] [Indexed: 03/06/2024] Open access
Abstract
The taxonomic characterization of ancient microbiomes is a key step in the rapidly growing field of paleomicrobiology. While PCR amplification of the 16S ribosomal RNA (rRNA) gene is a widely used technique in modern microbiota studies, this method has systematic biases when applied to ancient microbial DNA. Shotgun metagenomic sequencing has proven to be the most effective method in reconstructing taxonomic profiles of ancient dental calculus samples. Nevertheless, shotgun sequencing approaches come with inherent limitations that could be addressed through hybridization enrichment capture. When employed together, shotgun sequencing and hybridization capture have the potential to enhance the characterization of ancient microbial communities. Here, we develop, test, and apply a hybridization enrichment capture technique to selectively target 16S rRNA gene fragments from the libraries of ancient dental calculus samples generated with shotgun techniques. We simulated data sets generated from hybridization enrichment capture, indicating that taxonomic identification of fragmented and damaged 16S rRNA gene sequences was feasible. Applying this enrichment approach to 15 previously published ancient calculus samples, we observed a 334-fold increase of ancient 16S rRNA gene fragments in the enriched samples when compared to unenriched libraries. Our results suggest that 16S hybridization capture is less prone to the effects of background contamination than 16S rRNA amplification, yielding a higher percentage of on-target recovery. While our enrichment technique detected low abundant and rare taxa within a given sample, these assignments may not achieve the same level of specificity as those achieved by unenriched methods.
Affiliation(s)
- Sterling Wright
  - Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania, United States
- Laura Weyrich
  - Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania, United States
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States
  - School of Biological Sciences, University of Adelaide, Adelaide, Australia
9
Lien A, Legori LP, Kraft L, Sackett PW, Renaud G. Benchmarking software tools for trimming adapters and merging next-generation sequencing data for ancient DNA. Front Bioinform 2023; 3:1260486. [PMID: 38131007; PMCID: PMC10733496; DOI: 10.3389/fbinf.2023.1260486] [Received: 07/17/2023] [Accepted: 11/21/2023] [Indexed: 12/23/2023] Open access
Abstract
Ancient DNA is highly degraded, resulting in very short sequences. Reads generated with modern high-throughput sequencing machines are generally longer than ancient DNA molecules, therefore the reads often contain some portion of the sequencing adaptors. It is crucial to remove those adaptors, as they can interfere with downstream analysis. Furthermore, overlapping portions when DNA has been read forward and backward (paired-end) can be merged to correct sequencing errors and improve read quality. Several tools have been developed for adapter trimming and read merging, however, no one has attempted to evaluate their accuracy and evaluate their potential impact on downstream analyses. Through the simulation of sequencing data, seven commonly used tools were analyzed in their ability to reconstruct ancient DNA sequences through read merging. The analyzed tools exhibit notable differences in their abilities to correct sequence errors and identify the correct read overlap, but the most substantial difference is observed in their ability to calculate quality scores for merged bases. Selecting the most appropriate tool for a given project depends on several factors, although some tools such as fastp have some shortcomings, whereas others like leeHom outperform the other tools in most aspects. While the choice of tool did not result in a measurable difference when analyzing population genetics using principal component analysis, it is important to note that downstream analyses that are sensitive to wrongly merged reads or that rely on quality scores can be significantly impacted by the choice of tool.
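The read-merging step described in this abstract can be sketched in a few lines. This is a deliberately naive illustration, not the algorithm of any benchmarked tool: it assumes adapters are already trimmed, that the fragment is at least as long as each read, and it uses an arbitrary mismatch allowance and a crude quality rule (keep the higher quality on agreement, the quality difference on disagreement). All function names and thresholds are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, qual1, read2, qual2, min_overlap=5):
    """Merge a read pair; qualities are lists of Phred integers.

    Returns (sequence, qualities), or None if no acceptable overlap is found.
    """
    rc2, rq2 = reverse_complement(read2), qual2[::-1]
    # Try the longest candidate overlap first and accept the first one
    # whose mismatch count stays within an arbitrary allowance.
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max(1, overlap // 10):
            break
    else:
        return None
    start = len(read1) - overlap
    seq, quals = list(read1[:start]), list(qual1[:start])
    for i in range(overlap):
        b1, q1 = read1[start + i], qual1[start + i]
        b2, q2 = rc2[i], rq2[i]
        if b1 == b2:
            seq.append(b1)
            quals.append(max(q1, q2))   # agreement: keep the higher quality
        elif q1 >= q2:
            seq.append(b1)
            quals.append(q1 - q2)       # disagreement: trust read 1
        else:
            seq.append(b2)
            quals.append(q2 - q1)       # disagreement: trust read 2
    seq.extend(rc2[overlap:])
    quals.extend(rq2[overlap:])
    return "".join(seq), quals


# Toy 20 bp fragment sequenced with 14 bp reads, giving an 8 bp overlap.
fragment = "ACGTTGCAAGGCTTAACCGG"
read1 = "ACGTTGCAAGTCTT"                   # sequencing error at index 10 (G -> T)
qual1 = [30] * 14
qual1[10] = 5                              # the erroneous base has low quality
read2 = reverse_complement(fragment[6:])   # read 2 covers the fragment's 3' end
qual2 = [30] * 14
merged, merged_quals = merge_pair(read1, qual1, read2, qual2)
print(merged)
```

In this toy case the low-quality error in read 1 is overruled by the high-quality base from read 2, so the original fragment is recovered. The quality assigned to merged bases is the part this sketch treats most crudely, and it is also where the abstract reports the largest differences between tools.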
Affiliation(s)
- Annette Lien
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
- Louis Kraft
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
- Peter Wad Sackett
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
- Gabriel Renaud
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
10
Childebayeva A, Zavala EI. Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics. iScience 2023; 26:108066. [PMID: 37927550; PMCID: PMC10622734; DOI: 10.1016/j.isci.2023.108066] [Indexed: 11/07/2023] Open access
Abstract
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have been centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations to consider when working with degraded DNA. We hope this review provides a framework to researchers new to computational workflows for how to think about analyzing highly degraded DNA and prompts an increase of collaboration between the forensic genetics and aDNA fields.
Affiliation(s)
- Ainash Childebayeva
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Anthropology, University of Kansas, Lawrence, KS, USA
| | - Elena I. Zavala
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biology, University of Oregon, Eugene, OR, USA
| |
|
11
|
Kim J, Rosenberg NA. Record-matching of STR profiles with fragmentary genomic SNP data. Eur J Hum Genet 2023; 31:1283-1290. [PMID: 37567955 PMCID: PMC10620386 DOI: 10.1038/s41431-023-01430-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/19/2022] [Revised: 05/30/2023] [Accepted: 07/03/2023] [Indexed: 08/13/2023]
Abstract
In many forensic settings, identity of a DNA sample is sought from poor-quality DNA, for which the typical STR loci tabulated in forensic databases are not possible to reliably genotype. Genome-wide SNPs, however, can potentially be genotyped from such samples via next-generation sequencing, so that queries can in principle compare SNP genotypes from DNA samples of interest to STR genotype profiles that represent proposed matches. We use genetic record-matching to evaluate the possibility of testing SNP profiles obtained from poor-quality DNA samples to identify exact and relatedness matches to STR profiles. Using simulations based on whole-genome sequences, we show that in some settings, similar match accuracies to those seen with full coverage of the genome are obtained by genetic record-matching for SNP data that represent 5-10% genomic coverage. Thus, if even a fraction of random genomic SNPs can be genotyped by next-generation sequencing, then the potential may exist to test the resulting genotype profiles for matches to profiles consisting exclusively of nonoverlapping STR loci. The result has implications in relation to criminal justice, mass disasters, missing-person cases, studies of ancient DNA, and genomic privacy.
Affiliation(s)
- Jaehee Kim
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
| | - Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA, 94305, USA.
| |
|
12
|
Pochon Z, Bergfeldt N, Kırdök E, Vicente M, Naidoo T, van der Valk T, Altınışık NE, Krzewińska M, Dalén L, Götherström A, Mirabello C, Unneberg P, Oskolkov N. aMeta: an accurate and memory-efficient ancient metagenomic profiling workflow. Genome Biol 2023; 24:242. [PMID: 37872569 PMCID: PMC10591440 DOI: 10.1186/s13059-023-03083-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/05/2022] [Accepted: 10/06/2023] [Indexed: 10/25/2023]
Abstract
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMeta, an accurate metagenomic profiling workflow for ancient DNA designed to minimize the amount of false discoveries and computer memory requirements. Using simulated data, we benchmark aMeta against a current state-of-the-art workflow and demonstrate its superiority in microbial detection and authentication, as well as substantially lower usage of computer memory.
Affiliation(s)
- Zoé Pochon
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
| | - Nora Bergfeldt
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Zoology, Stockholm University, Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Emrah Kırdök
- Department of Biotechnology, Faculty of Science, Mersin University, Mersin, Turkey
| | - Mário Vicente
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
| | - Thijessen Naidoo
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
- Ancient DNA Unit, Science for Life Laboratory, Stockholm, Sweden
- Ancient DNA Unit, Science for Life Laboratory, Uppsala, Sweden
| | - Tom van der Valk
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - N Ezgi Altınışık
- Human-G Laboratory, Department of Anthropology, Hacettepe University, 06800, Beytepe, Ankara, Turkey
| | - Maja Krzewińska
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
| | - Love Dalén
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Zoology, Stockholm University, Stockholm, Sweden
| | - Anders Götherström
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
| | - Claudio Mirabello
- Department of Physics, Chemistry and Biology, Science for Life Laboratory, National Bioinformatics Infrastructure Sweden, Linköping University, Linköping, Sweden
| | - Per Unneberg
- Department of Cell and Molecular Biology, Science for Life Laboratory, National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Nikolay Oskolkov
- Department of Biology, Science for Life Laboratory, National Bioinformatics Infrastructure Sweden, Lund University, Lund, Sweden.
| |
|
13
|
Pusadkar V, Azad RK. Benchmarking Metagenomic Classifiers on Simulated Ancient and Modern Metagenomic Data. Microorganisms 2023; 11:2478. [PMID: 37894136 PMCID: PMC10609333 DOI: 10.3390/microorganisms11102478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/28/2023] [Accepted: 09/29/2023] [Indexed: 10/29/2023]
Abstract
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
Affiliation(s)
- Vaidehi Pusadkar
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA;
- BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
| | - Rajeev K. Azad
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA;
- BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
| |
|
14
|
Atağ G, Vural KB, Kaptan D, Özkan M, Koptekin D, Sağlıcan E, Doğramacı S, Köz M, Yılmaz A, Söylev A, Togan İ, Somel M, Özer F. MTaxi: A comparative tool for taxon identification of ultra low coverage ancient genomes. Open Research Europe 2023; 2:100. [PMID: 37829208 PMCID: PMC10565424 DOI: 10.12688/openreseurope.14936.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/26/2023] [Indexed: 10/14/2023]
Abstract
A major challenge in zooarchaeology is to morphologically distinguish closely related species' remains, especially using small bone fragments. Shotgun sequencing aDNA from archeological remains and comparative alignment to the candidate species' reference genomes will only apply when reference nuclear genomes of comparable quality are available, and may still fail when coverages are low. Here, we propose an alternative method, MTaxi, that uses highly accessible mitochondrial DNA (mtDNA) to distinguish between pairs of closely related species from ancient DNA sequences. MTaxi utilises mtDNA transversion-type substitutions between pairs of candidate species, assigns reads to either species, and performs a binomial test to determine the sample taxon. We tested MTaxi on sheep/goat and horse/donkey data, between which zooarchaeological classification can be challenging in ways that epitomise our case. The method performed efficiently on simulated ancient genomes down to 0.3x mitochondrial coverage for both sheep/goat and horse/donkey, with no false positives. Trials on n=18 ancient sheep/goat samples and n=10 horse/donkey samples of known species identity also yielded 100% accuracy. Overall, MTaxi provides a straightforward approach to classify closely related species that are difficult to distinguish through zooarchaeological methods using low coverage aDNA data, especially when similar quality reference genomes are unavailable. MTaxi is freely available at https://github.com/goztag/MTaxi.
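The final step described in the abstract above (assigning reads to one of two candidate species and applying a binomial test) can be illustrated with a minimal sketch. This is not the MTaxi implementation: the function names, the two-sided exact test, the null proportion of 0.5, and the 0.05 threshold are all assumptions made for illustration, and the read counts are invented.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than observing k successes."""
    if n == 0:
        return 1.0  # no informative reads, so no evidence either way
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))

def call_taxon(reads_a: int, reads_b: int, alpha: float = 0.05):
    """Hypothetical helper: decide between two candidate species from
    the number of reads supporting each one (assumed null: 50/50)."""
    p_value = binomial_two_sided_p(reads_a, reads_a + reads_b)
    if p_value >= alpha:
        return "undetermined", p_value
    return ("species_a" if reads_a > reads_b else "species_b"), p_value

# Invented counts: 18 reads carry species-A alleles, 2 carry species-B alleles.
print(call_taxon(18, 2))
```

With very few informative reads the test cannot reach significance, which is why a sketch like this returns "undetermined" rather than forcing a call.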
Affiliation(s)
- Gözde Atağ
- Biological Sciences, Middle East Technical University, Ankara, Turkey
| | | | - Damla Kaptan
- Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Mustafa Özkan
- Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Dilek Koptekin
- Biological Sciences, Middle East Technical University, Ankara, Turkey
- Health Informatics, Middle East Technical University, Ankara, Turkey
| | - Ekin Sağlıcan
- Health Informatics, Middle East Technical University, Ankara, Turkey
| | - Sevcan Doğramacı
- Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Mevlüt Köz
- Molecular Biology and Genetics, Konya Food and Agriculture University, Konya, Turkey
| | - Ardan Yılmaz
- Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Arda Söylev
- Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - İnci Togan
- Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Mehmet Somel
- Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Füsun Özer
- Anthropology, Hacettepe University, Ankara, Turkey
| |
|
15
|
Rubin JD, Vogel NA, Gopalakrishnan S, Sackett PW, Renaud G. HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph. PLoS Comput Biol 2023; 19:e1011148. [PMID: 37285390 DOI: 10.1371/journal.pcbi.1011148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/02/2023] [Indexed: 06/09/2023]
Abstract
Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and perform inference based on the detected mutations to this reference. This approach biases haplogroup assignments towards the reference and prohibits accurate calculations of the uncertainty in assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier which uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools by being more robust to lower coverage or incomplete consensus sequences and producing phylogenetically-aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a text file with the haplogroup assignments of the samples along with the level of confidence in the assignments. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.
Affiliation(s)
- Joshua Daniel Rubin
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Nicola Alexandra Vogel
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Peter Wad Sackett
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Gabriel Renaud
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| |
|
16
|
Dalal V, Pasupuleti N, Chaubey G, Rai N, Shinde V. Advancements and Challenges in Ancient DNA Research: Bridging the Global North-South Divide. Genes (Basel) 2023; 14:479. [PMID: 36833406 PMCID: PMC9956214 DOI: 10.3390/genes14020479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/02/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Ancient DNA (aDNA) research first began in 1984 and ever since has greatly expanded our understanding of evolution and migration. Today, aDNA analysis is used to solve various puzzles about the origin of mankind, migration patterns, and the spread of infectious diseases. The incredible findings ranging from identifying the new branches within the human family to studying the genomes of extinct flora and fauna have caught the world by surprise in recent times. However, a closer look at these published results points out a clear Global North and Global South divide. Therefore, through this research, we aim to emphasize encouraging better collaborative opportunities and technology transfer to support researchers in the Global South. Further, the present research also focuses on expanding the scope of the ongoing conversation in the field of aDNA by reporting relevant literature published around the world and discussing the advancements and challenges in the field.
Affiliation(s)
- Vasundhra Dalal
- Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
| | | | - Gyaneshwer Chaubey
- Cytogenetics Laboratory, Department of Zoology, Banaras Hindu University, Varanasi 221005, Uttar Pradesh, India
| | - Niraj Rai
- Ancient DNA Lab, Birbal Sahni Institute of Palaeosciences, Lucknow 226007, Uttar Pradesh, India
| | - Vasant Shinde
- Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
| |
|
17
|
Everett R, Cribdon B. MetaDamage tool: Examining post-mortem damage in sedaDNA on a metagenomic scale. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2022.888421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
The use of metagenomic datasets to support ancient sedimentary DNA (sedaDNA) for paleoecological reconstruction has been demonstrated to be a powerful tool to understand multi-organism responses to climatic shifts and events. Authentication remains integral to the ancient DNA discipline, and this extends to sedaDNA analysis. Furthermore, distinguishing authentic sedaDNA from contamination or modern material also allows for a better understanding of broader questions in sedaDNA research, such as formation processes, source and catchment, and post-depositional processes. Existing tools for the detection of damage signals are designed for single-taxon input, require a priori organism specification, and require a significant number of input sequences to establish a signal. It is therefore often difficult to identify an established cytosine deamination rate consistent with ancient DNA across a sediment sample. In this study, we present MetaDamage, a tool that examines cytosine deamination on a metagenomic (all organisms) scale for multiple previously undetermined taxa and can produce a damage profile based on a few hundred reads. We outline the development and testing of the MetaDamage tool using both authentic sedaDNA sequences and simulated data to demonstrate the resolution in which MetaDamage can identify deamination levels consistent with the presence of ancient DNA. The MetaDamage tool offers a method for the initial assessment of the presence of sedaDNA and a better understanding of key questions of preservation for paleoecological reconstruction.
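The damage signal discussed in the abstract above, cytosine deamination near read ends, is commonly summarised as a per-position C-to-T frequency. The sketch below shows that calculation only; it is not the MetaDamage code. The function name, the input format (pre-aligned, gap-free reference/read string pairs in 5' to 3' orientation), and the toy sequences are assumptions for illustration.

```python
from collections import Counter

def c_to_t_profile(pairs, max_pos=5):
    """Hypothetical helper: per-position C-to-T mismatch frequency.

    pairs   -- iterable of (reference, read) strings of equal length,
               already aligned without gaps, in 5'->3' read orientation
    max_pos -- number of 5' positions to profile
    Returns a list of frequencies; None where no reference C was seen.
    """
    c_total = Counter()   # reference C observed at each position
    c_to_t = Counter()    # ...of which the read shows T
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / c_total[p] if c_total[p] else None
            for p in range(max_pos)]

# Toy alignments (invented); real input would come from mapped reads.
toy = [("CACG", "TACG"), ("CCGA", "CTGA"), ("GCTA", "GCTA")]
print(c_to_t_profile(toy, max_pos=3))
```

In authentic ancient DNA this frequency is expected to be elevated at the first positions of the read and to decay toward the interior, whereas modern contamination gives a flat, low profile.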
|
18
|
Thuesen NH, Klausen MS, Gopalakrishnan S, Trolle T, Renaud G. Benchmarking freely available HLA typing algorithms across varying genes, coverages and typing resolutions. Front Immunol 2022; 13:987655. [PMID: 36426357 PMCID: PMC9679531 DOI: 10.3389/fimmu.2022.987655] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/06/2022] [Accepted: 10/10/2022] [Indexed: 11/02/2023]
Abstract
Identifying the specific human leukocyte antigen (HLA) allele combination of an individual is crucial in organ donation, risk assessment of autoimmune and infectious diseases and cancer immunotherapy. However, due to the high genetic polymorphism in this region, HLA typing requires specialized methods. We investigated the performance of five next-generation sequencing (NGS) based HLA typing tools with a non-restricted license, namely HLA*LA, Optitype, HISAT-genotype, Kourami and STC-Seq. This evaluation was done for the five HLA loci, HLA-A, -B, -C, -DRB1 and -DQB1 using whole-exome sequencing (WES) samples from 829 individuals. The robustness of the tools to lower depth of coverage (DOC) was evaluated by subsampling and HLA typing 230 WES samples at DOC ranging from 1X to 100X. The HLA typing accuracy was measured across four typing resolutions. Among these, we present two clinically-relevant typing resolutions (P group and pseudo-sequence), which specifically focus on the peptide binding region. On average, across the five HLA loci examined, HLA*LA was found to have the highest typing accuracy. For the individual loci, HLA-A, -B and -C, Optitype's typing accuracy was the highest and HLA*LA had the highest typing accuracy for HLA-DRB1 and -DQB1. The tools' robustness to lower DOC data varied widely and further depended on the specific HLA locus. For all Class I loci, Optitype had a typing accuracy above 95% (according to the modification of the amino acids in the functionally relevant portion of the HLA molecule) at 50X, but increasing the DOC beyond even 100X could still improve the typing accuracy of HISAT-genotype, Kourami, and STC-Seq across all five HLA loci as well as HLA*LA's typing accuracy for HLA-DQB1. HLA typing is also used in studies of ancient DNA (aDNA), which is often based on sequencing data with lower quality and DOC. Interestingly, we found that Optitype's typing accuracy is not notably impaired by short read length or by DNA damage, which is typical of aDNA, as long as the DOC is sufficiently high.
Affiliation(s)
- Nikolas Hallberg Thuesen
- Evaxion Biotech, Copenhagen, Denmark
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
| | | | - Shyam Gopalakrishnan
- Section for Hologenomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | | | - Gabriel Renaud
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
| |
|
19
|
Abstract
Like modern metagenomics, ancient metagenomics is a highly data-rich discipline, with the added challenge that the DNA of interest is degraded and, depending on the sample type, in low abundance. This requires the application of specialized measures during molecular experiments and computational analyses. Furthermore, researchers often work with finite sample sizes, which impedes optimal experimental design and control of confounding factors, and with ethically sensitive samples necessitating the consideration of additional guidelines. In September 2020, early career researchers in the field of ancient metagenomics met (Standards, Precautions & Advances in Ancient Metagenomics 2 [SPAAM2] community meeting) to discuss the state of the field and how to address current challenges. Here, in an effort to bridge the gap between ancient and modern metagenomics, we highlight and reflect upon some common misconceptions, provide a brief overview of the challenges in our field, and point toward useful resources for potential reviewers and newcomers to the field.
|
20
|
Borry M, Hübner A, Rohrlach AB, Warinner C. PyDamage: automated ancient damage identification and estimation for contigs in ancient DNA de novo assembly. PeerJ 2021; 9:e11845. [PMID: 34395085 PMCID: PMC8323603 DOI: 10.7717/peerj.11845] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/29/2021] [Accepted: 07/01/2021] [Indexed: 01/19/2023]
Abstract
DNA de novo assembly can be used to reconstruct longer stretches of DNA (contigs), including genes and even genomes, from short DNA sequencing reads. Applying this technique to metagenomic data derived from archaeological remains, such as paleofeces and dental calculus, we can investigate past microbiome functional diversity that may be absent or underrepresented in the modern microbiome gene catalogue. However, compared to modern samples, ancient samples are often burdened with environmental contamination, resulting in metagenomic datasets that represent mixtures of ancient and modern DNA. The ability to rapidly and reliably establish the authenticity and integrity of ancient samples is essential for ancient DNA studies, and the ability to distinguish between ancient and modern sequences is particularly important for ancient microbiome studies. Characteristic patterns of ancient DNA damage, namely DNA fragmentation and cytosine deamination (observed as C-to-T transitions), are typically used to authenticate ancient samples and sequences, but existing tools for inspecting and filtering aDNA damage either compute it at the read level, which leads to high data loss and lower quality when used in combination with de novo assembly, or require manual inspection, which is impractical for ancient assemblies that typically contain tens to hundreds of thousands of contigs. To address these challenges, we designed PyDamage, a robust, automated approach for aDNA damage estimation and authentication of de novo assembled aDNA. PyDamage uses a likelihood ratio based approach to discriminate between truly ancient contigs and contigs originating from modern contamination. We test PyDamage on both simulated aDNA data and archaeological paleofeces, and we demonstrate its ability to reliably and automatically identify contigs bearing DNA damage characteristic of aDNA. Coupled with aDNA de novo assembly, PyDamage opens up new doors to explore functional diversity in ancient metagenomic datasets.
Affiliation(s)
- Maxime Borry
- Microbiome Sciences Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany
| | - Alexander Hübner
- Microbiome Sciences Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany; Faculty of Biological Sciences, Friedrich-Schiller Universität Jena, Jena, Germany
| | - Adam B Rohrlach
- Population Genetics Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany; ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, Australia
| | - Christina Warinner
- Microbiome Sciences Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany; Faculty of Biological Sciences, Friedrich-Schiller Universität Jena, Jena, Germany; Department of Anthropology, Harvard University, Cambridge, MA, United States of America
| |
|
21
|
Clemente F, Unterländer M, Dolgova O, Amorim CEG, Coroado-Santos F, Neuenschwander S, Ganiatsou E, Cruz Dávalos DI, Anchieri L, Michaud F, Winkelbach L, Blöcher J, Arizmendi Cárdenas YO, Sousa da Mota B, Kalliga E, Souleles A, Kontopoulos I, Karamitrou-Mentessidi G, Philaniotou O, Sampson A, Theodorou D, Tsipopoulou M, Akamatis I, Halstead P, Kotsakis K, Urem-Kotsou D, Panagiotopoulos D, Ziota C, Triantaphyllou S, Delaneau O, Jensen JD, Moreno-Mayar JV, Burger J, Sousa VC, Lao O, Malaspinas AS, Papageorgopoulou C. The genomic history of the Aegean palatial civilizations. Cell 2021; 184:2565-2586.e21. [PMID: 33930288 PMCID: PMC8127963 DOI: 10.1016/j.cell.2021.03.039] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/17/2020] [Revised: 09/17/2020] [Accepted: 03/18/2021] [Indexed: 12/30/2022]
Abstract
The Cycladic, the Minoan, and the Helladic (Mycenaean) cultures define the Bronze Age (BA) of Greece. Urbanism, complex social structures, craft and agricultural specialization, and the earliest forms of writing characterize this iconic period. We sequenced six Early to Middle BA whole genomes, along with 11 mitochondrial genomes, sampled from the three BA cultures of the Aegean Sea. The Early BA (EBA) genomes are homogeneous and derive most of their ancestry from Neolithic Aegeans, contrary to earlier hypotheses that the Neolithic-EBA cultural transition was due to massive population turnover. EBA Aegeans were shaped by relatively small-scale migration from East of the Aegean, as evidenced by the Caucasus-related ancestry also detected in Anatolians. In contrast, Middle BA (MBA) individuals of northern Greece differ from EBA populations in showing ∼50% Pontic-Caspian Steppe-related ancestry, dated at ca. 2,600-2,000 BCE. Such gene flow events during the MBA contributed toward shaping present-day Greek genomes.
Affiliation(s)
- Florian Clemente
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Martina Unterländer
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece; Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Olga Dolgova
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028 Barcelona, Spain
| | - Carlos Eduardo G Amorim
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Francisco Coroado-Santos
- CE3C, Centre for Ecology, Evolution and Environmental Changes, Faculty of Sciences of the University of Lisbon, 1749-016 Lisbon, Portugal
| | - Samuel Neuenschwander
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Vital-IT, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Elissavet Ganiatsou
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Diana I Cruz Dávalos
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Lucas Anchieri
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Frédéric Michaud
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Laura Winkelbach
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Jens Blöcher
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Yami Ommar Arizmendi Cárdenas
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Eleni Kalliga
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Angelos Souleles
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Ioannis Kontopoulos
- Center for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | | | - Olga Philaniotou
- Ephor Emerita of Antiquities, Hellenic Ministry of Culture and Sports, 10682 Athens, Greece
| | - Adamantios Sampson
- Department of Mediterranean Studies, University of the Aegean, 85132 Rhodes, Greece
| | - Dimitra Theodorou
- Ephorate of Antiquities of Kozani, Hellenic Ministry of Culture and Sports, 50004 Kozani, Greece
| | - Metaxia Tsipopoulou
- Ephor Emerita of Antiquities, Hellenic Ministry of Culture and Sports, 10682 Athens, Greece
| | - Ioannis Akamatis
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Paul Halstead
- Department of Archaeology, University of Sheffield, Minalloy House, 10-16 Regent St., Sheffield S1 3NJ, UK
| | - Kostas Kotsakis
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Dushka Urem-Kotsou
- Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| | - Diamantis Panagiotopoulos
- Institute of Classical Archaeology, University of Heidelberg, Marstallhof 4, 69117 Heidelberg, Germany
| | - Christina Ziota
- Ephorate of Antiquities of Florina, Hellenic Ministry of Culture and Sports, 53100 Florina, Greece
| | - Sevasti Triantaphyllou
- Department of History and Archaeology, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
| | - Olivier Delaneau
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - J Víctor Moreno-Mayar
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Center for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark; National Institute of Genomic Medicine (INMEGEN), 14610 Mexico City, Mexico
| | - Joachim Burger
- Palaeogenetics Group, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, 55099 Mainz, Germany
| | - Vitor C Sousa
- CE3C, Centre for Ecology, Evolution and Environmental Changes, Faculty of Sciences of the University of Lisbon, 1749-016 Lisbon, Portugal
| | - Oscar Lao
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Anna-Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Christina Papageorgopoulou
- Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece
| |
|
22
|
Wibowo MC, Yang Z, Borry M, Hübner A, Huang KD, Tierney BT, Zimmerman S, Barajas-Olmos F, Contreras-Cubas C, García-Ortiz H, Martínez-Hernández A, Luber JM, Kirstahler P, Blohm T, Smiley FE, Arnold R, Ballal SA, Pamp SJ, Russ J, Maixner F, Rota-Stabelli O, Segata N, Reinhard K, Orozco L, Warinner C, Snow M, LeBlanc S, Kostic AD. Reconstruction of ancient microbial genomes from the human gut. Nature 2021; 594:234-239. [PMID: 33981035 PMCID: PMC8189908 DOI: 10.1038/s41586-021-03532-0] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 02/06/2020] [Accepted: 04/12/2021] [Indexed: 12/26/2022]
Abstract
Loss of gut microbial diversity [1–6] in industrial populations is associated with chronic diseases [7], underscoring the importance of studying our ancestral gut microbiome. However, relatively little is known about the composition of pre-industrial gut microbiomes. Here we performed a large-scale de novo assembly of microbial genomes from palaeofaeces. From eight authenticated human palaeofaeces samples (1,000–2,000 years old) with well-preserved DNA from southwestern USA and Mexico, we reconstructed 498 medium- and high-quality microbial genomes. Among the 181 genomes with the strongest evidence of being ancient and of human gut origin, 39% represent previously undescribed species-level genome bins. Tip dating suggests an approximate diversification timeline for the key human symbiont Methanobrevibacter smithii. In comparison to 789 present-day human gut microbiome samples from eight countries, the palaeofaeces samples are more similar to non-industrialized than industrialized human gut microbiomes. Functional profiling of the palaeofaeces samples reveals a markedly lower abundance of antibiotic-resistance and mucin-degrading genes, as well as enrichment of mobile genetic elements relative to industrial gut microbiomes. This study facilitates the discovery and characterization of previously undescribed gut microorganisms from ancient microbiomes and the investigation of the evolutionary history of the human gut microbiota through genome reconstruction from palaeofaeces. Ancient microbiomes from palaeofaeces are more similar to non-industrialized than industrialized human gut microbiomes regardless of geography, but 39% of their de novo reconstructed genomes represent previously undescribed microbial species.
Affiliation(s)
- Marsha C Wibowo
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA
| | - Zhen Yang
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA; Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada
| | - Maxime Borry
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Alexander Hübner
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Kun D Huang
- CIBIO Department, University of Trento, Trento, Italy; Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy
| | - Braden T Tierney
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Samuel Zimmerman
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA
| | - Francisco Barajas-Olmos
- Immunogenomics and Metabolic Diseases Laboratory, Secretaría de Salud, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
| | - Cecilia Contreras-Cubas
- Immunogenomics and Metabolic Diseases Laboratory, Secretaría de Salud, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
| | - Humberto García-Ortiz
- Immunogenomics and Metabolic Diseases Laboratory, Secretaría de Salud, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
| | - Angélica Martínez-Hernández
- Immunogenomics and Metabolic Diseases Laboratory, Secretaría de Salud, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
| | - Jacob M Luber
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA; Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Philipp Kirstahler
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Tre Blohm
- Department of Anthropology, University of Montana, Missoula, MT, USA
| | - Francis E Smiley
- Department of Anthropology, Northern Arizona University, Flagstaff, AZ, USA
| | - Richard Arnold
- Pahrump Paiute Tribe and Consolidated Group of Tribes and Organizations, Pahrump, NV, USA
| | - Sonia A Ballal
- Department of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Boston, MA, USA
| | - Sünje Johanna Pamp
- Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Julia Russ
- Morrison Microscopy Core Research Facility, Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Frank Maixner
- Institute for Mummy Studies, EURAC Research, Bolzano, Italy
| | - Omar Rota-Stabelli
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy; Center Agriculture Food Environment (C3A), University of Trento, Trento, Italy
| | - Nicola Segata
- CIBIO Department, University of Trento, Trento, Italy
| | - Karl Reinhard
- School of Natural Resources, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Lorena Orozco
- Immunogenomics and Metabolic Diseases Laboratory, Secretaría de Salud, Instituto Nacional de Medicina Genómica, Mexico City, Mexico
| | - Christina Warinner
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany; Department of Anthropology, Harvard University, Cambridge, MA, USA; Faculty of Biological Sciences, Friedrich-Schiller University, Jena, Germany
| | - Meradeth Snow
- Department of Anthropology, University of Montana, Missoula, MT, USA
| | - Steven LeBlanc
- Peabody Museum of Archaeology and Ethnology, Harvard University, Cambridge, MA, USA
| | - Aleksandar D Kostic
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA; Department of Microbiology, Harvard Medical School, Boston, MA, USA
| |
|
23
|
Diroma MA, Modi A, Lari M, Sineo L, Caramelli D, Vai S. New Insights Into Mitochondrial DNA Reconstruction and Variant Detection in Ancient Samples. Front Genet 2021; 12:619950. [PMID: 33679884 PMCID: PMC7930628 DOI: 10.3389/fgene.2021.619950] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 01/12/2021] [Indexed: 11/13/2022] Open
Abstract
Ancient DNA (aDNA) studies are frequently focused on the analysis of the mitochondrial DNA (mtDNA), which is much more abundant than the nuclear genome, hence can be better retrieved from ancient remains. However, postmortem DNA damage and contamination make the data analysis difficult because of DNA fragmentation and nucleotide alterations. In this regard, the assessment of the heteroplasmic fraction in ancient mtDNA has always been considered an unachievable goal due to the complexity in distinguishing true endogenous variants from artifacts. We implemented and applied a computational pipeline for mtDNA analysis to a dataset of 30 ancient human samples from an Iron Age necropolis in Polizzello (Sicily, Italy). The pipeline includes several modules from well-established tools for aDNA analysis and a recently released variant caller, which was specifically conceived for mtDNA, applied for the first time to aDNA data. Through a fine-tuned filtering on variant allele sequencing features, we were able to accurately reconstruct nearly complete (>88%) mtDNA genome for almost all the analyzed samples (27 out of 30), depending on the degree of preservation and the sequencing throughput, and to get a reliable set of variants allowing haplogroup prediction. Additionally, we provide guidelines to deal with possible artifact sources, including nuclear mitochondrial sequence (NumtS) contamination, an often-neglected issue in ancient mtDNA surveys. Potential heteroplasmy levels were also estimated, although most variants were likely homoplasmic, and validated by data simulations, proving that new sequencing technologies and software are sensitive enough to detect partially mutated sites in ancient genomes and discriminate true variants from artifacts. A thorough functional annotation of detected and filtered mtDNA variants was also performed for a comprehensive evaluation of these ancient samples.
Affiliation(s)
- Maria Angela Diroma
- Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
| | - Alessandra Modi
- Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
| | - Martina Lari
- Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
| | - Luca Sineo
- Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche, Università degli Studi di Palermo, Palermo, Italy
| | - David Caramelli
- Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
| | - Stefania Vai
- Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
| |
|
24
|
Xu W, Lin Y, Zhao K, Li H, Tian Y, Ngatia JN, Ma Y, Sahu SK, Guo H, Guo X, Xu YC, Liu H, Kristiansen K, Lan T, Zhou X. An efficient pipeline for ancient DNA mapping and recovery of endogenous ancient DNA from whole-genome sequencing data. Ecol Evol 2021; 11:390-401. [PMID: 33437437 PMCID: PMC7790629 DOI: 10.1002/ece3.7056] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/16/2020] [Revised: 10/10/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022] Open
Abstract
Ancient DNA research has developed rapidly over the past few decades due to improvements in PCR and next-generation sequencing (NGS) technologies, but challenges still exist. One major challenge in relation to ancient DNA research is to recover genuine endogenous ancient DNA sequences from raw sequencing data. This is often difficult due to degradation of ancient DNA and high levels of contamination, especially homologous contamination that has extremely similar genetic background with that of the real ancient DNA. In this study, we collected whole-genome sequencing (WGS) data from 6 ancient samples to compare different mapping algorithms. To further explore more effective methods to separate endogenous DNA from homologous contaminations, we attempted to recover reads based on ancient DNA specific characteristics of deamination, depurination, and DNA fragmentation with different parameters. We propose a quick and improved pipeline for separating endogenous ancient DNA while simultaneously decreasing homologous contaminations to very low proportions. Our goal in this research was to develop useful recommendations for ancient DNA mapping and for separation of endogenous DNA to facilitate future studies of ancient DNA.
Affiliation(s)
- Wenhao Xu
- Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Yu Lin
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI‐Shenzhen, Shenzhen, China
| | - Keliang Zhao
- Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Life and Paleoenvironment, Beijing, China
| | - Haimeng Li
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing, China
| | - Yinping Tian
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
| | | | - Yue Ma
- College of Wildlife Resources, Northeast Forestry University, Harbin, China
| | - Sunil Kumar Sahu
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
| | - Huabing Guo
- Forest Inventory and Planning Institute of Jilin Province, Changchun, China
| | - Xiaosen Guo
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI‐Shenzhen, Shenzhen, China
| | - Yan Chun Xu
- College of Wildlife Resources, Northeast Forestry University, Harbin, China
| | - Huan Liu
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
| | - Karsten Kristiansen
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
| | - Tianming Lan
- State Key Laboratory of Agricultural Genomics, BGI‐Shenzhen, Shenzhen, China
- Department of Biology, Laboratory of Genomics and Molecular Biomedicine, University of Copenhagen, Copenhagen, Denmark
| | - Xinying Zhou
- Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Life and Paleoenvironment, Beijing, China
| |
|
25
|
Garrett Vieira F, Samaniego Castruita JA, Gilbert MTP. Using in silico predicted ancestral genomes to improve the efficiency of paleogenome reconstruction. Ecol Evol 2020; 10:12700-12709. [PMID: 33304488 PMCID: PMC7713980 DOI: 10.1002/ece3.6925] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/11/2020] [Revised: 09/23/2020] [Accepted: 09/28/2020] [Indexed: 01/20/2023] Open
Abstract
Paleogenomics is the nascent discipline concerned with sequencing and analysis of genome-scale information from historic, ancient, and even extinct samples. While once inconceivable due to the challenges of DNA damage, contamination, and the technical limitations of PCR-based Sanger sequencing, following the dawn of the second-generation sequencing revolution, it has rapidly become a reality. However, a significant challenge facing ancient DNA studies on extinct species is the lack of closely related reference genomes against which to map the sequencing reads from ancient samples. Although bioinformatic efforts to improve the assemblies have focused mainly in mapping algorithms, in this article we explore the potential of an alternative approach, namely using reconstructed ancestral genome as reference for mapping DNA sequences of ancient samples. Specifically, we present a preliminary proof of concept for a general framework and demonstrate how under certain evolutionary divergence thresholds, considerable mapping improvements can be easily obtained.
Affiliation(s)
- Filipe Garrett Vieira
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - José Alfredo Samaniego Castruita
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - M. Thomas P. Gilbert
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- University Museum, Norwegian University of Science and Technology, Trondheim, Norway
| |
|
26
|
Alosaimi S, Bandiang A, van Biljon N, Awany D, Thami PK, Tchamga MSS, Kiran A, Messaoud O, Hassan RIM, Mugo J, Ahmed A, Bope CD, Allali I, Mazandu GK, Mulder NJ, Chimusa ER. A broad survey of DNA sequence data simulation tools. Brief Funct Genomics 2020; 19:49-59. [PMID: 31867604 DOI: 10.1093/bfgp/elz033] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Revised: 10/27/2019] [Accepted: 11/04/2019] [Indexed: 11/12/2022] Open
Abstract
In silico DNA sequence generation is a powerful technology to evaluate and validate bioinformatics tools, and accordingly more than 35 DNA sequence simulation tools have been developed. With such a diverse array of tools to choose from, an important question is: Which tool should be used for a desired outcome? This question is largely unanswered as documentation for many of these DNA simulation tools is sparse. To address this, we performed a review of DNA sequence simulation tools developed to date and evaluated 20 state-of-art DNA sequence simulation tools on their ability to produce accurate reads based on their implemented sequence error model. We provide a succinct description of each tool and suggest which tool is most appropriate for the given different scenarios. Given the multitude of similar yet non-identical tools, researchers can use this review as a guide to inform their choice of DNA sequence simulation tool. This paves the way towards assessing existing tools in a unified framework, as well as enabling different simulation scenario analysis within the same framework.
Affiliation(s)
- Shatha Alosaimi
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Armand Bandiang
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Noelle van Biljon
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Denis Awany
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Prisca K Thami
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
| | - Milaine S S Tchamga
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Anmol Kiran
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi; Edinburgh University, Edinburgh, UK
| | - Olfa Messaoud
- Université de Tunis El Manar, Institut Pasteur de Tunis, LR16IPT05 Génomique Biomédicale et Oncogénétique, Tunis, 1002, Tunisia
| | - Radia Ismaeel Mohammed Hassan
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Jacquiline Mugo
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Azza Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Sudan
| | - Christian D Bope
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Imane Allali
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Gaston K Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa
| | - Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| |
|
27
|
Parker C, Rohrlach AB, Friederich S, Nagel S, Meyer M, Krause J, Bos KI, Haak W. A systematic investigation of human DNA preservation in medieval skeletons. Sci Rep 2020; 10:18225. [PMID: 33106554 PMCID: PMC7588426 DOI: 10.1038/s41598-020-75163-w] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 05/23/2020] [Accepted: 10/07/2020] [Indexed: 12/14/2022] Open
Abstract
Ancient DNA (aDNA) analyses necessitate the destructive sampling of archaeological material. Currently, the cochlea, part of the osseous inner ear located inside the petrous pyramid, is the most sought after skeletal element for molecular analyses of ancient humans as it has been shown to yield high amounts of endogenous DNA. However, destructive sampling of the petrous pyramid may not always be possible, particularly in cases where preservation of skeletal morphology is of top priority. To investigate alternatives, we present a survey of human aDNA preservation for each of ten skeletal elements in a skeletal collection from Medieval Germany. Through comparison of human DNA content and quality we confirm best performance of the petrous pyramid and identify seven additional sampling locations across four skeletal elements that yield adequate aDNA for most applications in human palaeogenetics. Our study provides a better perspective on DNA preservation across the human skeleton and takes a further step toward the more responsible use of ancient materials in human aDNA studies.
Affiliation(s)
- Cody Parker
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Adam B Rohrlach
- Max Planck Institute for the Science of Human History, Jena, Germany
- ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, SA, Australia
| | - Susanne Friederich
- Landesamt für Denkmalpflege und Archäologie, Sachsen-Anhalt, Halle (Saale), Germany
| | - Sarah Nagel
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Matthias Meyer
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Johannes Krause
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Kirsten I Bos
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Wolfgang Haak
- Max Planck Institute for the Science of Human History, Jena, Germany
| |
|
28
|
Martiniano R, Garrison E, Jones ER, Manica A, Durbin R. Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph. Genome Biol 2020; 21:250. [PMID: 32943086 PMCID: PMC7499850 DOI: 10.1186/s13059-020-02160-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/27/2019] [Accepted: 08/27/2020] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for aDNA and compare with existing methods. RESULTS: We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genome Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels. CONCLUSIONS: Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.
Affiliation(s)
- Rui Martiniano
- Department of Genetics, University of Cambridge, Cambridge, CB3 0DH UK
| | - Erik Garrison
- Wellcome Sanger Institute, Cambridge, CB10 1SA UK
- Genomics Institute, University of California, Santa Cruz, CA 95064 USA
| | - Eppie R. Jones
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ UK
| | - Andrea Manica
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ UK
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, CB3 0DH UK
- Wellcome Sanger Institute, Cambridge, CB10 1SA UK
| |
|
29
|
Mühlemann B, Vinner L, Margaryan A, Wilhelmson H, de la Fuente Castro C, Allentoft ME, de Barros Damgaard P, Hansen AJ, Holtsmark Nielsen S, Strand LM, Bill J, Buzhilova A, Pushkina T, Falys C, Khartanovich V, Moiseyev V, Jørkov MLS, Østergaard Sørensen P, Magnusson Y, Gustin I, Schroeder H, Sutter G, Smith GL, Drosten C, Fouchier RAM, Smith DJ, Willerslev E, Jones TC, Sikora M. Diverse variola virus (smallpox) strains were widespread in northern Europe in the Viking Age. Science 2020; 369:eaaw8977. [PMID: 32703849 DOI: 10.1126/science.aaw8977] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 02/13/2019] [Revised: 02/13/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Smallpox, one of the most devastating human diseases, killed between 300 million and 500 million people in the 20th century alone. We recovered viral sequences from 13 northern European individuals, including 11 dated to ~600-1050 CE, overlapping the Viking Age, and reconstructed near-complete variola virus genomes for four of them. The samples predate the earliest confirmed smallpox cases by ~1000 years, and the sequences reveal a now-extinct sister clade of the modern variola viruses that were in circulation before the eradication of smallpox. We date the most recent common ancestor of variola virus to ~1700 years ago. Distinct patterns of gene inactivation in the four near-complete sequences show that different evolutionary paths of genotypic host adaptation resulted in variola viruses that circulated widely among humans.
Affiliation(s)
- Barbara Mühlemann
- Centre for Pathogen Evolution, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK; Institute of Virology, Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany; German Center for Infection Research (DZIF), Associated Partner Site, Berlin, Germany
| | - Lasse Vinner
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | - Ashot Margaryan
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark; Institute of Molecular Biology, National Academy of Sciences of Armenia, 0014 Yerevan, Armenia
| | - Helene Wilhelmson
- Department of Archaeology and Ancient History, Lund University, 221 00 Lund, Sweden; Sydsvensk Arkeologi AB, 291 22 Kristianstad, Sweden
| | | | - Morten E Allentoft
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark; Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, 6102 Perth, WA, Australia
| | - Peter de Barros Damgaard
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | - Anders Johannes Hansen
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | - Sofie Holtsmark Nielsen
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
| | - Lisa Mariann Strand
- Department of Archaeology and Cultural History, Norwegian University of Science and Technology University Museum, 7491 Trondheim, Norway
| | - Jan Bill
- Museum of Cultural History, University of Oslo, 0130 Oslo, Norway
| | - Alexandra Buzhilova
- Research Institute and Museum of Anthropology, Lomonosov Moscow State University, Moscow 125009, Russian Federation
| | - Tamara Pushkina
- Department of Archaeology, Faculty of History, Lomonosov Moscow State University, Moscow 119992, Russian Federation
| | - Ceri Falys
- Thames Valley Archaeological Services, Reading RG1 5NR, UK
| | - Valeri Khartanovich
- Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, 199034 St. Petersburg, Russian Federation
| | - Vyacheslav Moiseyev
- Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, 199034 St. Petersburg, Russian Federation
| | - Marie Louise Schjellerup Jørkov
- Laboratory of Biological Anthropology, Department of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
| | | | | | - Ingrid Gustin
- Department of Archaeology and Ancient History, Lund University, 221 00 Lund, Sweden
| | - Hannes Schroeder
- Section for Evolutionary Genomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
| | - Gerd Sutter
- Institute for Infectious Diseases and Zoonoses, LMU University of Munich, 80539 Munich, Germany
- German Center for Infection Research (DZIF), Partner Site, Munich, Germany
| | - Geoffrey L Smith
- Department of Pathology, University of Cambridge, Cambridge CB2 1QP, UK
| | - Christian Drosten
- Institute of Virology, Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
- German Center for Infection Research (DZIF), Associated Partner Site, Berlin, Germany
| | - Ron A M Fouchier
- Department of Viroscience, Erasmus Medical Centre, 3015 CN Rotterdam, Netherlands
| | - Derek J Smith
- Centre for Pathogen Evolution, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
| | - Eske Willerslev
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark
- Lundbeck Foundation GeoGenetics Center, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Danish Institute for Advanced Study, University of Southern Denmark, 5230 Odense M, Denmark
| | - Terry C Jones
- Centre for Pathogen Evolution, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Institute of Virology, Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
- German Center for Infection Research (DZIF), Associated Partner Site, Berlin, Germany
| | - Martin Sikora
- Lundbeck Foundation GeoGenetics Center, GLOBE Institute, University of Copenhagen, 1350 Copenhagen, Denmark.
| |
|
30
|
Poullet M, Orlando L. Assessing DNA Sequence Alignment Methods for Characterizing Ancient Genomes and Methylomes. Front Ecol Evol 2020. [DOI: 10.3389/fevo.2020.00105] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/24/2023] Open
|
31
|
Pierini F, Nutsua M, Böhme L, Özer O, Bonczarowska J, Susat J, Franke A, Nebel A, Krause-Kyora B, Lenz TL. Targeted analysis of polymorphic loci from low-coverage shotgun sequence data allows accurate genotyping of HLA genes in historical human populations. Sci Rep 2020; 10:7339. [PMID: 32355290 PMCID: PMC7193575 DOI: 10.1038/s41598-020-64312-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/12/2019] [Accepted: 04/14/2020] [Indexed: 01/15/2023] Open
Abstract
The highly polymorphic human leukocyte antigen (HLA) plays a crucial role in adaptive immunity and is associated with various complex diseases. Accurate analysis of HLA genes using ancient DNA (aDNA) data is crucial for understanding their role in human adaptation to pathogens. Here, we describe the TARGT pipeline for targeted analysis of polymorphic loci from low-coverage shotgun sequence data. The pipeline was successfully applied to medieval aDNA samples and validated using both simulated aDNA and modern empirical sequence data from the 1000 Genomes Project. Thus the TARGT pipeline enables accurate analysis of HLA polymorphisms in historical (and modern) human populations.
Affiliation(s)
- Federica Pierini
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, 24306, Ploen, Germany
- Université Paris-Saclay, CNRS, Inria, Laboratoire de recherche en informatique, 91405, Orsay, France
| | - Marcel Nutsua
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Lisa Böhme
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Onur Özer
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, 24306, Ploen, Germany
| | - Joanna Bonczarowska
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Julian Susat
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Almut Nebel
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Ben Krause-Kyora
- Institute of Clinical Molecular Biology, Kiel University, 24105, Kiel, Germany
| | - Tobias L Lenz
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, 24306, Ploen, Germany.
| |
|
32
|
Hübler R, Key FM, Warinner C, Bos KI, Krause J, Herbig A. HOPS: automated detection and authentication of pathogen DNA in archaeological remains. Genome Biol 2019; 20:280. [PMID: 31842945 PMCID: PMC6913047 DOI: 10.1186/s13059-019-1903-0] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 01/29/2019] [Accepted: 11/27/2019] [Indexed: 01/28/2023] Open
Abstract
High-throughput DNA sequencing enables large-scale metagenomic analyses of complex biological systems. Such analyses are not restricted to present-day samples and can also be applied to molecular data from archaeological remains. Investigations of ancient microbes can provide valuable information on past bacterial commensals and pathogens, but their molecular detection remains a challenge. Here, we present HOPS (Heuristic Operations for Pathogen Screening), an automated bacterial screening pipeline for ancient DNA sequences that provides detailed information on species identification and authenticity. HOPS is a versatile tool for high-throughput screening of DNA from archaeological material to identify candidates for genome-level analyses.
Affiliation(s)
- Ron Hübler
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Felix M Key
- Max Planck Institute for the Science of Human History, Jena, Germany
- Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | | | - Kirsten I Bos
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Johannes Krause
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Alexander Herbig
- Max Planck Institute for the Science of Human History, Jena, Germany.
| |
|
33
|
Moreno-Mayar JV, Korneliussen TS, Dalal J, Renaud G, Albrechtsen A, Nielsen R, Malaspinas AS. A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data. Bioinformatics 2019; 36:828-841. [PMID: 31504166 PMCID: PMC8215924 DOI: 10.1093/bioinformatics/btz660] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/14/2019] [Revised: 08/05/2019] [Accepted: 08/22/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or with increased error rates. With a running time below 5 min, our method is applicable to large scale aDNA genomic studies. AVAILABILITY AND IMPLEMENTATION The method is implemented in C++ and R and is available at github.com/sapfo/contaminationX and popgen.dk/angsd.
Affiliation(s)
| | | | - Jyoti Dalal
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Gabriel Renaud
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, 1350 Copenhagen
| | - Anders Albrechtsen
- Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Rasmus Nielsen
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, 1350 Copenhagen
- Department of Statistics, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | | |
|
34
|
Günther T, Nettelblad C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet 2019; 15:e1008302. [PMID: 31348818 PMCID: PMC6685638 DOI: 10.1371/journal.pgen.1008302] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 01/15/2019] [Revised: 08/07/2019] [Accepted: 07/10/2019] [Indexed: 11/18/2022] Open
Abstract
Haploid high-quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population genomic analysis when heterozygous sites are falsely considered homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverages, and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp, which reduces the number of accepted mismatches and increases the probability of multiple matching sites in the genome. These ancient-DNA-specific properties potentially exacerbate the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudo-haploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of prehistoric humans with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Most genomic regions we investigated show little to no mapping bias but even a small proportion of sites with bias can impact analyses of those particular loci or slightly skew genome-wide estimates. Therefore, reference bias has the potential to cause minor but significant differences in the results of downstream analyses such as population allele sharing, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of these technical artifacts and that we need strategies to mitigate the effect. Therefore, we suggest some post-mapping filtering strategies to resolve reference bias which help to reduce its impact substantially.
Affiliation(s)
- Torsten Günther
- Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
| | - Carl Nettelblad
- Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| |
|
35
|
Renaud G, Hanghøj K, Korneliussen TS, Willerslev E, Orlando L. Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples. Genetics 2019; 212:587-614. [PMID: 31088861 PMCID: PMC6614887 DOI: 10.1534/genetics.119.302057] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/26/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022] Open
Abstract
Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5-6× and down to 7-8× for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses.
Affiliation(s)
- Gabriel Renaud
- Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
| | - Kristian Hanghøj
- Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
- Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, Université Paul Sabatier, 31000, France
| | | | - Eske Willerslev
- Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
- Department of Zoology, University of Cambridge, CB2 3EJ, UK
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- The Danish Institute for Advanced Study at The University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Ludovic Orlando
- Lundbeck Foundation GeoGenetics Center, Globe Institute, University of Copenhagen, 1350K, Denmark
- Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, Université Paul Sabatier, 31000, France
| |
|
36
|
van der Valk T, Vezzi F, Ormestad M, Dalén L, Guschanski K. Index hopping on the Illumina HiseqX platform and its consequences for ancient DNA studies. Mol Ecol Resour 2019; 20:1171-1181. [DOI: 10.1111/1755-0998.13009] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 09/10/2018] [Revised: 02/28/2019] [Accepted: 02/28/2019] [Indexed: 12/30/2022]
Affiliation(s)
- Tom van der Valk
- Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | | | | | - Love Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Katerina Guschanski
- Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| |
|
37
|
Hanghøj K, Renaud G, Albrechtsen A, Orlando L. DamMet: ancient methylome mapping accounting for errors, true variants, and post-mortem DNA damage. Gigascience 2019; 8:giz025. [PMID: 31004132 PMCID: PMC6474913 DOI: 10.1093/gigascience/giz025] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/29/2018] [Revised: 02/07/2019] [Accepted: 02/27/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Recent computational advances in ancient DNA research have opened access to the detection of ancient DNA methylation footprints at the genome-wide scale. The most commonly used approach infers the methylation state of a given genomic region on the basis of the amount of nucleotide mis-incorporations observed at CpG dinucleotide sites. However, this approach overlooks a number of confounding factors, including the presence of sequencing errors and true variants. The scale and distribution of the inferred methylation measurements are also variable across samples, precluding direct comparisons. FINDINGS Here, we present DamMet, an open-source software program retrieving maximum likelihood estimates of regional CpG methylation levels from ancient DNA sequencing data. It builds on a novel statistical model of post-mortem DNA damage for dinucleotides, accounting for sequencing errors, genotypes, and differential post-mortem cytosine deamination rates at both methylated and unmethylated sites. To validate DamMet, we extended gargammel, a sequence simulator for ancient DNA data, by introducing methylation-dependent features of post-mortem DNA decay. This new simulator provides direct validation of DamMet predictions. Additionally, the methylation levels inferred by DamMet were found to be correlated to those inferred by epiPALEOMIX and both on par and directly comparable to those measured from whole-genome bisulphite sequencing experiments of fresh tissues. CONCLUSIONS DamMet provides genuine estimates for local DNA methylation levels in ancient individual genomes. The returned estimates are directly cross-sample comparable, and the software is available as an open-source C++ program hosted at https://gitlab.com/KHanghoj/DamMet along with a manual and tutorial.
Affiliation(s)
- Kristian Hanghøj
- Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Øster Voldgade 5-7, 1350K Copenhagen, Denmark
- Laboratoire d’Anthropobiologie Moléculaire et d’Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse III, Paul Sabatier (UPS), 31000 Toulouse, France
| | - Gabriel Renaud
- Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Øster Voldgade 5-7, 1350K Copenhagen, Denmark
| | - Anders Albrechtsen
- Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark, Øster voldgade 5-7, 1350k
| | - Ludovic Orlando
- Lundbeck Foundation GeoGenetics Center, University of Copenhagen, Øster Voldgade 5-7, 1350K Copenhagen, Denmark
- Laboratoire d’Anthropobiologie Moléculaire et d’Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse III, Paul Sabatier (UPS), 31000 Toulouse, France
| |
|
38
|
Eisenhofer R, Weyrich LS. Assessing alignment-based taxonomic classification of ancient microbial DNA. PeerJ 2019; 7:e6594. [PMID: 30886779 PMCID: PMC6420809 DOI: 10.7717/peerj.6594] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/30/2018] [Accepted: 02/09/2019] [Indexed: 11/20/2022] Open
Abstract
The field of palaeomicrobiology, the study of ancient microorganisms, is rapidly growing due to recent methodological and technological advancements. It is now possible to obtain vast quantities of DNA data from ancient specimens in a high-throughput manner and use this information to investigate the dynamics and evolution of past microbial communities. However, we still know very little about how the characteristics of ancient DNA influence our ability to accurately assign microbial taxonomies (i.e. identify species) within ancient metagenomic samples. Here, we use both simulated and published metagenomic data sets to investigate how ancient DNA characteristics affect alignment-based taxonomic classification. We find that nucleotide-to-nucleotide, rather than nucleotide-to-protein, alignments are preferable when assigning taxonomies to short DNA fragment lengths routinely identified within ancient specimens (<60 bp). We determine that deamination (a form of ancient DNA damage) and random sequence substitutions corresponding to ∼100,000 years of genomic divergence minimally impact alignment-based classification. We also test four different reference databases and find that database choice can significantly bias the results of alignment-based taxonomic classification in ancient metagenomic studies. Finally, we perform a reanalysis of previously published ancient dental calculus data, increasing the number of microbial DNA sequences assigned taxonomically by an average of 64.2-fold and identifying microbial species previously unidentified in the original study. Overall, this study enhances our understanding of how ancient DNA characteristics influence alignment-based taxonomic classification of ancient microorganisms and provides recommendations for future palaeomicrobiological studies.
Affiliation(s)
- Raphael Eisenhofer
- Australian Centre for Ancient DNA, University of Adelaide, Adelaide, SA, Australia
- Centre of Excellence for Australia Biodiversity and Heritage, University of Adelaide, Adelaide, SA, Australia
| | - Laura Susan Weyrich
- Australian Centre for Ancient DNA, University of Adelaide, Adelaide, SA, Australia
- Centre of Excellence for Australia Biodiversity and Heritage, University of Adelaide, Adelaide, SA, Australia
| |
|
39
|
Renaud G, Schubert M, Sawyer S, Orlando L. Authentication and Assessment of Contamination in Ancient DNA. Methods Mol Biol 2019; 1963:163-194. [PMID: 30875054 DOI: 10.1007/978-1-4939-9176-1_17] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Abstract
Contamination from both present-day humans and postmortem microbial sources is a common challenge in ancient DNA studies. Here we present a suite of tools to assist in the assessment of contamination in ancient DNA data sets. These tools perform standard tests of authenticity of ancient DNA data including detecting the presence of postmortem damage signatures in sequence alignments and quantifying the amount of present-day human contamination.
Affiliation(s)
- Gabriel Renaud
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
| | - Mikkel Schubert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
| | - Susanna Sawyer
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark.
- Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, Université de Toulouse, University Paul Sabatier, Toulouse, France.
| |
|
40
|
Kawash JK, Smith SD, Karaiskos S, Grigoriev A. ARIADNA: machine learning method for ancient DNA variant discovery. DNA Res 2018; 25:619-627. [PMID: 30215675 PMCID: PMC6289774 DOI: 10.1093/dnares/dsy029] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/11/2018] [Accepted: 08/15/2018] [Indexed: 12/30/2022] Open
Abstract
Ancient DNA (aDNA) studies often rely on standard methods of mutation calling, optimized for high-quality contemporary DNA but not for excessive contamination, time- or environment-related damage of aDNA. In the absence of validated datasets and despite showing extreme sensitivity to aDNA quality, these methods have been used in many published studies, sometimes with additions of arbitrary filters or modifications, designed to overcome aDNA degradation and contamination problems. The general lack of best practices for aDNA mutation calling may lead to inaccurate results. To address these problems, we present ARIADNA (ARtificial Intelligence for Ancient DNA), a novel approach based on machine learning techniques, using specific aDNA characteristics as features to yield improved mutation calls. In our comparisons of variant callers across several ancient genomes, ARIADNA consistently detected higher-quality genome variants with fast runtimes, while reducing the false positive rate compared with other approaches.
Affiliation(s)
- Joseph K Kawash
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Sean D Smith
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Spyros Karaiskos
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Andrey Grigoriev
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| |
|
41
|
de Filippo C, Meyer M, Prüfer K. Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences. BMC Biol 2018; 16:121. [PMID: 30359256 PMCID: PMC6202837 DOI: 10.1186/s12915-018-0581-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 08/15/2018] [Accepted: 09/27/2018] [Indexed: 04/25/2023] Open
Abstract
BACKGROUND The study of ancient DNA is hampered by degradation, resulting in short DNA fragments. Advances in laboratory methods have made it possible to retrieve short DNA fragments, thereby improving access to DNA preserved in highly degraded, ancient material. However, such material contains large amounts of microbial contamination in addition to DNA fragments from the ancient organism. The resulting mixture of sequences constitutes a challenge for computational analysis, since microbial sequences are hard to distinguish from the ancient sequences of interest, especially when they are short. RESULTS Here, we develop a method to quantify spurious alignments based on the presence or absence of rare variants. We find that spurious alignments are enriched for mismatches and insertion/deletion differences and lack substitution patterns typical of ancient DNA. The impact of spurious alignments can be reduced by filtering on these features and by imposing a sample-specific minimum length cutoff. We apply this approach to sequences from four ~430,000-year-old Sima de los Huesos hominin remains, which contain particularly short DNA fragments, and increase the amount of usable sequence data by 17-150%. This allows us to place a third specimen from the site on the Neandertal lineage. CONCLUSIONS Our method maximizes the sequence data amenable to genetic analysis from highly degraded ancient material and avoids pitfalls that are associated with the analysis of ultra-short DNA sequences.
Affiliation(s)
- Cesare de Filippo
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Matthias Meyer
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| | - Kay Prüfer
- Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
| |
|
42
|
Selection of Appropriate Metagenome Taxonomic Classifiers for Ancient Microbiome Research. mSystems 2018; 3:mSystems00080-18. [PMID: 30035235 PMCID: PMC6050634 DOI: 10.1128/msystems.00080-18] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 05/29/2018] [Accepted: 06/20/2018] [Indexed: 02/01/2023] Open
Abstract
Metagenomics enables the study of complex microbial communities from myriad sources, including the remains of oral and gut microbiota preserved in archaeological dental calculus and paleofeces, respectively. While accurate taxonomic assignment is essential to this process, DNA damage characteristic of ancient samples (e.g., reduction in fragment size and cytosine deamination) may reduce the accuracy of read taxonomic assignment. Using a set of in silico-generated metagenomic data sets, we investigated how the addition of ancient DNA (aDNA) damage patterns influences microbial taxonomic assignment by five widely used profilers: QIIME/UCLUST, MetaPhlAn2, MIDAS, CLARK-S, and MALT. In silico-generated data sets were designed to mimic dental plaque, consisting of 40, 100, and 200 microbial species/strains, both with and without simulated aDNA damage patterns. 
Following taxonomic assignment, the profiles were evaluated for species presence/absence, relative abundance, alpha diversity, beta diversity, and specific taxonomic assignment biases. Unifrac metrics indicated that both MIDAS and MetaPhlAn2 reconstructed the most accurate community structure. QIIME/UCLUST, CLARK-S, and MALT had the highest number of inaccurate taxonomic assignments; false-positive rates were highest by CLARK-S and QIIME/UCLUST. Filtering out species present at <0.1% abundance greatly increased the accuracy of CLARK-S and MALT. All programs except CLARK-S failed to detect some species from the input file that were in their databases. The addition of ancient DNA damage resulted in minimal differences in species detection and relative abundance between simulated ancient and modern data sets for most programs. Overall, taxonomic profiling biases are program specific rather than damage dependent, and the choice of taxonomic classification program should be tailored to specific research questions. IMPORTANCE Ancient biomolecules from oral and gut microbiome samples have been shown to be preserved in the archaeological record. Studying ancient microbiome communities using metagenomic techniques offers a unique opportunity to reconstruct the evolutionary trajectories of microbial communities through time. DNA accumulates specific damage over time, which could potentially affect taxonomic classification and our ability to accurately reconstruct community assemblages. It is therefore necessary to assess whether ancient DNA (aDNA) damage patterns affect metagenomic taxonomic profiling. Here, we assessed biases in community structure, diversity, species detection, and relative abundance estimates by five popular metagenomic taxonomic classification programs using in silico-generated data sets with and without aDNA damage. 
Damage patterns had minimal impact on the taxonomic profiles produced by each program, while false-positive rates and biases were intrinsic to each program. Therefore, the most appropriate classification program is one that minimizes the biases related to the questions being addressed.
|
43
|
Monroy Kuhn JM, Jakobsson M, Günther T. Estimating genetic kin relationships in prehistoric populations. PLoS One 2018; 13:e0195491. [PMID: 29684051 PMCID: PMC5912749 DOI: 10.1371/journal.pone.0195491] [Citation(s) in RCA: 140] [Impact Index Per Article: 23.3] [Received: 02/01/2017] [Accepted: 03/23/2018] [Indexed: 12/21/2022] Open
Abstract
Archaeogenomic research has proven to be a valuable tool to trace migrations of historic and prehistoric individuals and groups, whereas relationships within a group or burial site have not been investigated to a large extent. Knowing the genetic kinship of historic and prehistoric individuals would give important insights into social structures of ancient and historic cultures. Most archaeogenetic research concerning kinship has been restricted to uniparental markers, while studies using genome-wide information were mainly focused on comparisons between populations. Applications which infer the degree of relationship based on modern-day DNA information typically require diploid genotype data. Low concentration of endogenous DNA, fragmentation and other post-mortem damage to ancient DNA (aDNA) make the application of such tools unfeasible for most archaeological samples. To infer family relationships for degraded samples, we developed the software READ (Relationship Estimation from Ancient DNA). We show that our heuristic approach can successfully infer up to second-degree relationships with as little as 0.1x shotgun coverage per genome for pairs of individuals. We uncover previously unknown relationships among prehistoric individuals by applying READ to published aDNA data from several human remains excavated from different cultural contexts. In particular, we find a group of five closely related males from the same Corded Ware culture site in modern-day Germany, suggesting patrilocality, which highlights the possibility to uncover social structures of ancient populations by applying READ to genome-wide aDNA data. READ is publicly available from https://bitbucket.org/tguenther/read.
Affiliation(s)
- Jose Manuel Monroy Kuhn
  - Uppsala University, Evolutionary Biology Centre, Department of Organismal Biology, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
- Mattias Jakobsson
  - Uppsala University, Evolutionary Biology Centre, Department of Organismal Biology, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
  - Uppsala University, SciLifeLab, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
  - * E-mail: (MJ); (TG)
- Torsten Günther
  - Uppsala University, Evolutionary Biology Centre, Department of Organismal Biology, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
  - * E-mail: (MJ); (TG)
|
44
|
Taron UH, Lell M, Barlow A, Paijmans JLA. Testing of Alignment Parameters for Ancient Samples: Evaluating and Optimizing Mapping Parameters for Ancient Samples Using the TAPAS Tool. Genes (Basel) 2018. [PMID: 29533977 PMCID: PMC5867878 DOI: 10.3390/genes9030157] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022] Open
Abstract
High-throughput sequence data retrieved from ancient or other degraded samples has led to unprecedented insights into the evolutionary history of many species, but the analysis of such sequences also poses specific computational challenges. The most commonly used approach involves mapping sequence reads to a reference genome. However, this process becomes increasingly challenging with an elevated genetic distance between target and reference or with the presence of contaminant sequences with high sequence similarity to the target species. The evaluation and testing of mapping efficiency and stringency are thus paramount for the reliable identification and analysis of ancient sequences. In this paper, we present 'TAPAS' (Testing of Alignment Parameters for Ancient Samples), a computational tool that enables the systematic testing of mapping tools for ancient data by simulating sequence data reflecting the properties of an ancient dataset and performing test runs using the mapping software and parameter settings of interest. We showcase TAPAS by using it to assess and improve the mapping strategy for a degraded sample from a banded linsang (Prionodon linsang), for which no closely related reference is currently available. This enables a 1.8-fold increase in the number of mapped reads without sacrificing mapping specificity. The increase in mapped reads effectively reduces the need for additional sequencing, thus making more economical use of time, resources, and sample material.
Affiliation(s)
- Ulrike H Taron
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany.
- Moritz Lell
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany.
|
45
|
Luhmann N, Doerr D, Chauve C. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microb Genom 2017; 3:e000123. [PMID: 29114402 PMCID: PMC5643016 DOI: 10.1099/mgen.0.000123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/01/2017] [Accepted: 06/07/2017] [Indexed: 12/12/2022] Open
Abstract
Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing has made it possible to sequence whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method, AGapEs (ancestral gap estimation), to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks and obtained highly improved assemblies for both genomes, comprising, respectively, five and six scaffolds, with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains.
Affiliation(s)
- Nina Luhmann
  - Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
  - International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes", Bielefeld University, Bielefeld, Germany
- Daniel Doerr
  - Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
  - School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
|