1
Caldon M, Mutti G, Mondanaro A, Imai H, Shotake T, Oteo Garcia G, Belay G, Morata J, Trotta JR, Montinaro F, Gippoliti S, Capelli C. Gelada genomes highlight events of gene flow, hybridisation and local adaptation that track past climatic changes. Mol Ecol 2024:e17514. [PMID: 39206888 DOI: 10.1111/mec.17514] [Received: 02/02/2024] [Revised: 06/28/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
Theropithecus gelada, the last surviving species of this genus, occupies a unique and highly specialised ecological niche in the Ethiopian highlands. A subdivision into three geographically defined populations (Northern, Central and Southern) has been tentatively proposed for this species on the basis of genetic analyses, but genomic data have been investigated only for two of these groups (Northern and Central). Here we combined newly generated whole genome sequences of individuals sampled from the population living south of the East Africa Great Rift Valley with available data from the other two gelada populations to reconstruct the evolutionary history of the species. Integrating genomic and paleoclimatic data, we found that gene flow across populations and with Papio species tracked past climate changes. The isolation and climatic conditions experienced by Southern geladas during the Holocene shaped local diversity and generated diet-related genomic signatures.
Affiliation(s)
- Matteo Caldon
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Giacomo Mutti
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
  - Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Hiroo Imai
  - Center for the Evolutionary Origins of Human Behavior, Kyoto University, Inuyama, Aichi, Japan
- Gonzalo Oteo Garcia
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
  - Centre for Palaeogenetics, Stockholm, Sweden
  - Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
- Gurja Belay
  - Department of Microbial, Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Jordi Morata
  - Centre Nacional d'Anàlisi Genòmica, Barcelona, Spain
- Francesco Montinaro
  - Department of Biology-Genetics, University of Bari, Bari, Italy
  - Institute of Genomics, University of Tartu, Tartu, Estonia
- Spartaco Gippoliti
  - IUCN/SSC Primate Specialist Group, Rome, Italy
  - Società Italiana per la Storia Della Fauna "G. Altobello", Rome, Italy
- Cristian Capelli
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
  - Department of Biology, University of Oxford, Oxford, UK
2
Nguyen AK, Schall PZ, Kidd JM. A map of canine sequence variation relative to a Greenland wolf outgroup. Mamm Genome 2024:10.1007/s00335-024-10056-1. [PMID: 39088040 DOI: 10.1007/s00335-024-10056-1] [Received: 06/03/2024] [Accepted: 07/25/2024] [Indexed: 08/02/2024]
Abstract
For over 15 years, canine genetics research relied on a reference assembly from a Boxer breed dog named Tasha (i.e., canFam3.1). Recent advances in long-read sequencing and genome assembly have led to the development of numerous high-quality assemblies from diverse canines. These assemblies represent notable improvements in completeness, contiguity, and the representation of gene promoters and gene models. Although genome graph and pan-genome approaches have promise, most genetic analyses in canines rely upon the mapping of Illumina sequencing reads to a single reference. The Dog10K consortium, and others, have generated deep catalogs of genetic variation through an alignment of Illumina sequencing reads to a reference genome obtained from a German Shepherd Dog named Mischka (i.e., canFam4, UU_Cfam_GSD_1.0). However, alignment to a breed-derived genome may introduce bias in genotype calling across samples. Since the use of an outgroup reference genome may remove this effect, we have reprocessed 1929 samples analyzed by the Dog10K consortium using a Greenland wolf (mCanLor1.2) as the reference. We efficiently performed remapping and variant calling using a GPU-implementation of common analysis tools. The resulting call set removes the variability in genetic differences seen across samples and breed relationships revealed by principal component analysis are not affected by the choice of reference genome. Using this sequence data, we inferred the history of population sizes and found that village dog populations experienced a 9-13 fold reduction in historic effective population size relative to wolves.
Affiliation(s)
- Anthony K Nguyen
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Peter Z Schall
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
3
Bergström A. Improving data archiving practices in ancient genomics. Sci Data 2024; 11:754. [PMID: 38987254 PMCID: PMC11236975 DOI: 10.1038/s41597-024-03563-y] [Received: 02/19/2024] [Accepted: 06/21/2024] [Indexed: 07/12/2024] Open
Abstract
Ancient DNA is producing a rich record of past genetic diversity in humans and other species. However, unless the primary data is appropriately archived, its long-term value will not be fully realised. I surveyed publicly archived data from 42 recent ancient genomics studies. Half of the studies archived incomplete datasets, preventing accurate replication and representing a loss of data of potential future use. No studies met all criteria that could be considered best practice. Based on these results, I make six recommendations for data producers: (1) archive all sequencing reads, not just those that aligned to a reference genome, (2) archive read alignments too, but as secondary analysis files, (3) provide correct experiment metadata on samples, libraries and sequencing runs, (4) provide informative sample metadata, (5) archive data from low-coverage and negative experiments, and (6) document archiving choices in papers, and peer review these. Given the reliance on destructive sampling of finite material, ancient genomics studies have a particularly strong responsibility to ensure the longevity and reusability of generated data.
Affiliation(s)
- Anders Bergström
  - School of Biological Sciences, University of East Anglia, Norwich, UK
4
Bougiouri K, Aninta SG, Charlton S, Harris A, Carmagnini A, Piličiauskienė G, Feuerborn TR, Scarsbrook L, Tabadda K, Blaževičius P, Parker HG, Gopalakrishnan S, Larson G, Ostrander EA, Irving-Pease EK, Frantz LA, Racimo F. Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years. bioRxiv 2024:2024.03.15.585179. [PMID: 38903121 PMCID: PMC11188068 DOI: 10.1101/2024.03.15.585179] [Indexed: 06/22/2024]
Abstract
The multi-millennia-long history between dogs and humans has placed them at the forefront of archeological and genomic research. Despite ongoing efforts including the analysis of ancient dog and wolf genomes, many questions remain regarding their geographic and temporal origins, and the microevolutionary processes that led to the diversity of breeds today. Although ancient genomes provide valuable information, their use is hindered by low depth of coverage and post-mortem damage, which inhibit confident genotype calling. In the present study, we assess how genotype imputation of ancient dog and wolf genomes, utilising a large reference panel, can improve the resolution provided by ancient datasets. Imputation accuracy was evaluated by down-sampling high coverage dog and wolf genomes to 0.05-2x coverage and comparing concordance between imputed and high coverage genotypes. We measured the impact of imputation on principal component analyses and runs of homozygosity. Our findings show high (R² > 0.9) imputation accuracy for dogs with coverage as low as 0.5x and for wolves as low as 1.0x. We then imputed a dataset of 90 ancient dog and wolf genomes, to assess changes in inbreeding during the last 10,000 years of dog evolution. Ancient dog and wolf populations generally exhibited lower inbreeding levels than present-day individuals. Interestingly, regions with low ROH density maintained across ancient and present-day samples were significantly associated with genes related to olfaction and immune response. Our study indicates that imputing ancient canine genomes is a viable strategy that allows for the use of analytical methods previously limited to high-quality genetic data.
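The accuracy evaluation described in this abstract (comparing imputed genotypes from down-sampled data against high-coverage genotypes) can be illustrated with a minimal sketch. The abstract does not say how its accuracy statistic is computed; the code below assumes a squared Pearson correlation between true genotypes (coded 0/1/2) and imputed dosages, plus a simple concordance fraction. The function names `imputation_r2` and `genotype_concordance` are hypothetical and are not taken from the study.

```python
from statistics import mean


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumption: this is one common way to score imputation accuracy; the
    study's exact definition is not given in the abstract.
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("inputs must have the same length")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    cov = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    var_x = sum((x - mx) ** 2 for x in true_genotypes)
    var_y = sum((y - my) ** 2 for y in imputed_dosages)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for monomorphic sites
    return cov * cov / (var_x * var_y)


def genotype_concordance(true_genotypes, called_genotypes):
    """Fraction of sites where the called genotype equals the true genotype."""
    if len(true_genotypes) != len(called_genotypes):
        raise ValueError("inputs must have the same length")
    matches = sum(t == c for t, c in zip(true_genotypes, called_genotypes))
    return matches / len(true_genotypes)
```

In practice such statistics are usually stratified by allele frequency and coverage, which this sketch leaves out.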
Affiliation(s)
- Katia Bougiouri
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Sabhrina Gita Aninta
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sophy Charlton
  - BioArCh, Department of Archaeology, University of York, York, UK
- Alex Harris
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Alberto Carmagnini
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
  - Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- Giedrė Piličiauskienė
  - Department of Archeology, Faculty of History, Vilnius University, Vilnius, Lithuania
- Tatiana R. Feuerborn
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lachie Scarsbrook
  - The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Kristina Tabadda
  - The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Povilas Blaževičius
  - Department of Archeology, Faculty of History, Vilnius University, Vilnius, Lithuania
  - National Museum of Lithuania, Vilnius, Lithuania
- Heidi G. Parker
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Shyam Gopalakrishnan
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Greger Larson
  - The Palaeogenomics and Bio-archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Elaine A. Ostrander
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Evan K. Irving-Pease
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Laurent A.F. Frantz
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
  - Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- Fernando Racimo
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
5
Özkan M, Gürün K, Yüncü E, Vural KB, Atağ G, Akbaba A, Fidan FR, Sağlıcan E, Altınışık EN, Koptekin D, Pawłowska K, Hodder I, Adcock SE, Arbuckle BS, Steadman SR, McMahon G, Erdal YS, Bilgin CC, Togan İ, Geigl EM, Götherström A, Grange T, Özer F, Somel M. The first complete genome of the extinct European wild ass (Equus hemionus hydruntinus). Mol Ecol 2024; 33:e17440. [PMID: 38946459 DOI: 10.1111/mec.17440] [Received: 07/11/2023] [Revised: 05/17/2024] [Accepted: 06/14/2024] [Indexed: 07/02/2024]
Abstract
We present palaeogenomes of three morphologically unidentified Anatolian equids dating to the first millennium BCE, sequenced to a coverage of 0.6-6.4×. Mitochondrial DNA haplotypes of the Anatolian individuals clustered with those of Equus hydruntinus (or Equus hemionus hydruntinus), the extinct European wild ass, secular name 'hydruntine'. Further, the Anatolian wild ass whole genome profiles fell outside the genomic diversity of other extant and past Asiatic wild ass (E. hemionus) lineages. These observations suggest that the three Anatolian wild asses represent hydruntines, making them the latest recorded survivors of this lineage, about a millennium later than the latest observations in the zooarchaeological record. Our mitogenomic and genomic analyses indicate that E. h. hydruntinus was a clade belonging to ancient and present-day E. hemionus lineages that radiated possibly between 0.6 and 0.8 Mya. We also find evidence consistent with recent gene flow between hydruntines and Middle Eastern wild asses. Analyses of genome-wide heterozygosity and runs of homozygosity suggest that the Anatolian wild ass population may have lost genetic diversity by the mid-first millennium BCE, a possible sign of its eventual demise.
Affiliation(s)
- Mustafa Özkan
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Kanat Gürün
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Eren Yüncü
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Kıvılcım Başak Vural
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Gözde Atağ
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Ali Akbaba
  - Department of Anthropology, Ankara University, Ankara, Turkey
  - Alparslan University, Muş, Turkey
- Fatma Rabia Fidan
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
- Ekin Sağlıcan
  - Department of Health Informatics, Middle East Technical University, Ankara, Turkey
- Ezgi N Altınışık
  - Department of Anthropology, Human_G Laboratory, Hacettepe University, Ankara, Turkey
- Dilek Koptekin
  - Department of Health Informatics, Middle East Technical University, Ankara, Turkey
- Kamilla Pawłowska
  - Department of Palaeoenvironmental Research, Adam Mickiewicz University, Poznań, Poland
- Ian Hodder
  - Department of Anthropology, Stanford University, Stanford, California, USA
- Sarah E Adcock
  - Institute for the Study of the Ancient World, New York University, New York, New York, USA
- Benjamin S Arbuckle
  - Department of Anthropology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Sharon R Steadman
  - Department of Sociology/Anthropology, SUNY Cortland, Cortland, New York, USA
- Gregory McMahon
  - Classics, Humanities and Italian Studies Department, University of New Hampshire, Durham, New Hampshire, USA
- Yılmaz Selim Erdal
  - Department of Anthropology, Human_G Laboratory, Hacettepe University, Ankara, Turkey
- C Can Bilgin
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- İnci Togan
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Eva-Maria Geigl
  - Institut Jacques Monod, CNRS, Université de Paris, Paris, France
- Anders Götherström
  - Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
- Thierry Grange
  - Institut Jacques Monod, CNRS, Université de Paris, Paris, France
- Füsun Özer
  - Department of Health Informatics, Middle East Technical University, Ankara, Turkey
- Mehmet Somel
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
6
Dolenz S, van der Valk T, Jin C, Oppenheimer J, Sharif MB, Orlando L, Shapiro B, Dalén L, Heintzman PD. Unravelling reference bias in ancient DNA datasets. Bioinformatics 2024; 40:btae436. [PMID: 38960861 PMCID: PMC11254355 DOI: 10.1093/bioinformatics/btae436] [Received: 07/24/2023] [Revised: 03/22/2024] [Accepted: 07/02/2024] [Indexed: 07/05/2024] Open
Abstract
MOTIVATION: The alignment of sequencing reads is a critical step in the characterization of ancient genomes. However, reference bias and spurious mappings pose a significant challenge, particularly as cutting-edge wet lab methods generate datasets that push the boundaries of alignment tools. Reference bias occurs when reference alleles are favoured over alternative alleles during mapping, whereas spurious mappings stem either from contamination or from endogenous reads that fail to align to their correct position. Previous work has shown that these phenomena are correlated with read length, but a more thorough investigation of reference bias and spurious mappings for ancient DNA has been lacking. Here, we use a range of empirical and simulated palaeogenomic datasets to investigate the impacts of mapping tools, quality thresholds, and reference genome on mismatch rates across read lengths. RESULTS: For these analyses, we introduce AMBER, a new bioinformatics tool for assessing the quality of ancient DNA mapping directly from BAM-files and informing on reference bias, read length cut-offs and reference selection. AMBER rapidly and simultaneously computes the sequence read mapping bias in the form of the mismatch rates per read length, cytosine deamination profiles at both CpG and non-CpG sites, fragment length distributions, and genomic breadth and depth of coverage. Using AMBER, we find that mapping algorithms and quality threshold choices dictate reference bias and rates of spurious alignment at different read lengths in a predictable manner, suggesting that optimized mapping parameters for each read length will be a key step in alleviating reference bias and spurious mappings. AVAILABILITY AND IMPLEMENTATION: AMBER is available for noncommercial use on GitHub (https://github.com/tvandervalk/AMBER.git). Scripts used to generate and analyse simulated datasets are available on GitHub (https://github.com/sdolenz/refbias_scripts).
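To make the central statistic concrete, here is a minimal sketch of a mismatch rate per read length. It is not AMBER's implementation and does not read BAM files; it assumes the per-read values (aligned read length, number of mismatches against the reference) have already been extracted, and the function name `mismatch_rate_by_length` is hypothetical.

```python
from collections import defaultdict


def mismatch_rate_by_length(reads):
    """Return {read_length: mismatches per aligned base} for (length, mismatches) pairs.

    Assumption: each read is summarised as (aligned_length, n_mismatches).
    How a real tool counts mismatches (e.g. handling of deamination or
    indels) is not modelled here.
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [mismatches, aligned bases]
    for length, mismatches in reads:
        if length <= 0 or not 0 <= mismatches <= length:
            raise ValueError(f"invalid read summary: {(length, mismatches)}")
        totals[length][0] += mismatches
        totals[length][1] += length
    return {
        length: mismatch_sum / base_sum
        for length, (mismatch_sum, base_sum) in sorted(totals.items())
    }
```

An excess mismatch rate concentrated in the shortest length bins would be the kind of pattern the abstract associates with spurious alignments.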
Affiliation(s)
- Stephanie Dolenz
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Geological Sciences, Stockholm University, Stockholm, SE-106 91, Sweden
- Tom van der Valk
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Science for Life Laboratory, Stockholm, SE-171 65, Sweden
- Chenyu Jin
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Jonas Oppenheimer
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
- Muhammad Bilal Sharif
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Ludovic Orlando
  - Centre for Anthropobiology and Genomics of Toulouse (CAGT, CNRS UMR5288), University Paul Sabatier, Faculté de Santé, Toulouse, 31000, France
- Beth Shapiro
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
  - Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, United States
- Love Dalén
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, SE-114 18, Sweden
  - Department of Zoology, Stockholm University, Stockholm, SE-106 91, Sweden
- Peter D Heintzman
  - Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-106 91, Sweden
  - Department of Geological Sciences, Stockholm University, Stockholm, SE-106 91, Sweden
7
Phillips AR. Variant calling in polyploids for population and quantitative genetics. Appl Plant Sci 2024; 12:e11607. [PMID: 39184203 PMCID: PMC11342233 DOI: 10.1002/aps3.11607] [Received: 11/27/2023] [Revised: 03/03/2024] [Accepted: 04/10/2024] [Indexed: 08/27/2024]
Abstract
Advancements in genome assembly and sequencing technology have made whole genome sequence (WGS) data and reference genomes accessible to study polyploid species. Compared to popular reduced-representation sequencing approaches, the genome-wide coverage and greater marker density provided by WGS data can greatly improve our understanding of polyploid species and polyploid biology. However, biological features that make polyploid species interesting also pose challenges in read mapping, variant identification, and genotype estimation. Accounting for characteristics in variant calling like allelic dosage uncertainty, homology between subgenomes, and variance in chromosome inheritance mode can reduce errors. Here, I discuss the challenges of variant calling in polyploid WGS data and discuss where potential solutions can be integrated into a standard variant calling pipeline.
Affiliation(s)
- Alyssa R. Phillips
  - Department of Evolution and Ecology, University of California, Davis, Davis, California 95616, USA
8
Cooke NP, Murray M, Cassidy LM, Mattiangeli V, Okazaki K, Kasai K, Gakuhari T, Bradley DG, Nakagome S. Genomic imputation of ancient Asian populations contrasts local adaptation in pre- and post-agricultural Japan. iScience 2024; 27:110050. [PMID: 38883821 PMCID: PMC11176660 DOI: 10.1016/j.isci.2024.110050] [Received: 10/07/2023] [Revised: 03/25/2024] [Accepted: 05/17/2024] [Indexed: 06/18/2024] Open
Abstract
Early modern humans lived as hunter-gatherers for millennia before agriculture, yet the genetic adaptations of these populations remain a mystery. Here, we investigate selection in the ancient hunter-gatherer-fisher Jomon and contrast pre- and post-agricultural adaptation in the Japanese archipelago. Building on the successful validation of imputation with ancient Asian genomes, we identify selection signatures in the Jomon, particularly robust signals from KITLG variants, which may have influenced dark pigmentation evolution. The Jomon lacks well-known adaptive variants (EDAR, ADH1B, and ALDH2), marking their emergence after the advent of farming in the archipelago. Notably, the EDAR and ADH1B variants were prevalent in the archipelago 1,300 years ago, whereas the ALDH2 variant could have emerged later due to its absence in other ancient genomes. Overall, our study underpins local adaptation unique to the Jomon population, which in turn sheds light on post-farming selection that continues to shape contemporary Asian populations.
Affiliation(s)
- Niall P Cooke
  - School of Medicine, Trinity College Dublin, Dublin, Ireland
- Lara M Cassidy
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Kenji Okazaki
  - Department of Anatomy, Faculty of Medicine, Tottori University, Yonago, Japan
- Kenji Kasai
  - Toyama Prefectural Center for Archaeological Operations, Toyama, Japan
- Takashi Gakuhari
  - Institute for the Study of Ancient Civilizations and Cultural Resources, Kanazawa University, Kanazawa, Japan
- Daniel G Bradley
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Shigeki Nakagome
  - School of Medicine, Trinity College Dublin, Dublin, Ireland
  - Institute for the Study of Ancient Civilizations and Cultural Resources, Kanazawa University, Kanazawa, Japan
9
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024:10.1038/s41576-024-00738-6. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
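Two of the filters named in this abstract, minor allele frequency and missing data per locus, can be sketched as follows. This is an illustration under stated assumptions rather than the review's recommended procedure: genotypes are diploid alternate-allele counts (0/1/2) with `None` for missing calls, and the default thresholds `min_maf=0.05` and `max_missing=0.2` are placeholders chosen for the example, not values taken from the paper.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at one locus; None if every call is missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per call
    return min(alt_freq, 1 - alt_freq)


def missing_rate(genotypes):
    """Fraction of individuals with a missing call at one locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_loci(matrix, min_maf=0.05, max_missing=0.2):
    """Return indices of loci (rows) passing both filters.

    Thresholds are hypothetical defaults for illustration only.
    """
    kept = []
    for index, row in enumerate(matrix):
        maf = minor_allele_frequency(row)
        if maf is None:
            continue  # no called genotypes at this locus
        if missing_rate(row) <= max_missing and maf >= min_maf:
            kept.append(index)
    return kept
```

Rerunning `filter_loci` with different thresholds and comparing downstream statistics is a simple way to see the sensitivity to filtering that the abstract describes.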
Affiliation(s)
- William Hemstrom
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Jared A Grummer
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA
10
Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
11
Smith JL, Tcheandjieu C, Dikilitas O, Iyer K, Miyazawa K, Hilliard A, Lynch J, Rotter JI, Chen YDI, Sheu WHH, Chang KM, Kanoni S, Tsao PS, Ito K, Kosel M, Clarke SL, Schaid DJ, Assimes TL, Kullo IJ. Multi-Ancestry Polygenic Risk Score for Coronary Heart Disease Based on an Ancestrally Diverse Genome-Wide Association Study and Population-Specific Optimization. Circ Genom Precis Med 2024; 17:e004272. [PMID: 38380516 PMCID: PMC11372723 DOI: 10.1161/circgen.123.004272] [Received: 06/07/2023] [Accepted: 01/23/2024] [Indexed: 02/22/2024]
Abstract
BACKGROUND Predictive performance of polygenic risk scores (PRS) varies across populations. To facilitate equitable clinical use, we developed PRS for coronary heart disease (CHD; PRSCHD) for 5 genetic ancestry groups. METHODS We derived ancestry-specific and multi-ancestry PRSCHD based on pruning and thresholding (PRSPT) and ancestry-based continuous shrinkage priors (PRSCSx) applied to summary statistics from the largest multi-ancestry genome-wide association study meta-analysis for CHD to date, including 1.1 million participants from 5 major genetic ancestry groups. Following training and optimization in the Million Veteran Program, we evaluated the best-performing PRSCHD in 176,988 individuals across 9 diverse cohorts. RESULTS Multi-ancestry PRSPT and PRSCSx outperformed ancestry-specific PRSPT and PRSCSx across a range of tuning values. Two best-performing multi-ancestry PRSCHD (ie, PRSPTmult and PRSCSxmult) and 1 ancestry-specific (PRSCSxEUR) were taken forward for validation. PRSPTmult demonstrated the strongest association with CHD in individuals of South Asian ancestry and European ancestry (odds ratio per 1 SD [95% CI], 2.75 [2.41-3.14], 1.65 [1.59-1.72]), followed by East Asian ancestry (1.56 [1.50-1.61]), Hispanic/Latino ancestry (1.38 [1.24-1.54]), and African ancestry (1.16 [1.11-1.21]). PRSCSxmult showed the strongest associations in South Asian ancestry (2.67 [2.38-3.00]) and European ancestry (1.65 [1.59-1.71]), lower in East Asian ancestry (1.59 [1.54-1.64]), Hispanic/Latino ancestry (1.51 [1.35-1.69]), and the lowest in African ancestry (1.20 [1.15-1.26]). CONCLUSIONS The use of summary statistics from a large multi-ancestry genome-wide meta-analysis improved the performance of PRSCHD in most ancestry groups compared with single-ancestry methods. Despite the use of one of the largest and most diverse sets of training and validation cohorts to date, improvement of predictive performance was limited in African ancestry. This highlights the need for larger genome-wide association study datasets of underrepresented populations to enhance the performance of PRSCHD.
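As a rough illustration of the thresholding half of a pruning-and-thresholding score, the sketch below keeps variants whose association p-value falls under a cutoff and sums effect size times allele dosage. It is not the authors' pipeline: the `Variant` class, the function names, the variant IDs, and every number are hypothetical, and LD pruning is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Variant:
    """Hypothetical summary-statistics record for one already-pruned variant."""
    snp_id: str
    beta: float      # per-allele effect size (e.g. log odds ratio)
    p_value: float   # association p-value from the GWAS


def threshold_variants(variants: List[Variant], p_threshold: float) -> List[Variant]:
    """Keep only variants whose p-value is below the chosen threshold."""
    return [v for v in variants if v.p_value < p_threshold]


def polygenic_score(dosages: Mapping[str, float], variants: List[Variant]) -> float:
    """Sum of beta * effect-allele dosage; variants missing from `dosages` count as 0."""
    return sum(v.beta * dosages.get(v.snp_id, 0.0) for v in variants)


# Made-up example data, for illustration only.
variants = [
    Variant("snp_a", beta=0.20, p_value=1e-9),
    Variant("snp_b", beta=-0.10, p_value=1e-3),
    Variant("snp_c", beta=0.05, p_value=0.2),
]
kept = threshold_variants(variants, p_threshold=0.01)
score = polygenic_score({"snp_a": 2.0, "snp_b": 1.0, "snp_c": 2.0}, kept)
print([v.snp_id for v in kept], round(score, 3))
```

In practice the p-value threshold is one of the tuning values that would be optimized in a training cohort, as the abstract describes.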
Affiliation(s)
- Johanna L Smith
- Department of Cardiovascular Medicine (J.L.S., O.D., I.J.K.), Mayo Clinic, Rochester, MN
| | - Catherine Tcheandjieu
- Department of Epidemiology and Biostatistics, University of California San Francisco (C.T.)
- Gladstone Institute of Data Science and Biotechnology, Gladstone Institute, San Francisco, CA (C.T.)
- VA Palo Alto Health Care System (C.T., A.H., P.S.T., S.L.C.)
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine (J.L.S., O.D., I.J.K.), Mayo Clinic, Rochester, MN
| | - Kruthika Iyer
- Stanford University School of Medicine, Palo Alto, CA (K. Iyer, A.H.)
| | - Kazuo Miyazawa
- Riken Center for Integrative Medical Sciences, Yokohama City, Japan (K.M., K. Ito)
| | - Austin Hilliard
- VA Palo Alto Health Care System (C.T., A.H., P.S.T., S.L.C.)
- Stanford University School of Medicine, Palo Alto, CA (K. Iyer, A.H.)
| | | | - Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA (J.I.R., Y.-D.I.C.)
| | - Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA (J.I.R., Y.-D.I.C.)
| | - Wayne Huey-Herng Sheu
- Institute of Molecular and Genomic Medicine, National Health Research Institute (W.H.-H.S.)
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taipei Veterans General Hospital (W.H.-H.S.)
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taiwan (W.H.-H.S.)
| | - Kyong-Mi Chang
- Corporal Michael J Crescenz VA Medical Center, Philadelphia, PA (K.-M.C.)
| | - Stavroula Kanoni
- Queen Mary University of London, Cambridge, United Kingdom (S.K.)
| | - Philip S Tsao
- VA Palo Alto Health Care System (C.T., A.H., P.S.T., S.L.C.)
- Stanford University, Stanford, CA (P.S.T., S.L.C., T.L.A.)
| | - Kaoru Ito
- Riken Center for Integrative Medical Sciences, Yokohama City, Japan (K.M., K. Ito)
| | - Matthew Kosel
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN (M.K., D.J.S.)
| | - Shoa L Clarke
- VA Palo Alto Health Care System (C.T., A.H., P.S.T., S.L.C.)
- Stanford University, Stanford, CA (P.S.T., S.L.C., T.L.A.)
| | - Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN (M.K., D.J.S.)
| | | | - Iftikhar J Kullo
- Department of Cardiovascular Medicine (J.L.S., O.D., I.J.K.), Mayo Clinic, Rochester, MN
12
Rick JA, Brock CD, Lewanski AL, Golcher-Benavides J, Wagner CE. Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol 2024; 73:76-101. [PMID: 37881861 DOI: 10.1093/sysbio/syad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open
Abstract
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
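A minor allele count filter of the kind discussed above can be sketched as follows. This is a minimal illustration, not the authors' code: genotypes are assumed to be diploid alternate-allele counts (0, 1, 2) with `None` for missing calls, the function names and data are hypothetical, and the threshold is treated as inclusive (`>=`), which is an assumption about how a cutoff would be applied.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # alt-allele count for one diploid sample; None = missing


def minor_allele_count(site: Sequence[Genotype]) -> int:
    """Count copies of the rarer allele among called genotypes at one site."""
    called = [g for g in site if g is not None]
    alt_copies = sum(called)
    total_copies = 2 * len(called)
    return min(alt_copies, total_copies - alt_copies)


def filter_by_mac(sites: List[Sequence[Genotype]], min_mac: int) -> List[Sequence[Genotype]]:
    """Keep sites whose minor allele count is at least `min_mac`."""
    return [site for site in sites if minor_allele_count(site) >= min_mac]


# Made-up genotype rows (one row per site, one column per sample).
sites = [
    [0, 0, 1, 0],     # singleton
    [1, 1, 2, 0],     # balanced site
    [2, 2, 2, 1],     # alt allele nearly fixed
    [0, None, 1, 1],  # one missing call
]
macs = [minor_allele_count(s) for s in sites]
kept_sites = filter_by_mac(sites, min_mac=2)
print(macs, len(kept_sites))
```

Raising `min_mac` removes rare variants first, which is the kind of stringency the abstract reports as shifting inferred tree topology and shape.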
Affiliation(s)
- Jessica A Rick
- School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA
| | - Chad D Brock
- Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA
| | - Alexander L Lewanski
- Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA
| | - Jimena Golcher-Benavides
- Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
| | - Catherine E Wagner
- Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA
- Department of Botany, University of Wyoming, Laramie, WY 82071, USA
13
Vaddadi K, Mun T, Langmead B. Minimizing Reference Bias with an Impute-First Approach. bioRxiv 2024:2023.11.30.568362. [PMID: 38076784 PMCID: PMC10705441 DOI: 10.1101/2023.11.30.568362] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/23/2023]
Abstract
Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, e.g. a diploid human reference constructed to match a donor individual's alleles. We present a novel impute-first alignment framework that combines elements of genotype imputation and pangenome alignment. It begins by genotyping the individual using only a subsample of the input reads. It next uses a reference panel and efficient imputation algorithm to impute a personalized diploid reference. Finally, it indexes the personalized reference and applies a read aligner, which could be a linear or graph aligner, to align the full read set to the personalized reference. This framework achieves higher variant-calling recall (99.54% vs. 99.37%), precision (99.36% vs. 99.18%), and F1 (99.45% vs. 99.28%) compared to a graph pangenome aligner. The personalized reference is also smaller and faster to query compared to a pangenome index, making it an overall advantageous choice for whole-genome DNA sequencing experiments.
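The F1 values quoted above are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes them from the rounded percentages in the abstract; the function and variable names are illustrative, and because the inputs are already rounded, the results can differ from the published figures in the last decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded percentages as quoted in the abstract.
impute_first_f1 = f1_score(precision=99.36, recall=99.54)
graph_pangenome_f1 = f1_score(precision=99.18, recall=99.37)

print(f"impute-first F1:    {impute_first_f1:.3f}")
print(f"graph pangenome F1: {graph_pangenome_f1:.3f}")
```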
Affiliation(s)
- Kavya Vaddadi
- Department of Computer Science, Johns Hopkins University
| | - Taher Mun
- Department of Computer Science, Johns Hopkins University
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University
14
Hempel E, Faith JT, Preick M, de Jager D, Barish S, Hartmann S, Grau JH, Moodley Y, Gedman G, Pirovich KM, Bibi F, Kalthoff DC, Bocklandt S, Lamm B, Dalén L, Westbury MV, Hofreiter M. Colonial-driven extinction of the blue antelope despite genomic adaptation to low population size. Curr Biol 2024; 34:2020-2029.e6. [PMID: 38614080 DOI: 10.1016/j.cub.2024.03.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 02/09/2024] [Accepted: 03/25/2024] [Indexed: 04/15/2024]
Abstract
Low genomic diversity is generally indicative of small population size and is considered detrimental by decreasing long-term adaptability.1,2,3,4,5,6 Moreover, small population size may promote gene flow with congeners and outbreeding depression.7,8,9,10,11,12,13 Here, we examine the connection between habitat availability, effective population size (Ne), and extinction by generating a 40× nuclear genome from the extinct blue antelope (Hippotragus leucophaeus). Historically endemic to the relatively small Cape Floristic Region in southernmost Africa,14,15 populations were thought to have expanded and contracted across glacial-interglacial cycles, tracking suitable habitat.16,17,18 However, we found long-term low Ne, unaffected by glacial cycles, suggesting persistence with low genomic diversity for many millennia prior to extinction in ∼AD 1800. A lack of inbreeding, alongside high levels of genetic purging, suggests adaptation to this long-term low Ne and that human impacts during the colonial era (e.g., hunting and landscape transformation), rather than longer-term ecological processes, were central to its extinction. Phylogenomic analyses uncovered gene flow between roan (H. equinus) and blue antelope, as well as between roan and sable antelope (H. niger), approximately at the time of divergence of blue and sable antelope (∼1.9 Ma). Finally, we identified the LYST and ASIP genes as candidates for the eponymous bluish pelt color of the blue antelope. Our results revise numerous aspects of our understanding of the interplay between genomic diversity and evolutionary history and provide the resources for uncovering the genetic basis of this extinct species' unique traits.
Affiliation(s)
- Elisabeth Hempel
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany; Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany.
| | - J Tyler Faith
- Natural History Museum of Utah, University of Utah, 301 Wakara Way, Salt Lake City, UT 84108, USA; Department of Anthropology, University of Utah, 260 South Central Campus Drive, Salt Lake City, UT 84112, USA; Origins Centre, University of the Witwatersrand, 2000 Johannesburg, Republic of South Africa
| | - Michaela Preick
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
| | - Deon de Jager
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | | | - Stefanie Hartmann
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
| | - José H Grau
- Center for Species Survival, Smithsonian Conservation Biology Institute, Washington, DC 20008, USA; Amedes Genetics, Amedes Medizinische Dienstleistungen GmbH, 10117 Berlin, Germany
| | - Yoshan Moodley
- Department of Biological Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Republic of South Africa
| | | | | | - Faysal Bibi
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany
| | - Daniela C Kalthoff
- Swedish Museum of Natural History, Department of Zoology, Box 50007, 10405 Stockholm, Sweden
| | | | - Ben Lamm
- Colossal Biosciences, Dallas, TX 75247, USA
| | - Love Dalén
- Swedish Museum of Natural History, Department of Bioinformatics and Genetics, Box 50007, 10405 Stockholm, Sweden; Centre for Palaeogenetics, Svante Arrhenius väg 20c, 10691 Stockholm, Sweden; Department of Zoology, Stockholm University, 10691 Stockholm, Sweden.
| | - Michael V Westbury
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark.
| | - Michael Hofreiter
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany.
15
Lorig-Roach R, Meredith M, Monlong J, Jain M, Olsen HE, McNulty B, Porubsky D, Montague TG, Lucas JK, Condon C, Eizenga JM, Juul S, McKenzie SK, Simmonds SE, Park J, Asri M, Koren S, Eichler EE, Axel R, Martin B, Carnevali P, Miga KH, Paten B. Phased nanopore assembly with Shasta and modular graph phasing with GFAse. Genome Res 2024; 34:454-468. [PMID: 38627094 PMCID: PMC11067879 DOI: 10.1101/gr.278268.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 03/19/2024] [Indexed: 04/30/2024]
Abstract
Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.
Affiliation(s)
- Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA;
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, Massachusetts 02120, USA
| | - Hugh E Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Tessa G Montague
- The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, New York 10027, USA
- Howard Hughes Medical Institute, Columbia University, New York, New York 10032, USA
| | - Julian K Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Chris Condon
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Jordan M Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Sissel Juul
- Oxford Nanopore Technologies Incorporated, New York, New York 10013, USA
| | - Sean K McKenzie
- Oxford Nanopore Technologies Incorporated, New York, New York 10013, USA
| | - Sara E Simmonds
- Chan Zuckerberg Initiative Foundation, Redwood City, California 94063, USA
| | - Jimin Park
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Richard Axel
- The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, New York 10027, USA
- Howard Hughes Medical Institute, Columbia University, New York, New York 10032, USA
| | - Bruce Martin
- Chan Zuckerberg Initiative Foundation, Redwood City, California 94063, USA
| | - Paolo Carnevali
- Chan Zuckerberg Initiative Foundation, Redwood City, California 94063, USA;
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California 95060, USA;
16
Murray CS, Karram M, Bass DJ, Doceti M, Becker D, Nunez JCB, Ratan A, Bergland AO. Balancing selection and the functional effects of shared polymorphism in cryptic Daphnia species. bioRxiv 2024:2024.04.16.589693. [PMID: 38659826 PMCID: PMC11042267 DOI: 10.1101/2024.04.16.589693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
The patterns of genetic variation within and between related taxa represent the genetic history of a species. Shared polymorphisms, loci with identical alleles across species, are of unique interest as they may represent cases of ancient selection maintaining functional variation post-speciation. In this study, we investigate the abundance of shared polymorphism in the Daphnia pulex species complex. We test whether shared mutations are consistent with the action of balancing selection or alternative hypotheses such as hybridization, incomplete lineage sorting, or convergent evolution. We analyzed over 2,000 genomes from North American and European D. pulex and several outgroup species to examine the prevalence and distribution of shared alleles between the focal species pair, North American and European D. pulex. We show that while North American and European D. pulex diverged over ten million years ago, they retained tens of thousands of shared alleles. We found that the number of shared polymorphisms between North American and European D. pulex cannot be explained by hybridization or incomplete lineage sorting alone. Instead, we show that most shared polymorphisms could be the product of convergent evolution, that a limited number appear to be old trans-specific polymorphisms, and that balancing selection is affecting young and ancient mutations alike. Finally, we provide evidence that a blue wavelength opsin gene with trans-specific polymorphisms has functional effects on behavior and fitness in the wild. Ultimately, our findings provide insights into the genetic basis of adaptation and the maintenance of genetic diversity between species.
Affiliation(s)
- Connor S. Murray
- Department of Biology, University of Virginia, Charlottesville, VA, USA
| | - Madison Karram
- Department of Biology, University of Virginia, Charlottesville, VA, USA
| | - David J. Bass
- Department of Biology, University of Virginia, Charlottesville, VA, USA
| | - Madison Doceti
- Department of Biology, University of Virginia, Charlottesville, VA, USA
| | - Dörthe Becker
- Department of Biology, University of Virginia, Charlottesville, VA, USA
- School of Biosciences, Ecology and Evolutionary Biology, University of Sheffield, Sheffield, UK
| | | | - Aakrosh Ratan
- Center of Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
| | - Alan O. Bergland
- Department of Biology, University of Virginia, Charlottesville, VA, USA
17
Lin MJ, Iyer S, Chen NC, Langmead B. Measuring, visualizing, and diagnosing reference bias with biastools. Genome Biol 2024; 25:101. [PMID: 38641647 PMCID: PMC11027314 DOI: 10.1186/s13059-024-03240-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 04/04/2024] [Indexed: 04/21/2024] Open
Abstract
Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.
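One simple way to put a number on reference bias, shown here only as a generic sketch and not as the biastools implementation, is to look at heterozygous sites, where roughly half of the reads are expected to carry the reference allele, and measure how far the observed reference fraction departs from 0.5. The function names and read counts below are hypothetical.

```python
from typing import List, Tuple


def ref_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no covering reads")
    return ref_reads / total


def mean_reference_bias(het_sites: List[Tuple[int, int]]) -> float:
    """Mean reference fraction across heterozygous sites, minus the expected 0.5.

    Positive values mean reads carrying the reference allele are over-represented.
    """
    if not het_sites:
        raise ValueError("no heterozygous sites supplied")
    fractions = [ref_allele_fraction(ref, alt) for ref, alt in het_sites]
    return sum(fractions) / len(fractions) - 0.5


# Made-up (ref_reads, alt_reads) counts at three known heterozygous sites.
het_sites = [(12, 8), (10, 10), (15, 5)]
bias = mean_reference_bias(het_sites)
print(f"mean reference bias: {bias:+.3f}")
```

A per-site summary like this says nothing about why a site is biased; the abstract's point is that categorizing the causes requires a dedicated tool.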
Affiliation(s)
- Mao-Jan Lin
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
| | - Sheila Iyer
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
18
Tomlin CM, Rajaraman S, Sebesta JT, Scheen AC, Bendiksby M, Low YW, Salojärvi J, Michael TP, Albert VA, Lindqvist C. Allopolyploid origin and diversification of the Hawaiian endemic mints. Nat Commun 2024; 15:3109. [PMID: 38600100 PMCID: PMC11006916 DOI: 10.1038/s41467-024-47247-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 03/26/2024] [Indexed: 04/12/2024] Open
Abstract
Island systems provide important contexts for studying processes underlying lineage migration, species diversification, and organismal extinction. The Hawaiian endemic mints (Lamiaceae family) are the second largest plant radiation on the isolated Hawaiian Islands. We generated a chromosome-scale reference genome for one Hawaiian species, Stenogyne calaminthoides, and resequenced 45 relatives, representing 34 species, to uncover the continental origins of this group and their subsequent diversification. We further resequenced 109 individuals of two Stenogyne species, and their purported hybrids, found high on the Mauna Kea volcano on the island of Hawai'i. The three distinct Hawaiian genera, Haplostachys, Phyllostegia, and Stenogyne, are nested inside a fourth genus, Stachys. We uncovered four independent polyploidy events within Stachys, including one allopolyploidy event underlying the Hawaiian mints and their direct western North American ancestors. While the Hawaiian taxa may have principally diversified by parapatry and drift in small and fragmented populations, localized admixture may have played an important role early in lineage diversification. Our genomic analyses provide a view into how organisms may have radiated on isolated island chains, settings that provided one of the principal natural laboratories for Darwin's thinking about the evolutionary process.
Affiliation(s)
- Crystal M Tomlin
- Department of Biological Sciences, University at Buffalo, New York, USA
| | - Sitaram Rajaraman
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
| | | | | | - Mika Bendiksby
- Natural History Museum, University of Oslo, Oslo, Norway
| | - Yee Wen Low
- Singapore Botanic Gardens, National Parks Board, Singapore, Singapore
| | - Jarkko Salojärvi
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
| | - Todd P Michael
- The Plant Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California, USA
| | - Victor A Albert
- Department of Biological Sciences, University at Buffalo, New York, USA.
19
Garrido Marques A, Rubinacci S, Malaspinas AS, Delaneau O, Sousa da Mota B. Assessing the impact of post-mortem damage and contamination on imputation performance in ancient DNA. Sci Rep 2024; 14:6227. [PMID: 38486065 PMCID: PMC10940295 DOI: 10.1038/s41598-024-56584-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024] Open
Abstract
Low-coverage imputation is becoming ever more present in ancient DNA (aDNA) studies. Imputation pipelines commonly used for present-day genomes have been shown to yield accurate results when applied to ancient genomes. However, post-mortem damage (PMD), in the form of C-to-T substitutions at the reads termini, and contamination with DNA from closely related species can potentially affect imputation performance in aDNA. In this study, we evaluated imputation performance (i) when using a genotype caller designed for aDNA, ATLAS, compared to bcftools, and (ii) when contamination is present. We evaluated imputation performance with principal component analyses and by calculating imputation error rates. With a particular focus on differently imputed sites, we found that using ATLAS prior to imputation substantially improved imputed genotypes for a very damaged ancient genome (42% PMD). Trimming the ends of the sequencing reads led to similar improvements in imputation accuracy. For the remaining genomes, ATLAS brought limited gains. Finally, to examine the effect of contamination on imputation, we added various amounts of reads from two present-day genomes to a previously downsampled high-coverage ancient genome. We observed that imputation accuracy drastically decreased for contamination rates above 5%. In conclusion, we recommend (i) accounting for PMD by either trimming sequencing reads or using a genotype caller such as ATLAS before imputing highly damaged genomes and (ii) only imputing genomes containing up to 5% of contamination.
Affiliation(s)
| | - Simone Rubinacci
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Anna-Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | | | - Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland.
20
Coombes B, Lux T, Akhunov E, Hall A. Introgressions lead to reference bias in wheat RNA-seq analysis. BMC Biol 2024; 22:56. [PMID: 38454464 PMCID: PMC10921782 DOI: 10.1186/s12915-024-01853-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 02/21/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND RNA-seq is a fundamental technique in genomics, yet reference bias, where transcripts derived from non-reference alleles are quantified less accurately, can undermine the accuracy of RNA-seq quantification and thus the conclusions made downstream. Reference bias in RNA-seq analysis has yet to be explored in complex polyploid genomes despite evidence that they are often a complex mosaic of wild relative introgressions, which introduce blocks of highly divergent genes. RESULTS Here we use hexaploid wheat as a model complex polyploid, using both simulated and experimental data to show that RNA-seq alignment in wheat suffers from widespread reference bias which is largely driven by divergent introgressed genes. This leads to underestimation of gene expression and incorrect assessment of homoeologue expression balance. By incorporating gene models from ten wheat genome assemblies into a pantranscriptome reference, we present a novel method to reduce reference bias, which can be readily scaled to capture more variation as new genome and transcriptome data becomes available. CONCLUSIONS This study shows that the presence of introgressions can lead to reference bias in wheat RNA-seq analysis. Caution should be exercised by researchers using non-sample reference genomes for RNA-seq alignment and novel methods, such as the one presented here, should be considered.
Affiliation(s)
| | - Thomas Lux
- Plant Genome and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
| | - Eduard Akhunov
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
| | - Anthony Hall
- Earlham Institute, Norwich, Norfolk, NR4 7UZ, UK.
21
Ward CM, Onetto CA, Van Den Heuvel S, Cuijvers KM, Hale LJ, Borneman AR. Recombination, admixture and genome instability shape the genomic landscape of Saccharomyces cerevisiae derived from spontaneous grape ferments. PLoS Genet 2024; 20:e1011223. [PMID: 38517929 PMCID: PMC10990190 DOI: 10.1371/journal.pgen.1011223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 04/03/2024] [Accepted: 03/12/2024] [Indexed: 03/24/2024] Open
Abstract
Cultural exchange of fermentation techniques has driven the spread of Saccharomyces cerevisiae across the globe, establishing natural populations in many countries. Despite this, Oceania is thought to lack native populations of S. cerevisiae, only being introduced after colonisation. Here we investigate the genomic landscape of 411 S. cerevisiae isolated from spontaneous grape fermentations in Australia across multiple locations, years, and grape cultivars. Spontaneous fermentations contained highly recombined mosaic strains that exhibited high levels of genome instability. Assigning genomic windows to putative ancestral origin revealed that few closely related starter lineages have come to dominate the genetic landscape, contributing most of the genetic variation. Fine-scale phylogenetic analysis of loci not observed in strains of commercial wine origin identified widespread admixture with European derived beer yeast along with three independent admixture events from potentially endemic Oceanic lineages that was associated with genome instability. Finally, we investigated Australian ecological niches for basal isolates, identifying phylogenetically distinct S. cerevisiae of non-European, non-domesticated origin associated with admixture loci. Our results illustrate the effect commercial use of microbes may have on local microorganism genetic diversity and demonstrates the presence of non-domesticated, potentially endemic lineages of S. cerevisiae in Australian niches that are actively admixing.
Affiliation(s)
- Chris M. Ward
- Australian Wine Research Institute, Urrbrae, South Australia, Australia
| | - Cristobal A. Onetto
- Australian Wine Research Institute, Urrbrae, South Australia, Australia
- University of Adelaide, Adelaide, South Australia, Australia
| | | | | | - Laura J. Hale
- Australian Wine Research Institute, Urrbrae, South Australia, Australia
| | - Anthony R. Borneman
- Australian Wine Research Institute, Urrbrae, South Australia, Australia
- University of Adelaide, Adelaide, South Australia, Australia
| |
|
22
|
Lin MJ, Iyer S, Chen NC, Langmead B. Measuring, visualizing and diagnosing reference bias with biastools. bioRxiv 2024:2023.09.13.557552. [PMID: 37745608 PMCID: PMC10515925 DOI: 10.1101/2023.09.13.557552] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios, i.e. (a) when the donor's variants are known and reads are simulated, (b) when donor variants are known and reads are real, and (c) when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.
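As a rough illustration of what measuring reference bias at a single site can look like, the sketch below computes the fraction of reads carrying the reference allele at known heterozygous sites, where an unbiased aligner would be expected to give a value near 0.5. This is not the biastools API: the function names, the input tuple layout and the `tolerance` threshold are all assumptions made for the example.

```python
def ref_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no covering reads")
    return ref_reads / total


def flag_biased_sites(sites, tolerance=0.1):
    """Return (site_id, fraction) for heterozygous sites far from 0.5.

    `sites` is an iterable of (site_id, ref_reads, alt_reads) tuples.
    Both the tuple layout and the tolerance are hypothetical choices.
    """
    flagged = []
    for site_id, ref_reads, alt_reads in sites:
        frac = ref_allele_fraction(ref_reads, alt_reads)
        if abs(frac - 0.5) > tolerance:
            flagged.append((site_id, frac))
    return flagged
```

A balanced site such as `("s1", 10, 10)` would pass, whereas `("s2", 18, 2)` would be flagged because 90% of its reads support the reference allele.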
Affiliation(s)
- Mao-Jan Lin
- Department of Computer Science, Johns Hopkins University
| | - Sheila Iyer
- Department of Computer Science, Johns Hopkins University
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University
| |
|
23
|
Dorji J, Reverter A, Alexandre PA, Chamberlain AJ, Vander-Jagt CJ, Kijas J, Porto-Neto LR. Ancestral alleles defined for 70 million cattle variants using a population-based likelihood ratio test. Genet Sel Evol 2024; 56:11. [PMID: 38321371 PMCID: PMC10848479 DOI: 10.1186/s12711-024-00879-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 01/30/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND The study of ancestral alleles provides insights into the evolutionary history, selection, and genetic structures of a population. In cattle, ancestral alleles are widely used in genetic analyses, including the detection of signatures of selection, determination of breed ancestry, and identification of admixture. Having a comprehensive list of ancestral alleles is expected to improve the accuracy of these genetic analyses. However, the list of ancestral alleles in cattle, especially at the whole genome sequence level, is far from complete. In fact, the current largest list of ancestral alleles (~42 million) represents less than 28% of the total number of detected variants in cattle. To address this issue and develop a genomic resource for evolutionary studies, we determined ancestral alleles in cattle by comparing prior derived whole-genome sequence variants to an out-species group using a population-based likelihood ratio test. RESULTS Our study determined and makes available the largest list of ancestral alleles in cattle to date (70.1 million) and includes 2.3 million on the X chromosome. There was high concordance (97.6%) of the determined ancestral alleles with those from previous studies when only high-probability ancestral alleles were considered (29.8 million positions) and another 23.5 million high-confidence ancestral alleles were novel, expanding the available reference list to improve the accuracies of genetic analyses involving ancestral alleles. The high concordance of the results with previous studies implies that our approach using genomic sequence variants and a likelihood ratio test to determine ancestral alleles is appropriate. CONCLUSIONS Considering the high concordance of ancestral alleles across studies, the ancestral alleles determined in this study including those not previously listed, particularly those with high-probability estimates, may be used for further genetic analyses with reasonable accuracy. Our approach that used predetermined variants in species and the likelihood ratio test to determine ancestral alleles is applicable to other species for which sequence level genotypes are available.
Affiliation(s)
- Jigme Dorji
- CSIRO, Agriculture & Food, St. Lucia, QLD, 4067, Australia
| | | | | | - Amanda J Chamberlain
- AgriBio, Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
| | - Christy J Vander-Jagt
- AgriBio, Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
| | - James Kijas
- CSIRO, Agriculture & Food, St. Lucia, QLD, 4067, Australia
| | | |
|
24
|
Lucena-Perez M, Paijmans JLA, Nocete F, Nadal J, Detry C, Dalén L, Hofreiter M, Barlow A, Godoy JA. Recent increase in species-wide diversity after interspecies introgression in the highly endangered Iberian lynx. Nat Ecol Evol 2024; 8:282-292. [PMID: 38225424 DOI: 10.1038/s41559-023-02267-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/10/2023] [Indexed: 01/17/2024]
Abstract
Genetic diversity is lost in small and isolated populations, affecting many globally declining species. Interspecific admixture events can increase genetic variation in the recipient species' gene pool, but empirical examples of species-wide restoration of genetic diversity by admixture are lacking. Here we present multi-fold coverage genomic data from three ancient Iberian lynx (Lynx pardinus) approximately 2,000-4,000 years old and show a continuous or recurrent process of interspecies admixture with the Eurasian lynx (Lynx lynx) that increased modern Iberian lynx genetic diversity above that occurring millennia ago despite its recent demographic decline. Our results add to the accumulating evidence for natural admixture and introgression among closely related species and show that this can result in an increase of species-wide genetic diversity in highly genetically eroded species. The strict avoidance of interspecific sources in current genetic restoration measures needs to be carefully reconsidered, particularly in cases where no conspecific source population exists.
Affiliation(s)
- Maria Lucena-Perez
- Department of Ecology and Evolution, Estación Biológica de Doñana, CSIC, Seville, Spain
| | - Johanna L A Paijmans
- Evolutionary Adaptive Genomics, University of Potsdam, Potsdam, Germany
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - Francisco Nocete
- Grupo de Investigación MIDAS, Departamento Historia I (Prehistoria), Universidad de Huelva, Huelva, Spain
| | - Jordi Nadal
- SERP, Departament de Prehistoria, Historia Antiga i Arqueologia, Universitat de Barcelona, Barcelona, Spain
| | - Cleia Detry
- UNIARQ - Centro de Arqueologia da Faculdade de Letras da Universidade de Lisboa, Alameda da Universidade, Lisbon, Portugal
| | - Love Dalén
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Michael Hofreiter
- Evolutionary Adaptive Genomics, University of Potsdam, Potsdam, Germany
| | - Axel Barlow
- School of Environmental and Natural Sciences, Bangor University, Bangor, Gwynedd, UK
| | - José A Godoy
- Department of Ecology and Evolution, Estación Biológica de Doñana, CSIC, Seville, Spain
| |
|
25
|
Furuta T, Yamamoto T. MCPtaggR: R package for accurate genotype calling in reduced representation sequencing data by eliminating error-prone markers based on genome comparison. DNA Res 2024; 31:dsad027. [PMID: 38134958 PMCID: PMC10799318 DOI: 10.1093/dnares/dsad027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/11/2023] [Accepted: 12/18/2023] [Indexed: 12/24/2023] Open
Abstract
Reduced representation sequencing (RRS) offers cost-effective, high-throughput genotyping platforms such as genotyping-by-sequencing (GBS). RRS reads are typically mapped onto a reference genome. However, mapping reads harbouring mismatches against the reference can potentially result in mismapping and biased mapping, leading to the detection of error-prone markers that provide incorrect genotype information. We established a genotype-calling pipeline named mappable collinear polymorphic tag genotyping (MCPtagg) to achieve accurate genotyping by eliminating error-prone markers. MCPtagg was designed for the RRS-based genotyping of a population derived from a biparental cross. The MCPtagg pipeline filters out error-prone markers prior to genotype calling based on marker collinearity information obtained by comparing the genome sequences of the parents of a population to be genotyped. A performance evaluation on real GBS data from a rice F2 population confirmed its effectiveness. Furthermore, our performance test using a genome assembly that was obtained by genome sequence polishing on an available genome assembly suggests that our pipeline performs well with converted genomes, rather than necessitating de novo assembly. This demonstrates its flexibility and scalability. The R package, MCPtaggR, was developed to provide functions for the pipeline and is available at https://github.com/tomoyukif/MCPtaggR.
Affiliation(s)
- Tomoyuki Furuta
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Okayama, Japan
| | - Toshio Yamamoto
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Okayama, Japan
| |
|
26
|
Schiebelhut LM, Guillaume AS, Kuhn A, Schweizer RM, Armstrong EE, Beaumont MA, Byrne M, Cosart T, Hand BK, Howard L, Mussmann SM, Narum SR, Rasteiro R, Rivera-Colón AG, Saarman N, Sethuraman A, Taylor HR, Thomas GWC, Wellenreuther M, Luikart G. Genomics and conservation: Guidance from training to analyses and applications. Mol Ecol Resour 2024; 24:e13893. [PMID: 37966259 DOI: 10.1111/1755-0998.13893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/16/2023]
Abstract
Environmental change is intensifying the biodiversity crisis and threatening species across the tree of life. Conservation genomics can help inform conservation actions and slow biodiversity loss. However, more training, appropriate use of novel genomic methods and communication with managers are needed. Here, we review practical guidance to improve applied conservation genomics. We share insights aimed at ensuring effectiveness of conservation actions around three themes: (1) improving pedagogy and training in conservation genomics including for online global audiences, (2) conducting rigorous population genomic analyses properly considering theory, marker types and data interpretation and (3) facilitating communication and collaboration between managers and researchers. We aim to update students and professionals and expand their conservation toolkit with genomic principles and recent approaches for conserving and managing biodiversity. The biodiversity crisis is a global problem and, as such, requires international involvement, training, collaboration and frequent reviews of the literature and workshops as we do here.
Affiliation(s)
- Lauren M Schiebelhut
- Life and Environmental Sciences, University of California, Merced, California, USA
| | - Annie S Guillaume
- Geospatial Molecular Epidemiology group (GEOME), Laboratory for Biological Geochemistry (LGB), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Arianna Kuhn
- Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
- Virginia Museum of Natural History, Martinsville, Virginia, USA
| | - Rena M Schweizer
- Division of Biological Sciences, University of Montana, Missoula, Montana, USA
| | | | - Mark A Beaumont
- School of Biological Sciences, University of Bristol, Bristol, UK
| | - Margaret Byrne
- Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, Western Australia, Australia
| | - Ted Cosart
- Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
| | - Brian K Hand
- Flathead Lake Biological Station, University of Montana, Polson, Montana, USA
| | - Leif Howard
- Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
| | - Steven M Mussmann
- Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish & Wildlife Service, Dexter, New Mexico, USA
| | - Shawn R Narum
- Hagerman Genetics Lab, University of Idaho, Hagerman, Idaho, USA
| | - Rita Rasteiro
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
| | - Angel G Rivera-Colón
- Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
| | - Norah Saarman
- Department of Biology and Ecology Center, Utah State University, Logan, Utah, USA
| | - Arun Sethuraman
- Department of Biology, San Diego State University, San Diego, California, USA
| | - Helen R Taylor
- Royal Zoological Society of Scotland, Edinburgh, Scotland
| | - Gregg W C Thomas
- Informatics Group, Harvard University, Cambridge, Massachusetts, USA
| | - Maren Wellenreuther
- Plant and Food Research, Nelson, New Zealand
- University of Auckland, Auckland, New Zealand
| | - Gordon Luikart
- Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
| |
|
27
|
van der Valk T, Jensen A, Caillaud D, Guschanski K. Comparative genomic analyses provide new insights into evolutionary history and conservation genomics of gorillas. BMC Ecol Evol 2024; 24:14. [PMID: 38273244 PMCID: PMC10811819 DOI: 10.1186/s12862-023-02195-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 12/22/2023] [Indexed: 01/27/2024] Open
Abstract
Genome sequencing is a powerful tool to understand species evolutionary history, uncover genes under selection, which could be informative of local adaptation, and infer measures of genetic diversity, inbreeding and mutational load that could be used to inform conservation efforts. Gorillas, critically endangered primates, have received considerable attention and with the recently sequenced Bwindi mountain gorilla population, genomic data is now available from all gorilla subspecies and both mountain gorilla populations. Here, we reanalysed this rich dataset with a focus on evolutionary history, local adaptation and genomic parameters relevant for conservation. We estimate a recent split between western and eastern gorillas of 150,000-180,000 years ago, with gene flow around 20,000 years ago, primarily between the Cross River and Grauer's gorilla subspecies. This gene flow event likely obscures evolutionary relationships within eastern gorillas: after excluding putatively introgressed genomic regions, we uncover a sister relationship between Virunga mountain gorillas and Grauer's gorillas to the exclusion of Bwindi mountain gorillas. This makes mountain gorillas paraphyletic. Eastern gorillas are less genetically diverse and more inbred than western gorillas, yet we detected lower genetic load in the eastern species. Analyses of indels fit remarkably well with differences in genetic diversity across gorilla taxa as recovered with nucleotide diversity measures. We also identified genes under selection and unique gene variants specific for each gorilla subspecies, encoding, among others, traits involved in immunity, diet, muscular development, hair morphology and behavior. The presence of this functional variation suggests that the subspecies may be locally adapted. In conclusion, using extensive genomic resources we provide a comprehensive overview of gorilla genomic diversity, including a so-far understudied Bwindi mountain gorilla population, identify putative genes involved in local adaptation, and detect population-specific gene flow across gorilla species.
Affiliation(s)
- Tom van der Valk
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- SciLifeLab, Stockholm, Sweden
- Department of Zoology, Stockholm University, Stockholm, Sweden
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
| | - Damien Caillaud
- Department of Anthropology, University of CA - Davis, Davis, California, USA
| | - Katerina Guschanski
- SciLifeLab, Stockholm, Sweden
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| |
|
28
|
Liu K, Xie N, Wang Y. Quantifying mitochondrial heteroplasmy diversity: A computational approach. Mol Ecol Resour 2024; 24:e13874. [PMID: 37815422 DOI: 10.1111/1755-0998.13874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 09/22/2023] [Indexed: 10/11/2023]
Abstract
Biodiversity plays a pivotal role in sustaining ecosystem processes, encompassing diverse biological species, genetic types and the intricacies of ecosystem composition. However, the precise definition of biodiversity at the individual level remains a challenging endeavour. Hill numbers, derived from Rényi's entropy, have emerged as a popular measure of diversity, with a recent unified framework extending their application across various levels, from genetics to ecosystems. In this study, we employ a computational approach to exploring the diversity of mitochondrial heteroplasmy using real-world data. By adopting Hill numbers with q = 2, we demonstrate the feasibility of quantifying mitochondrial heteroplasmy diversity within and between individuals and populations. Furthermore, we investigate the alpha diversity of mitochondrial heteroplasmy among different species, revealing heterogeneity at multiple levels, including mitogenome components and protein-coding genes (PCGs). Our analysis explores large-scale mitochondrial heteroplasmy data in humans, examining the relationship between alpha diversity at the mitogenome components and PCGs level. Notably, we do not find a significant correlation between these two levels. Additionally, we observe significant correlations in alpha diversity between mothers and children in blood samples, exceeding the reported R² value for allele frequency correlations. Moreover, our investigation of beta diversity and local overlay similarity demonstrates that heteroplasmy variant distributions in different tissues of children more closely resemble those of their mothers. Through systematic quantification and analysis of mitochondrial heteroplasmy diversity, this study enhances our understanding of heterogeneity at multiple levels, from individuals to populations, providing new insights into this fundamental dimension of biodiversity.
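For readers unfamiliar with Hill numbers, the following minimal sketch shows the standard definition, D_q = (sum of p_i^q)^(1/(1-q)), which for q = 2 reduces to the inverse Simpson index. It only illustrates the general formula; how the authors turn heteroplasmy calls into the abundance vector is not stated in the abstract, so the input here is simply assumed to be a list of non-negative counts or frequencies, and the function name is hypothetical.

```python
import math


def hill_number(abundances, q=2.0):
    """Hill number (effective number of types) of order q.

    `abundances` are non-negative counts or frequencies; they are
    normalised to proportions, and zero entries are ignored.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain a positive value")
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

With four equally abundant types the q = 2 value is 4, and it falls towards 1 as a single type comes to dominate, which is why the measure reads naturally as an effective number of variants.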
Affiliation(s)
- Kai Liu
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, China
| | - Nan Xie
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, China
| | - Yuxi Wang
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, China
| |
|
29
|
Liu K, Xie N, Wang Y, Liu X. The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus. Mar Biotechnol (NY) 2023; 25:907-917. [PMID: 37661218 DOI: 10.1007/s10126-023-10248-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
Long-read sequencing technologies can generate highly contiguous genome assemblies compared to short-read methods. However, their higher cost often poses a significant barrier. To address this, we explore the utilization of mapping-based genome assembly and reference-guided assembly as cost-effective alternative approaches. We assess the efficacy of these approaches in improving the contiguity of Clarias batrachus and Culter alburnus draft genomes. Our findings demonstrate that employing an iterative mapping strategy leads to a reduction in assembly errors. Specifically, after three iterations, the Mismatches per 100 kbp value for the C. batrachus genome decreased from 2447.20 to 2432.67, reaching a minimum of 2422.67 after two iterations. Additionally, the N50 value for the C. batrachus genome increased from 362,143 to 1,315,126 bp, with a maximum of 1,315,403 bp after two iterations. Furthermore, we achieved Mismatches per 100 kbp values of 3.70 for the reference-guided assembly of C. batrachus and 0.34 for C. alburnus. Correspondingly, the N50 values for the C. batrachus and C. alburnus genomes increased from 362,143 bp and 3,686,385 bp to 2,026,888 bp and 43,735,735 bp, respectively. Finally, we successfully used the improved C. batrachus and C. alburnus genomes for comparative genomic analyses using the combined approach of Ragout and RagTag. Through a comprehensive comparative analysis of mapping-based and reference-guided genome assembly methods, we shed light on the specific contributions of reference-guided assembly in reducing assembly errors and improving assembly continuity and integrity. These advancements establish reference-guided assembly and the utilization of in silico libraries as a promising and suitable approach for comparative genomics studies.
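The two assembly metrics quoted above have simple standard definitions, sketched here for orientation. N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length; mismatches per 100 kbp is the mismatch count scaled to 100,000 aligned bases. The function names are illustrative and this is not the code the authors ran.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def mismatches_per_100kbp(mismatches, aligned_bases):
    """Mismatch count normalised to 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 100_000 * mismatches / aligned_bases
```

For example, contigs of 2, 3, 4, 5 and 6 bp total 20 bp; the two longest already sum to 11 bp, so the N50 is 5.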
Affiliation(s)
- Kai Liu
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
| | - Nan Xie
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
| | - Yuxi Wang
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
| | - Xinyi Liu
- Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
| |
|
30
|
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Affiliation(s)
- Edward S Rice
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
| | - Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
| | - James Alfieri
- Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
| | - Giridhar Athrey
- Department of Poultry Science, Texas A&M University, College Station, TX, USA
| | - Jennifer R Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Philippe Bardou
- Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
| | - Heath Blackmon
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Mathieu Charles
- University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
| | - Hans H Cheng
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | | | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Laurent A F Frantz
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
| | - M Thomas P Gilbert
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
| | - Cari J Hearn
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- The Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Christophe Klopp
- Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
| | - Sofia Marcos
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
| | | | | | - Luohao Xu
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
| | - Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA
| |
|
31
|
Reeves IM, Totterdell JA, Betty EL, Donnelly DM, George A, Holmes S, Moller L, Stockin KA, Wellard R, White C, Foote AD. Ancestry testing of "Old Tom," a killer whale central to mutualistic interactions with human whalers. J Hered 2023; 114:598-611. [PMID: 37821799 PMCID: PMC10650950 DOI: 10.1093/jhered/esad058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 09/21/2023] [Indexed: 10/13/2023] Open
Abstract
Cooperative hunting between humans and killer whales (Orcinus orca) targeting baleen whales was reported in Eden, New South Wales, Australia, for almost a century. By 1928, whaling operations had ceased, and local killer whale sightings became scarce. A killer whale from the group, known as "Old Tom," washed up dead in 1930 and his skeleton was preserved. How these killer whales from Eden relate to other populations globally and whether their genetic descendants persist today remains unknown. We extracted and sequenced DNA from Old Tom using ancient DNA techniques. Genomic sequences were then compared with a global dataset of mitochondrial and nuclear genomes. Old Tom shared a most recent common ancestor with killer whales from Australasia, the North Atlantic, and the North Pacific, having the highest genetic similarity with contemporary New Zealand killer whales. However, much of the variation found in Old Tom's genome was not shared with these widespread populations, suggesting ancestral rather than ongoing gene flow. Our genetic comparisons also failed to find any clear descendants of Tom, raising the possibility of local extinction of this group. We integrated Traditional Custodian knowledge to recapture the events in Eden and recognize that Indigenous Australians initiated the relationship with the killer whales before European colonization and the advent of commercial whaling locally. This study rectifies discrepancies in local records and provides new insight into the origins of the killer whales in Eden and the history of Australasian killer whales.
Affiliation(s)
- Isabella M Reeves
- Flinders University, College of Science and Engineering, Bedford Park, Adelaide, South Australia, Australia
- Cetacean Research Centre (CETREC WA), Esperance, Perth, Western Australia, Australia
| | - John A Totterdell
- Cetacean Research Centre (CETREC WA), Esperance, Perth, Western Australia, Australia
| | - Emma L Betty
- Cetacean Ecology Research Group, School of Natural Sciences, Massey University, Auckland, New Zealand
| | - David M Donnelly
- Killer Whales Australia, Mornington, Melbourne, Victoria, Australia
| | - Angela George
- Eden Killer Whale Museum, New South Wales, Sydney, Australia
| | - Steven Holmes
- Eden Killer Whale Museum, New South Wales, Sydney, Australia
| | - Luciana Moller
- Flinders University, College of Science and Engineering, Bedford Park, Adelaide, South Australia, Australia
- Cetacean Ecology, Behaviour and Evolution Laboratory, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, South Australia, Australia
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, South Australia, Australia
| | - Karen A Stockin
- Cetacean Ecology Research Group, School of Natural Sciences, Massey University, Auckland, New Zealand
| | | | - Charlie White
- Flinders University, College of Science and Engineering, Bedford Park, Adelaide, South Australia, Australia
- Cetacean Ecology, Behaviour and Evolution Laboratory, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, South Australia, Australia
- Andrew D Foote
- Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
32
Davidson R, Williams MP, Roca-Rada X, Kassadjikova K, Tobler R, Fehren-Schmitz L, Llamas B. Allelic bias when performing in-solution enrichment of ancient human DNA. Mol Ecol Resour 2023; 23:1823-1840. [PMID: 37712846 DOI: 10.1111/1755-0998.13869] [Received: 05/09/2023] [Accepted: 08/11/2023] [Indexed: 09/16/2023]
Abstract
In-solution hybridisation enrichment of genetic variation is a valuable methodology in human paleogenomics. It allows enrichment of endogenous DNA by targeting genetic markers that are comparable between sequencing libraries. Many studies have used the 1240k reagent, which enriches 1,237,207 genome-wide SNPs, since 2015, though access was restricted. In 2021, Twist Biosciences and Daicel Arbor Biosciences independently released commercial kits that enabled all researchers to perform enrichments for the same 1240k SNPs. We used the Daicel Arbor Biosciences Prime Plus kit to enrich 132 ancient samples from three continents. We identified a systematic assay bias that increases genetic similarity between enriched samples and that cannot be explained by batch effects. We present the impact of the bias on population genetics inferences (e.g. Principal Components Analysis, ƒ-statistics) and genetic relatedness (READ). We compare the Prime Plus bias to that previously reported for the legacy 1240k enrichment assay. In ƒ-statistics, we find that all Prime-Plus-generated data exhibit artefactual excess shared drift, such that within-continent relationships cannot be correctly determined. The bias is more subtle in READ, though interpretation of the results can still be misleading in specific contexts. We expect the bias may affect analyses we have not yet tested. Our observations support previously reported concerns for the integration of different data types in paleogenomics. We also caution that technological solutions to generate 1240k data necessitate a thorough validation process before their adoption in the paleogenomic community.
Affiliation(s)
- Roberta Davidson
- The Australian Centre for Ancient DNA and the Environment Institute, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Matthew P Williams
- The Australian Centre for Ancient DNA and the Environment Institute, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Biology Department, The Pennsylvania State University, Pennsylvania, USA
- Xavier Roca-Rada
- The Australian Centre for Ancient DNA and the Environment Institute, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Kalina Kassadjikova
- UCSC Paleogenomics, Department of Anthropology, University of California, California, USA
- Raymond Tobler
- The Australian Centre for Ancient DNA and the Environment Institute, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Evolution of Cultural Diversity Initiative, Australian National University, Canberra, Australia
- Centre of Excellence for Australian Biodiversity and Heritage, The University of Adelaide, Adelaide, South Australia, Australia
- Lars Fehren-Schmitz
- UCSC Paleogenomics, Department of Anthropology, University of California, California, USA
- UCSC Genomics Institute, University of California, California, USA
- Bastien Llamas
- The Australian Centre for Ancient DNA and the Environment Institute, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Centre of Excellence for Australian Biodiversity and Heritage, The University of Adelaide, Adelaide, South Australia, Australia
- National Centre for Indigenous Genomics, Australian National University, Canberra, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, South Australia, Australia
33
Heighton SP, Allio R, Murienne J, Salmona J, Meng H, Scornavacca C, Bastos ADS, Njiokou F, Pietersen DW, Tilak MK, Luo SJ, Delsuc F, Gaubert P. Pangolin Genomes Offer Key Insights and Resources for the World's Most Trafficked Wild Mammals. Mol Biol Evol 2023; 40:msad190. [PMID: 37794645 PMCID: PMC10551234 DOI: 10.1093/molbev/msad190] [Indexed: 10/06/2023]
Abstract
Pangolins form a group of scaly mammals that are trafficked at record numbers for their meat and purported medicinal properties. Despite their conservation concern, knowledge of their evolution is limited by a paucity of genomic data. We aim to produce exhaustive genomic resources that include 3,238 orthologous genes and whole-genome polymorphisms to assess the evolution of all eight extant pangolin species. Robust orthologous gene-based phylogenies recovered the monophyly of the three genera and highlighted the existence of an undescribed species closely related to Southeast Asian pangolins. Signatures of middle Miocene admixture between an extinct, possibly European, lineage and the ancestor of Southeast Asian pangolins, provide new insights into the early evolutionary history of the group. Demographic trajectories and genome-wide heterozygosity estimates revealed contrasts between continental versus island populations and species lineages, suggesting that conservation planning should consider intraspecific patterns. With the expected loss of genomic diversity from recent, extensive trafficking not yet realized in pangolins, we recommend that populations be genetically surveyed to anticipate any deleterious impact of the illegal trade. Finally, we produce a complete set of genomic resources that will be integral for future conservation management and forensic endeavors for pangolins, including tracing their illegal trade. These comprise the completion of whole-genomes for pangolins through the hybrid assembly of the first reference genome for the giant pangolin (Smutsia gigantea) and new draft genomes (∼43x-77x) for four additional species, as well as a database of orthologous genes with over 3.4 million polymorphic sites.
Affiliation(s)
- Sean P Heighton
- Laboratoire Evolution et Diversité Biologique (EDB)— IRD-UPS-CNRS, Université Toulouse III, Toulouse, France
- Rémi Allio
- Institut des Sciences de l'Évolution de Montpellier (ISEM), Université de Montpellier, CNRS, IRD, Montpellier, France
- Jérôme Murienne
- Laboratoire Evolution et Diversité Biologique (EDB)— IRD-UPS-CNRS, Université Toulouse III, Toulouse, France
- Jordi Salmona
- Laboratoire Evolution et Diversité Biologique (EDB)— IRD-UPS-CNRS, Université Toulouse III, Toulouse, France
- Hao Meng
- The State Key Laboratory of Protein and Plant Gene Research of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Céline Scornavacca
- Institut des Sciences de l'Évolution de Montpellier (ISEM), Université de Montpellier, CNRS, IRD, Montpellier, France
- Armanda D S Bastos
- Mammal Research Institute, Department of Zoology & Entomology, University of Pretoria, Pretoria, South Africa
- Flobert Njiokou
- Laboratoire de Parasitologie et Ecologie, Faculté des Sciences, Université de Yaoundé I, Yaoundé, Cameroon
- Darren W Pietersen
- Mammal Research Institute, Department of Zoology & Entomology, University of Pretoria, Pretoria, South Africa
- Marie-Ka Tilak
- Institut des Sciences de l'Évolution de Montpellier (ISEM), Université de Montpellier, CNRS, IRD, Montpellier, France
- Shu-Jin Luo
- The State Key Laboratory of Protein and Plant Gene Research of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Frédéric Delsuc
- Institut des Sciences de l'Évolution de Montpellier (ISEM), Université de Montpellier, CNRS, IRD, Montpellier, France
- Philippe Gaubert
- Laboratoire Evolution et Diversité Biologique (EDB)— IRD-UPS-CNRS, Université Toulouse III, Toulouse, France
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Porto, Portugal
34
Thorburn DMJ, Sagonas K, Binzer-Panchal M, Chain FJJ, Feulner PGD, Bornberg-Bauer E, Reusch TBH, Samonte-Padilla IE, Milinski M, Lenz TL, Eizaguirre C. Origin matters: Using a local reference genome improves measures in population genomics. Mol Ecol Resour 2023; 23:1706-1723. [PMID: 37489282 DOI: 10.1111/1755-0998.13838] [Received: 02/01/2023] [Revised: 05/10/2023] [Accepted: 06/02/2023] [Indexed: 07/26/2023]
Abstract
Genome sequencing enables answering fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet, we usually need a suitable reference genome for mapping population-level resequencing data. In some model systems, multiple reference genomes are available, giving the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference genomes impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e. π, Tajima's D and FST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference-genome-based population genomic analyses.
Affiliation(s)
- Doko-Miles J Thorburn
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Department of Life Sciences, Imperial College London, London, UK
- Kostas Sagonas
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Department of Zoology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mahesh Binzer-Panchal
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden (NBIS), Uppsala University, Uppsala, Sweden
- Frederic J J Chain
- Department of Biological Sciences, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Philine G D Feulner
- Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry, EAWAG Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
- Division of Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Thorsten B H Reusch
- Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
- Irene E Samonte-Padilla
- Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Manfred Milinski
- Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Tobias L Lenz
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Christophe Eizaguirre
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
35
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023]
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
- Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
- Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
- Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
- Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
36
Louis M, Korlević P, Nykänen M, Archer F, Berrow S, Brownlow A, Lorenzen ED, O'Brien J, Post K, Racimo F, Rogan E, Rosel PE, Sinding MHS, van der Es H, Wales N, Fontaine MC, Gaggiotti OE, Foote AD. Ancient dolphin genomes reveal rapid repeated adaptation to coastal waters. Nat Commun 2023; 14:4020. [PMID: 37463880 DOI: 10.1038/s41467-023-39532-z] [Received: 11/17/2022] [Accepted: 06/16/2023] [Indexed: 07/20/2023]
Abstract
Parallel evolution provides strong evidence of adaptation by natural selection due to local environmental variation. Yet, the chronology and mode of the process of parallel evolution remain debated. Here, we harness the temporal resolution of paleogenomics to address these long-standing questions, by comparing genomes originating from the mid-Holocene (8610-5626 years before present, BP) to contemporary pairs of coastal-pelagic ecotypes of bottlenose dolphin. We find that the affinity of ancient samples to coastal populations increases as the age of the samples decreases. We assess the youngest genome (5626 years BP) at sites previously inferred to be under parallel selection to coastal habitats and find it contained coastal-associated genotypes. Thus, coastal-associated variants rose to detectable frequencies close to the emergence of coastal habitat. Admixture graph analyses reveal a reticulate evolutionary history between pelagic and coastal populations, sharing standing genetic variation that facilitated rapid adaptation to newly emerged coastal habitats.
Affiliation(s)
- Marie Louis
- Centre for Biological Diversity, Sir Harold Mitchell Building and Dyers Brae, University of St Andrews, St Andrews, KY16 9TH, Scotland, UK.
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen K, Denmark.
- Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, PO Box 11103 CC, Groningen, The Netherlands.
- Greenland Institute of Natural Resources, Kivioq 2, Nuuk, 3900, Greenland.
- Petra Korlević
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Milaja Nykänen
- Department of Environmental and Biological Sciences, PO Box 111, FI-80101, Joensuu, Finland
- School of Biological, Earth and Environmental Sciences, University College Cork, North Mall, Cork, Ireland
- Frederick Archer
- Marine Mammal and Turtle Division, Southwest Fisheries Science Center, NOAA, 8901 La Jolla Shores Drive, La Jolla, CA, 92037, USA
- Simon Berrow
- Irish Whale and Dolphin Group, Kilrush, Co Clare, Ireland
- Marine and Freshwater Research Centre, Department of Natural Sciences, School of Science and Computing, Atlantic Technological University, Dublin Road, H91 T8NW, Galway, Ireland
- Andrew Brownlow
- Scottish Marine Animal Stranding Scheme, Institute of Biodiversity, Animal Health & Comparative Medicine College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Eline D Lorenzen
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen K, Denmark
- Joanne O'Brien
- Irish Whale and Dolphin Group, Kilrush, Co Clare, Ireland
- Marine and Freshwater Research Centre, Department of Natural Sciences, School of Science and Computing, Atlantic Technological University, Dublin Road, H91 T8NW, Galway, Ireland
- Klaas Post
- Natural History Museum Rotterdam, Westzeedijk 345, 3015 AA, Rotterdam, Netherlands
- Fernando Racimo
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen K, Denmark
- Emer Rogan
- School of Biological, Earth and Environmental Sciences, University College Cork, North Mall, Cork, Ireland
- Patricia E Rosel
- Marine Mammal and Turtle Division, Southeast Fisheries Science Center, NOAA, 646 Cajundome Boulevard, Lafayette, LA, 70506, USA
- Mikkel-Holger S Sinding
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
- Henry van der Es
- Natural History Museum Rotterdam, Westzeedijk 345, 3015 AA, Rotterdam, Netherlands
- Nathan Wales
- University of York, BioArCh, Environment Building, Wentworth Way, Heslington, York, YO10 5DD, UK
- Michael C Fontaine
- Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, PO Box 11103 CC, Groningen, The Netherlands
- MIVEGEC (Université de Montpellier, CNRS 5290, IRD 229) Institut de Recherche pour le Développement (IRD), F-34394, Montpellier, France
- Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building and Dyers Brae, University of St Andrews, St Andrews, KY16 9TH, Scotland, UK
- Andrew D Foote
- Department of Natural History, Norwegian University of Science and Technology (NTNU), NO-7491, Trondheim, Norway.
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316, Oslo, Norway.
37
Sousa da Mota B, Rubinacci S, Cruz Dávalos DI, G Amorim CE, Sikora M, Johannsen NN, Szmyt MH, Włodarczak P, Szczepanek A, Przybyła MM, Schroeder H, Allentoft ME, Willerslev E, Malaspinas AS, Delaneau O. Imputation of ancient human genomes. Nat Commun 2023; 14:3660. [PMID: 37339987 PMCID: PMC10282092 DOI: 10.1038/s41467-023-39202-0] [Received: 10/12/2022] [Accepted: 06/02/2023] [Indexed: 06/22/2023]
Abstract
Due to postmortem DNA degradation and microbial colonization, most ancient genomes have low depth of coverage, hindering genotype calling. Genotype imputation can improve genotyping accuracy for low-coverage genomes. However, it is unknown how accurate ancient DNA imputation is and whether imputation introduces bias to downstream analyses. Here we re-sequence an ancient trio (mother, father, son) and downsample and impute a total of 43 ancient genomes, including 42 high-coverage (above 10x) genomes. We assess imputation accuracy across ancestries, time, depth of coverage, and sequencing technology. We find that ancient and modern DNA imputation accuracies are comparable. When downsampled at 1x, 36 of the 42 genomes are imputed with low error rates (below 5%) while African genomes have higher error rates. We validate imputation and phasing results using the ancient trio data and an orthogonal approach based on Mendel's rules of inheritance. We further compare the downstream analysis results between imputed and high-coverage genomes, notably principal component analysis, genetic clustering, and runs of homozygosity, observing similar results starting from 0.5x coverage, except for the African genomes. These results suggest that, for most populations and depths of coverage as low as 0.5x, imputation is a reliable method that can improve ancient DNA studies.
Affiliation(s)
- Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Simone Rubinacci
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Diana Ivette Cruz Dávalos
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Martin Sikora
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Niels N Johannsen
- Department of Archaeology and Heritage Studies, Aarhus University, Aarhus, Denmark
- Marzena H Szmyt
- Institute for Eastern Research, Adam Mickiewicz University in Poznań, Poznań, Poland
- Piotr Włodarczak
- Institute of Archaeology and Ethnology, Polish Academy of Sciences, Kraków, Poland
- Anita Szczepanek
- Institute of Archaeology and Ethnology, Polish Academy of Sciences, Kraków, Poland
- Department of Anatomy, Jagiellonian University, Medical College, Kraków, Poland
- Hannes Schroeder
- The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Morten E Allentoft
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Science, Curtin University, Bentley, WA, Australia
- Eske Willerslev
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- GeoGenetics Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- MARUM, University of Bremen, Bremen, Germany
- Anna-Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland.
- Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland.
38
Vynck M, Nollet F, Sibbens L, Devos H. Bias reduction improves accuracy and informativity of high-throughput sequencing chimerism assays. Clin Chim Acta 2023:117452. [PMID: 37343694 DOI: 10.1016/j.cca.2023.117452] [Received: 01/27/2023] [Revised: 05/22/2023] [Accepted: 06/16/2023] [Indexed: 06/23/2023]
Abstract
BACKGROUND AND AIMS: Chimerism monitoring by means of high-throughput sequencing of biallelic polymorphisms has shown promising advantages for patient follow-up after hematopoietic stem cell transplantation. Yet, the presence of method bias precludes achievement of an assay's theoretically attainable informativity rate, as method bias necessitates the exclusion of some markers. This method bias arises because of preferential observation of one allele over the other, and for some allelic constellations because of stochasticity. RESULTS: This paper suggests how preferential allelic observation may lead to method bias, and when and why such bias necessitates the exclusion of markers. It is shown that markers that remain informative also suffer a reduction in trueness and precision due to method bias. A bias reduction approach in the data analysis phase is introduced and shown to improve trueness and precision under all circumstances, meriting its universal adoption. This bias reduction furthermore makes it possible to achieve an assay's theoretically achievable informativity rate, though at the cost of reduced sensitivity. Several strategies to consider in the assay design phase that may lower biases are proposed. CONCLUSION: Improved design and data analysis of chimerism assays increase the accuracy, applicability, and cost-effectiveness of high-throughput sequencing chimerism assays.
Affiliation(s)
- Matthijs Vynck
- Department of Laboratory Medicine, AZ Sint-Jan Brugge-Oostende AV, Ruddershove 10, Bruges, Belgium; Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Merelbeke, Belgium.
| | - Friedel Nollet
- Department of Laboratory Medicine, AZ Sint-Jan Brugge-Oostende AV, Ruddershove 10, Bruges, Belgium
| | - Lode Sibbens
- Department of Laboratory Medicine, AZ Sint-Jan Brugge-Oostende AV, Ruddershove 10, Bruges, Belgium
| | - Helena Devos
- Department of Laboratory Medicine, AZ Sint-Jan Brugge-Oostende AV, Ruddershove 10, Bruges, Belgium
| |
|
39
|
Kuderna LFK, Gao H, Janiak MC, Kuhlwilm M, Orkin JD, Bataillon T, Manu S, Valenzuela A, Bergman J, Rousselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, Schraiber JG, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, Valsecchi J, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin AD, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Rogers J, Farh KKH, Marques Bonet T. A global catalog of whole-genome diversity from 233 primate species. Science 2023; 380:906-913. [PMID: 37262161 DOI: 10.1126/science.abn7829] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 12/19/2021] [Accepted: 02/06/2023] [Indexed: 06/03/2023]
Abstract
The rich diversity of morphology and behavior displayed across primate species provides an informative context in which to study the impact of genomic diversity on fundamental biological processes. Analysis of that diversity provides insight into long-standing questions in evolutionary and conservation biology and is urgent given severe threats these species are facing. Here, we present high-coverage whole-genome data from 233 primate species representing 86% of genera and all 16 families. This dataset was used, together with fossil calibration, to create a nuclear DNA phylogeny and to reassess evolutionary divergence times among primate clades. We found within-species genetic diversity across families and geographic regions to be associated with climate and sociality, but not with extinction risk. Furthermore, mutation rates differ across species, potentially influenced by effective population sizes. Lastly, we identified extensive recurrence of missense mutations previously thought to be human specific. This study will open a wide range of research avenues for future primate genomic research.
Affiliation(s)
- Lukas F K Kuderna
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA 94404, USA
| | - Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA 94404, USA
| | - Mareike C Janiak
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Martin Kuhlwilm
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Austria
| | - Joseph D Orkin
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- Département d'anthropologie, Université de Montréal, 3150 Jean-Brillant, Montréal, QC H3T 1N8, Canada
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Alejandro Valenzuela
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
| | - Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Section for Ecoinformatics and Biodiversity, Department of Biology, Aarhus University, Aarhus, Denmark
| | | | - Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Estrada da Bexiga 2584, CEP 69553-225, Tefé, Amazonas, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Av. Franklin D. Roosevelt 50, CP 160/12, B-1050 Brussels, Belgium
| | - Lidia Agueda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri I Reixac 4, 08028 Barcelona, Spain
| | - Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri I Reixac 4, 08028 Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri I Reixac 4, 08028 Barcelona, Spain
| | - Dorien de Vries
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Ian Goodhead
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
| | | | - Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | | - David Juan
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
| | | | - Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA 94404, USA
| | | | - Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas 69080-900, Brazil
| | - Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT 84102, USA
| | | | - Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas 69080-900, Brazil
| | - João Valsecchi
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Amazonas, Brazil
- Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia - RedeFauna, Manaus, Amazonas, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica - ComFauna, Iquitos, Loreto, Peru
| | - Malu Messias
- Universidade Federal de Rondônia, Porto Velho, Rondônia, Brazil
| | | | - Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Rogerio Rossi
- Instituto de Biociências, Universidade Federal do Mato Grosso, Cuiabá, MT, Brazil
| | - Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas 69080-900, Brazil
- Department of Biology, Trinity University, San Antonio, TX 78212, USA
| | - Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
| | - Clifford J Jolly
- Department of Anthropology, New York University, New York, NY 10003, USA
| | - Jane Phillips-Conroy
- Department of Neuroscience, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop TX 78602, USA
| | - Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop TX 78602, USA
| | - Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop TX 78602, USA
| | | | - Sree Kanthaswamy
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, AZ 85004, USA
| | - Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, Addis Ababa, Ethiopia
| | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Long Zhou
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Guojie Zhang
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China
- Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Shangcheng District, Hangzhou 310006, China
| | - Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Head Office, P.O. Box 661, Arusha, Tanzania
| | - Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, 17493 Greifswald-Insel Riems, Germany
| | - Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi, Vietnam
| | - Esther Lizano
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, Stuttgart, Germany
| | - Arcadi Navarro
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra. Pg. Luís Companys 23, 08010 Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Av. Doctor Aiguader, N88, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C. Wellington 30, 08005 Barcelona, Spain
| | - Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Ninh Binh Province, Vietnam
| | - Chiea Chuen Khor
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
| | - Jessica Lee
- Mandai Nature, 80 Mandai Lake Road, Singapore
| | - Patrick Tan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore
| | - Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Chambers Street, Edinburgh EH1 1JF, UK, and School of Geosciences, Drummond Street, Edinburgh EH8 9XP, UK
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, 37077 Göttingen, Germany
- Leibniz ScienceCampus Primate Cognition, 37077 Göttingen, Germany
| | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri I Reixac 4, 08028 Barcelona, Spain
| | - Amanda D Melin
- Department of Anthropology and Archaeology, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
- Department of Medical Genetics, University of Calgary, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada
| | - Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | | | - Robin M D Beck
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
| | - Jean P Boubli
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA 94404, USA
| | - Tomas Marques Bonet
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. PRBB, C. Doctor Aiguader N88, 08003 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri I Reixac 4, 08028 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra. Pg. Luís Companys 23, 08010 Barcelona, Spain
| |
|
40
|
Mun T, Vaddadi NSK, Langmead B. Pangenomic genotyping with the marker array. Algorithms Mol Biol 2023; 18:2. [PMID: 37147657 PMCID: PMC10161648 DOI: 10.1186/s13015-023-00225-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/31/2023] [Accepted: 04/22/2023] [Indexed: 05/07/2023] Open
Abstract
We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. rowbowt can infer accurate genotypes using less time and memory than existing graph-based methods. The method is implemented in the open source software tool rowbowt, available at https://github.com/alshai/rowbowt.
Affiliation(s)
- Taher Mun
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | | | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| |
|
41
|
Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I, Patterson N, Reich D. The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535797. [PMID: 37066305 PMCID: PMC10104067 DOI: 10.1101/2023.04.06.535797] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Indexed: 04/18/2023]
Abstract
More than two hundred papers have reported genome-wide data from ancient humans. While the raw data for the vast majority are fully publicly available, testifying to the commitment of the paleogenomics community to open data, formats for both raw data and meta-data differ. There is thus a need for uniform curation and a centralized, version-controlled compendium that researchers can download, analyze, and reference. Since 2019, we have been maintaining the Allen Ancient DNA Resource (AADR), which aims to provide an up-to-date, curated version of the world's published ancient human DNA data, represented at more than a million single nucleotide polymorphisms (SNPs) at which almost all ancient individuals have been assayed. The AADR has gone through six public releases since it was first made available and crossed the threshold of >10,000 ancient individuals with genome-wide data at the end of 2022. This note is intended as a citable description of the AADR.
Affiliation(s)
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
| | - Adam Micco
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
| | - Matthew Mah
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
| | - Harald Ringbauer
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
| | - Iosif Lazaridis
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Iñigo Olalde
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- BIOMICs Research Group, University of the Basque Country, 01006 Vitoria-Gasteiz, Spain
| | - Nick Patterson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| |
|
42
|
Lorig-Roach R, Meredith M, Monlong J, Jain M, Olsen H, McNulty B, Porubsky D, Montague T, Lucas J, Condon C, Eizenga J, Juul S, McKenzie S, Simmonds SE, Park J, Asri M, Koren S, Eichler E, Axel R, Martin B, Carnevali P, Miga K, Paten B. Phased nanopore assembly with Shasta and modular graph phasing with GFAse. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.21.529152. [PMID: 36865218 PMCID: PMC9980101 DOI: 10.1101/2023.02.21.529152] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/30/2023]
Abstract
As a step towards simplifying and reducing the cost of haplotype-resolved de novo assembly, we describe new methods for accurately phasing nanopore data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of Oxford Nanopore Technologies' (ONT) PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.
Affiliation(s)
- Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Hugh Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Brandy McNulty
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tessa Montague
- The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY, USA & Howard Hughes Medical Institute, Columbia University, New York, NY, USA
| | - Julian Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Chris Condon
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jordan Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Jimin Park
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA & Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Richard Axel
- The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY, USA & Howard Hughes Medical Institute, Columbia University, New York, NY, USA
| | - Bruce Martin
- Chan Zuckerberg Initiative Foundation, Redwood City, CA, USA
| | - Paolo Carnevali
- Chan Zuckerberg Initiative Foundation, Redwood City, CA, USA
| | - Karen Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| |
|
43
|
Zhao C, Shi ZJ, Pollard KS. Pitfalls of genotyping microbial communities with rapidly growing genome collections. Cell Syst 2023; 14:160-176.e3. [PMID: 36657438 PMCID: PMC9957970 DOI: 10.1016/j.cels.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 10/15/2022] [Accepted: 12/19/2022] [Indexed: 01/20/2023]
Abstract
Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments versus limiting incorrect alignments, many of which map reads to the wrong species. Then we evaluate several actionable mitigation strategies and review emerging methods showing promise to further improve metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping to the many tools in microbial genomics that depend upon accurate read mapping.
Affiliation(s)
- Chunyu Zhao
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
| | - Zhou Jason Shi
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
| | - Katherine S Pollard
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA.
| |
|
44
|
Deng X, Frandsen PB, Dikow RB, Favre A, Shah DN, Shah RDT, Schneider JV, Heckenhauer J, Pauls SU. The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol Evol 2022; 12:e9583. [PMID: 36523526 PMCID: PMC9745013 DOI: 10.1002/ece3.9583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/16/2022] [Indexed: 12/15/2022] Open
Abstract
Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise F_ST, and genome-wide distribution of F_ST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
Affiliation(s)
- Xi‐Ling Deng
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| | - Paul B. Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - Adrien Favre
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Regional Nature Park of the Trient Valley, Salvan, Switzerland
| | - Deep Narayan Shah
- Central Department of Environmental Science, Tribhuvan University, Kirtipur, Nepal
| | - Ram Devi Tachamo Shah
- Aquatic Ecology Centre, School of Science, Kathmandu University, Dhulikhel, Nepal
- Department of Life Sciences, School of Science, Kathmandu University, Dhulikhel, Nepal
| | - Julio V. Schneider
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
| | - Jacqueline Heckenhauer
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| | - Steffen U. Pauls
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus‐Liebig‐University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt/Main, Germany
| |
|
45
|
Molo MS, White JB, Cornish V, Gell RM, Baars O, Singh R, Carbone MA, Isakeit T, Wise KA, Woloshuk CP, Bluhm BH, Horn BW, Heiniger RW, Carbone I. Asymmetrical lineage introgression and recombination in populations of Aspergillus flavus: Implications for biological control. PLoS One 2022; 17:e0276556. [PMID: 36301851 PMCID: PMC9620740 DOI: 10.1371/journal.pone.0276556] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/21/2022] [Accepted: 10/08/2022] [Indexed: 11/23/2022] Open
Abstract
Aspergillus flavus is an agriculturally important fungus that causes ear rot of maize and produces aflatoxins, of which B1 is the most carcinogenic naturally-produced compound. In the US, the management of aflatoxins includes the deployment of biological control agents that comprise two nonaflatoxigenic A. flavus strains, either Afla-Guard (member of lineage IB) or AF36 (lineage IC). We used genotyping-by-sequencing to examine the influence of both biocontrol agents on native populations of A. flavus in cornfields in Texas, North Carolina, Arkansas, and Indiana. This study examined up to 27,529 single-nucleotide polymorphisms (SNPs) in a total of 815 A. flavus isolates, and 353 genome-wide haplotypes sampled before biocontrol application, three months after biocontrol application, and up to three years after initial application. Here, we report that the two distinct A. flavus evolutionary lineages IB and IC differ significantly in their frequency distributions across states. We provide evidence of increased unidirectional gene flow from lineage IB into IC, inferred to be due to the applied Afla-Guard biocontrol strain. Genetic exchange and recombination of biocontrol strains with native strains was detected in as little as three months after biocontrol application and up to one and three years later. There was limited inter-lineage migration in the untreated fields. These findings suggest that biocontrol products that include strains from lineage IB offer the greatest potential for sustained reductions in aflatoxin levels over several years. This knowledge has important implications for developing new biocontrol strategies.
Affiliation(s)
- Megan S. Molo
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
| | - James B. White
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
| | - Vicki Cornish
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
| | - Richard M. Gell
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
- Program of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Oliver Baars
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
| | - Rakhi Singh
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
| | - Mary Anna Carbone
- Center for Integrated Fungal Research and Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, United States of America
| | - Thomas Isakeit
- Department of Plant Pathology and Microbiology, Texas AgriLife Extension Service, Texas A&M University, College Station, TX, United States of America
| | - Kiersten A. Wise
- Department of Plant Pathology, University of Kentucky, Princeton, KY, United States of America
| | - Charles P. Woloshuk
- Department of Plant Pathology and Botany, Purdue University, West Lafayette, IN, United States of America
| | - Burton H. Bluhm
- University of Arkansas Division of Agriculture, Department of Entomology and Plant Pathology, Fayetteville, AR, United States of America
| | - Bruce W. Horn
- United States Department of Agriculture, Agriculture Research Service, Dawson, GA, United States of America
| | - Ron W. Heiniger
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, United States of America
| | - Ignazio Carbone
- Department of Entomology and Plant Pathology, Center for Integrated Fungal Research, North Carolina State University, Raleigh, NC, United States of America
- Program of Genetics, North Carolina State University, Raleigh, North Carolina, United States of America
| |
|
46
|
Gopalakrishnan S, Ebenesersdóttir SS, Lundstrøm IKC, Turner-Walker G, Moore KHS, Luisi P, Margaryan A, Martin MD, Ellegaard MR, Magnússon ÓÞ, Sigurðsson Á, Snorradóttir S, Magnúsdóttir DN, Laffoon JE, van Dorp L, Liu X, Moltke I, Ávila-Arcos MC, Schraiber JG, Rasmussen S, Juan D, Gelabert P, de-Dios T, Fotakis AK, Iraeta-Orbegozo M, Vågene ÅJ, Denham SD, Christophersen A, Stenøien HK, Vieira FG, Liu S, Günther T, Kivisild T, Moseng OG, Skar B, Cheung C, Sandoval-Velasco M, Wales N, Schroeder H, Campos PF, Guðmundsdóttir VB, Sicheritz-Ponten T, Petersen B, Halgunset J, Gilbert E, Cavalleri GL, Hovig E, Kockum I, Olsson T, Alfredsson L, Hansen TF, Werge T, Willerslev E, Balloux F, Marques-Bonet T, Lalueza-Fox C, Nielsen R, Stefánsson K, Helgason A, Gilbert MTP. The population genomic legacy of the second plague pandemic. Curr Biol 2022; 32:4743-4751.e6. [PMID: 36182700 DOI: 10.1016/j.cub.2022.09.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/07/2022] [Revised: 06/15/2022] [Accepted: 09/09/2022] [Indexed: 11/18/2022]
Abstract
Human populations have been shaped by catastrophes that may have left long-lasting signatures in their genomes. One notable example is the second plague pandemic that entered Europe in ca. 1347 CE and repeatedly returned for over 300 years, with typical village and town mortality estimated at 10%-40% [1]. It is assumed that this high mortality affected the gene pools of these populations. First, local population crashes reduced genetic diversity. Second, a change in frequency is expected for sequence variants that may have affected survival or susceptibility to the etiologic agent (Yersinia pestis) [2]. Third, mass mortality might alter the local gene pools through its impact on subsequent migration patterns. We explored these factors using the Norwegian city of Trondheim as a model, by sequencing 54 genomes spanning three time periods: (1) prior to the plague striking Trondheim in 1349 CE, (2) the 17th-19th century, and (3) the present. We find that the pandemic period shaped the gene pool by reducing long distance immigration, in particular from the British Isles, and inducing a bottleneck that reduced genetic diversity. Although we also observe an excess of large FST values at multiple loci in the genome, these are shaped by reference biases introduced by mapping our relatively low genome coverage degraded DNA to the reference genome. This implies that attempts to detect selection using ancient DNA (aDNA) datasets that vary by read length and depth of sequencing coverage may be particularly challenging until methods have been developed to account for the impact of differential reference bias on test statistics.
Affiliation(s)
- Shyam Gopalakrishnan
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark.
| | - S Sunna Ebenesersdóttir
- deCODE Genetics, AMGEN Inc., Sturlugata 8, 102 Reykjavík, Iceland; Department of Anthropology, School of Social Sciences, University of Iceland, Gimli, Sæmundargata, 102 Reykjavík, Iceland
| | - Inge K C Lundstrøm
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Gordon Turner-Walker
- National Yunlin University of Science & Technology, 123 University Road, Section 3, 64002 Douliu, Yun-Lin County, Taiwan; Department of Archaeology and Anthropology, National Museum of Natural Science, 1 Guanqian Road, North District Taichung City 404023, Taiwan
| | | | - Pierre Luisi
- Facultad de Filosofía y Humanidades, Universidad Nacional de Córdoba, Córdoba, Argentina; Microbial Paleogenomics Unit, Institut Pasteur, 25-28 Rue du Dr Roux, 75015 Paris, France
| | - Ashot Margaryan
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Michael D Martin
- NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| | - Martin Rene Ellegaard
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| | | | | | | | | | - Jason E Laffoon
- Department of Archaeological Sciences, Faculty of Archaeology, Leiden University, Leiden, the Netherlands
| | - Lucy van Dorp
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Xiaodong Liu
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
| | - Ida Moltke
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
| | - María C Ávila-Arcos
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), 3001 Boulevard Juriquilla, 76230 Querétaro, Mexico
| | - Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina Inc., San Diego, CA, USA
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3, 2200 Copenhagen, Denmark
| | - David Juan
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Pere Gelabert
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain; Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
| | - Toni de-Dios
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Anna K Fotakis
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Miren Iraeta-Orbegozo
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Åshild J Vågene
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, 07745 Jena, Germany; Institute for Archaeological Sciences, University of Tübingen, Tübingen, Germany
| | | | - Axel Christophersen
- NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| | - Hans K Stenøien
- NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| | - Filipe G Vieira
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Shanlin Liu
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Torsten Günther
- Evolutionsbiologisk Centrum EBC, Norbyv. 18A, 752 36 Uppsala, Sweden
| | - Toomas Kivisild
- KU Leuven, Herestraat 49, 3000 Leuven, Belgium; Institute of Genomics, University of Tartu, Riia 23b, 51010 Tartu, Estonia
| | - Ole Georg Moseng
- Department of Business, History and Social Sciences, University of South-Eastern Norway, Notodden, Norway
| | - Birgitte Skar
- NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| | - Christina Cheung
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; EA - Eco-anthropologie (UMR 7206), Muséum National d'Histoire Naturelle, CNRS, Université Paris Diderot, Paris, France
| | - Marcela Sandoval-Velasco
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Nathan Wales
- Department of Archaeology, Kings Manor and Principals House, University of York, Exhibition Square, York YO1 7EP, UK
| | - Hannes Schroeder
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark
| | - Paula F Campos
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, Matosinhos, Portugal
| | - Valdís B Guðmundsdóttir
- deCODE Genetics, AMGEN Inc., Sturlugata 8, 102 Reykjavík, Iceland; Department of Anthropology, School of Social Sciences, University of Iceland, Gimli, Sæmundargata, 102 Reykjavík, Iceland
| | - Thomas Sicheritz-Ponten
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, Asian Institute of Medicine, Science and Technology (AIMST), 08100 Bedong, Kedah, Malaysia
| | - Bent Petersen
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, Asian Institute of Medicine, Science and Technology (AIMST), 08100 Bedong, Kedah, Malaysia
| | | | - Edmund Gilbert
- School of Pharmacy and Biomolecular Sciences, RCSI, Dublin, Ireland; FutureNeuro SFI Research Centre, RCSI, Dublin, Ireland
| | - Gianpiero L Cavalleri
- School of Pharmacy and Biomolecular Sciences, RCSI, Dublin, Ireland; FutureNeuro SFI Research Centre, RCSI, Dublin, Ireland
| | - Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Ingrid Kockum
- Center for Molecular Medicine, Department of Clinical Neuroscience, Neuroimmunology Unit, Karolinska Institutet, Stockholm, Sweden
| | - Tomas Olsson
- Center for Molecular Medicine, Department of Clinical Neuroscience, Neuroimmunology Unit, Karolinska Institutet, Stockholm, Sweden
| | - Lars Alfredsson
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Thomas F Hansen
- Institute of Biological Psychiatry, Copenhagen Mental Health Services, Copenhagen, Denmark; Danish Headache Center, Department of Neurology, Copenhagen University Hospital, 2600 Glostrup, Denmark
| | - Thomas Werge
- Institute of Biological Psychiatry, Copenhagen Mental Health Services, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Copenhagen, Denmark; The Globe Institute, Lundbeck Foundation Center for Geogenetics, Øster Voldgade 5-7, 1350 Copenhagen K, Denmark
| | - Eske Willerslev
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
| | - Francois Balloux
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain; CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain; Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Carles Lalueza-Fox
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain; Museu de Ciències Naturals de Barcelona, 08019 Barcelona, Spain
| | - Rasmus Nielsen
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; Department of Integrative Biology, University of California, Berkeley, 3060 Valley Life Sciences Bldg #3140, Berkeley, CA 94720-3140, USA
| | - Kári Stefánsson
- deCODE Genetics, AMGEN Inc., Sturlugata 8, 102 Reykjavík, Iceland; Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| | - Agnar Helgason
- deCODE Genetics, AMGEN Inc., Sturlugata 8, 102 Reykjavík, Iceland; Department of Anthropology, School of Social Sciences, University of Iceland, Gimli, Sæmundargata, 102 Reykjavík, Iceland
| | - M Thomas P Gilbert
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5A, 1353 Copenhagen, Denmark; NTNU University Museum, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
| |
|
47
|
Mun T, Vaddadi NSK, Langmead B. Pangenomic Genotyping with the Marker Array. ALGORITHMS IN BIOINFORMATICS : ... INTERNATIONAL WORKSHOP, WABI ..., PROCEEDINGS. WABI (WORKSHOP) 2022; 242:19. [PMID: 36409181 PMCID: PMC9674407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while avoiding the reference bias that results when aligning to a single linear reference. rowbowt can infer accurate genotypes using less time and memory than existing graph-based methods.
Affiliation(s)
- Taher Mun
- Johns Hopkins University, Baltimore MD, USA; Illumina, San Diego, USA
| | | | | |
|
48
|
Scott CB, Cárdenas A, Mah M, Narasimhan VM, Rohland N, Toth LT, Voolstra CR, Reich D, Matz MV. Millennia-old coral holobiont DNA provides insight into future adaptive trajectories. Mol Ecol 2022; 31:4979-4990. [PMID: 35943423 DOI: 10.1111/mec.16642] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/10/2022] [Revised: 07/26/2022] [Accepted: 08/03/2022] [Indexed: 11/28/2022]
Abstract
Ancient DNA (aDNA) has been applied to evolutionary questions across a wide variety of taxa. Here, for the first time, we leverage aDNA from millennia-old fossil coral fragments to gain new insights into a rapidly declining western Atlantic reef ecosystem. We sampled four Acropora palmata fragments (dated 4215 BCE - 1099 CE) obtained from two Florida Keys reef cores. From these samples, we established that it is possible both to sequence ancient DNA from reef cores and place the data in the context of modern-day genetic variation. We recovered varying amounts of nuclear DNA exhibiting the characteristic signatures of aDNA from the A. palmata fragments. To describe the holobiont sensu lato, which plays a crucial role in reef health, we utilized metagenome-assembled genomes as a reference to identify a large additional proportion of ancient microbial DNA from the samples. The samples shared many common microbes with modern-day coral holobionts from the same region, suggesting remarkable holobiont stability over time. Despite efforts, we were unable to recover ancient Symbiodiniaceae reads from the samples. Comparing the ancient A. palmata data to whole-genome sequencing data from living acroporids, we found that while slightly distinct, ancient samples were most closely related to individuals of their own species. Together, these results provide a proof-of-principle showing that it is possible to carry out direct analysis of coral holobiont change over time, which lays a foundation for studying the impacts of environmental stress and evolutionary constraints.
Affiliation(s)
- Carly B Scott
- Department of Integrative Biology, University of Texas, Austin, TX, USA
| | - Anny Cárdenas
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Matthew Mah
- Department of Genetics, Harvard Medical School, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA, Austin, TX, USA
| | | | - Nadin Rohland
- Department of Genetics, Harvard Medical School, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Lauren T Toth
- U.S. Geological Survey, St. Petersburg Coastal and Marine Science Center, St. Petersburg, FL
| | | | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA; Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA, Austin, TX, USA; Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Mikhail V Matz
- Department of Integrative Biology, University of Texas, Austin, TX, USA
| |
|
49
|
Srigyan M, Bolívar H, Ureña I, Santana J, Petersen A, Iriarte E, Kırdök E, Bergfeldt N, Mora A, Jakobsson M, Abdo K, Braemer F, Smith C, Ibañez JJ, Götherström A, Günther T, Valdiosera C. Bioarchaeological evidence of one of the earliest Islamic burials in the Levant. Commun Biol 2022; 5:554. [PMID: 35672445 PMCID: PMC9174286 DOI: 10.1038/s42003-022-03508-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Accepted: 05/20/2022] [Indexed: 11/27/2022] Open
Abstract
The Middle East plays a central role in human history harbouring a vast diversity of ethnic, cultural and religious groups. However, much remains to be understood about past and present genomic diversity in this region. Here we present a multidisciplinary bioarchaeological analysis of two individuals dated to the late 7th and early 8th centuries, the Umayyad Era, from Tell Qarassa, an open-air site in modern-day Syria. Radiocarbon dates and burial type are consistent with one of the earliest Islamic Arab burials in the Levant. Interestingly, we found genomic similarity to a genotyped group of modern-day Bedouins and Saudi rather than to most neighbouring Levantine groups. This study represents the genomic analysis of a secondary use site with characteristics consistent with an early Islamic burial in the Levant. We discuss our findings and possible historic scenarios in the light of forces such as genetic drift and their possible interaction with religious and cultural processes (including diet and subsistence practices). Ancient genomic and archaeological data combine to identify a surprisingly early Islamic burial in modern day Syria.
Affiliation(s)
- Megha Srigyan
- Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden; Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
| | - Héctor Bolívar
- Centre for Palaeogenetics, 10691, Stockholm, Sweden; Instituto del Patrimonio Cultural de España, 28040, Madrid, Spain
| | - Irene Ureña
- Centre for Palaeogenetics, 10691, Stockholm, Sweden
| | - Jonathan Santana
- Department of Historical Sciences, Universidad de Las Palmas de Gran Canaria, Las Palmas de G.C., E35001, Spain
| | | | - Eneko Iriarte
- Laboratorio de Evolución Humana, Departamento de Historia, Geografía y Comunicación, Universidad de Burgos, 09001, Burgos, Spain
| | - Emrah Kırdök
- Department of Biotechnology, Mersin University, 33343, Mersin, Turkey
| | | | - Alice Mora
- Dept. Archaeology and History, La Trobe University, Melbourne, VIC, 3086, Australia
| | - Mattias Jakobsson
- Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden
| | - Khaled Abdo
- General Directorate of Antiquities and Museums, Damascus, Syrian Arab Republic
| | - Frank Braemer
- Université Côte d'Azur, CNRS, Culture et Environment, Préhistoire Antiquité Moyen Age, Nice, France
| | - Colin Smith
- Laboratorio de Evolución Humana, Departamento de Historia, Geografía y Comunicación, Universidad de Burgos, 09001, Burgos, Spain; Dept. Archaeology and History, La Trobe University, Melbourne, VIC, 3086, Australia
| | - Juan José Ibañez
- Archaeology of Social Dynamics, Milà i Fontanals Institution, Spanish National Research Council (CSIC), Barcelona, Spain
| | | | - Torsten Günther
- Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala, Sweden.
| | - Cristina Valdiosera
- Laboratorio de Evolución Humana, Departamento de Historia, Geografía y Comunicación, Universidad de Burgos, 09001, Burgos, Spain; Dept. Archaeology and History, La Trobe University, Melbourne, VIC, 3086, Australia
| |
|
50
|
Jain C, Rhie A, Hansen NF, Koren S, Phillippy AM. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods 2022; 19:705-710. [PMID: 35365778 PMCID: PMC10510034 DOI: 10.1038/s41592-022-01457-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 05/22/2021] [Accepted: 03/17/2022] [Indexed: 01/10/2023]
Abstract
Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.
Affiliation(s)
- Chirag Jain
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India.
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA.
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| |
|