1
Peng D, Mulder OJ, Edge MD. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. bioRxiv 2024:2024.05.24.595829. [PMID: 38854009 PMCID: PMC11160635 DOI: 10.1101/2024.05.24.595829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
2
Iasi LNM, Chintalapati M, Skov L, Mesa AB, Hajdinjak M, Peter BM, Moorjani P. Neandertal ancestry through time: Insights from genomes of ancient and present-day humans. bioRxiv 2024:2024.05.13.593955. [PMID: 38798350 PMCID: PMC11118355 DOI: 10.1101/2024.05.13.593955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Gene flow from Neandertals has shaped the landscape of genetic and phenotypic variation in modern humans. We identify the location and size of introgressed Neandertal ancestry segments in more than 300 genomes spanning the last 50,000 years. We study how Neandertal ancestry is shared among individuals to infer the time and duration of the Neandertal gene flow. We find that the correlation of Neandertal segment locations across individuals and their divergence to sequenced Neandertals both support a model of a single major Neandertal gene flow. Our catalog of introgressed segments through time confirms that most natural selection (positive and negative) on Neandertal ancestry variants occurred immediately after the gene flow, and provides new insights into how the contact with Neandertals shaped human origins and adaptation.
Affiliation(s)
- Leonardo N. M. Iasi
  - Department for Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology; Leipzig, 04301, Germany
- Manjusha Chintalapati
  - Department of Molecular and Cell Biology, University of California Berkeley; Berkeley, CA 94720, USA
- Laurits Skov
  - Department of Molecular and Cell Biology, University of California Berkeley; Berkeley, CA 94720, USA
- Alba Bossoms Mesa
  - Department for Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology; Leipzig, 04301, Germany
- Mateja Hajdinjak
  - Department for Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology; Leipzig, 04301, Germany
  - The Francis Crick Institute; London, NW1 1AT, UK
- Benjamin M. Peter
  - Department for Evolutionary Genetics, Max-Planck-Institute for Evolutionary Anthropology; Leipzig, 04301, Germany
  - Department of Biology, University of Rochester; Rochester NY, 14620, USA
- Priya Moorjani
  - Department of Molecular and Cell Biology, University of California Berkeley; Berkeley, CA 94720, USA
  - Center for Computational Biology, University of California Berkeley; Berkeley, CA 94720, USA
3
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024]
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
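The limiting process described in this abstract (Kingman coalescence between big families, with a discrete jump in coalescence probability at each big family) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or code: time is in coalescent units with pairwise rate 1, and the event times and jump probabilities passed in are hypothetical inputs chosen for illustration.

```python
import math
import random


def sample_pairwise_coalescence_time(big_family_events, rng):
    """Draw one pairwise coalescence time given a fixed list of big-family events.

    big_family_events: list of (time, jump_prob) tuples sorted by time.
        These values are hypothetical inputs, not quantities from the paper.
    Between events the pair coalesces at rate 1 (Kingman coalescent);
    at each event it coalesces with probability jump_prob.
    """
    t = 0.0
    for event_time, jump_prob in big_family_events:
        # Exponential waiting time; memorylessness lets us redraw after each event.
        wait = rng.expovariate(1.0)
        if t + wait < event_time:
            return t + wait
        t = event_time
        if rng.random() < jump_prob:
            return t  # coalescence happens inside the big family
    return t + rng.expovariate(1.0)


def survival_probability(t, big_family_events):
    """Exact P(T > t): exponential decay times the chance of escaping each jump so far."""
    escape = 1.0
    for event_time, jump_prob in big_family_events:
        if event_time <= t:
            escape *= 1.0 - jump_prob
    return math.exp(-t) * escape
```

With an empty event list the sampler reduces to the usual exponential distribution; each added event produces a visible drop in the survival curve at that time.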
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
4
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. bioRxiv 2024:2023.11.03.565466. [PMID: 37961279 PMCID: PMC10635123 DOI: 10.1101/2023.11.03.565466] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
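To make the idea of an ARG defined by genomes and their intervals of inheritance concrete, here is a minimal sketch. The `Edge` class, its field names, the half-open interval convention, and the toy data are assumptions made for illustration; they are not taken from the paper or from any particular software package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One inheritance relationship: `child` inherits [left, right) from `parent`."""
    left: float
    right: float
    parent: int
    child: int


def local_tree(edges, position):
    """Return the child -> parent map of the genealogy at one genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def ancestors(edges, node, position):
    """List the genomes that `node` inherits from at `position`, nearest first."""
    tree = local_tree(edges, position)
    path = []
    while node in tree:
        node = tree[node]
        path.append(node)
    return path


# Toy ARG over a genome of length 10: sample genome 1 inherits from
# genome 3 on [0, 5) and from genome 4 on [5, 10).
TOY_EDGES = [
    Edge(0, 10, 3, 0),
    Edge(0, 5, 3, 1),
    Edge(5, 10, 4, 1),
    Edge(0, 10, 4, 2),
    Edge(0, 10, 5, 3),
    Edge(0, 10, 5, 4),
]
```

Calling `ancestors(TOY_EDGES, 1, 2.0)` and `ancestors(TOY_EDGES, 1, 7.0)` gives different paths for the same sample, which is the sense in which the genealogical tree varies along the genome. No recombination or coalescence events are stored explicitly; only genomes and intervals are.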
Affiliation(s)
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Anastasia Ignatieva
  - School of Mathematics and Statistics, University of Glasgow, UK
  - Department of Statistics, University of Oxford, UK
- Jere Koskela
  - School of Mathematics, Statistics and Physics, Newcastle University, UK
  - Department of Statistics, University of Warwick, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, UK
- Anthony W Wohns
  - Broad Institute of MIT and Harvard, Cambridge, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
5
Grundler MC, Terhorst J, Bradburd GS. A geographic history of human genetic ancestry. bioRxiv 2024:2024.03.27.586858. [PMID: 38585733 PMCID: PMC10996620 DOI: 10.1101/2024.03.27.586858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. In humans, traditional approaches for describing population genetic variation often rely on discrete genetic ancestry labels, which, despite their utility, can obscure the complex, multifaceted nature of human genetic history. These labels risk oversimplifying ancestry by ignoring its temporal depth and geographic continuity, and may therefore conflate notions of race, ethnicity, geography, and genetic ancestry. Here, we present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We use this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatial-temporal context of genetic ancestry to describing human genetic variation and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.
Affiliation(s)
- Michael C Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Gideon S Bradburd
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
6
Redman MG, Horton RH, Carley H, Lucassen A. Ancestry, race and ethnicity: the role and relevance of language in clinical genetics practice. J Med Genet 2024; 61:313-318. [PMID: 38050060 PMCID: PMC10982622 DOI: 10.1136/jmg-2023-109370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 09/28/2023] [Indexed: 12/06/2023]
Abstract
BACKGROUND The terms ancestry, race and ethnicity are used variably within the medical literature and within society and clinical care. Biological lineage can provide an important context for the interpretation of genomic data, but the language used, and practices around when to ascertain this, vary. METHODS Using a fictional case scenario we explore the relevance of questions around ancestry, race and ethnicity in clinical genetic practice. RESULTS In the UK, data on 'ethnicity' are routinely collected by those using genomic medicine, as well as within the wider UK National Health Service, although the reasons for this are not always clear to practitioners and patients. Sometimes it is requested as a proxy for biological lineage to aid variant interpretation, refine estimations of carrier frequency and guide decisions around the need for pharmacogenetic testing. CONCLUSION There are many challenges around the use and utility of these terms. Currently, genomic databases are populated primarily with data from people of European descent, and this can lead to health disparities and poorer service for minoritised or underserved populations. Sensitivity and consideration are needed when communicating with patients around these areas. We explore the role and relevance of language around biological lineage in clinical genetics practice.
Affiliation(s)
- Melody Grace Redman
  - Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Rachel Helen Horton
  - Centre for Personalised Medicine, Nuffield Department of Medicine, Wellcome Trust Centre for Human Genetics, Oxford, Oxfordshire, UK
- Helena Carley
  - South East Thames Regional Genetics Service, Guy's Hospital, London, UK
- Anneke Lucassen
  - Centre for Personalised Medicine, Nuffield Department of Medicine, Wellcome Trust Centre for Human Genetics, Oxford, Oxfordshire, UK
7
Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. bioRxiv 2024:2024.03.07.583855. [PMID: 38559261 PMCID: PMC10979897 DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4 000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- Jerome Kelleher
  - Oxford Big Data Institute, University of Oxford, United Kingdom
- Yao-ban Chan
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- David J. Balding
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
8
Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I, Patterson N, Reich D. The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci Data 2024; 11:182. [PMID: 38341426 PMCID: PMC10858950 DOI: 10.1038/s41597-024-03031-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/31/2024] [Indexed: 02/12/2024]
Abstract
More than two hundred papers have reported genome-wide data from ancient humans. While the raw data for the vast majority are fully publicly available, testifying to the commitment of the paleogenomics community to open data, formats for both raw data and meta-data differ. There is thus a need for uniform curation and a centralized, version-controlled compendium that researchers can download, analyze, and reference. Since 2019, we have been maintaining the Allen Ancient DNA Resource (AADR), which aims to provide an up-to-date, curated version of the world's published ancient human DNA data, represented at more than a million single nucleotide polymorphisms (SNPs) at which almost all ancient individuals have been assayed. The AADR has gone through six public releases at the time of writing and review of this manuscript, and crossed the threshold of >10,000 individuals with published genome-wide ancient DNA data at the end of 2022. This note is intended as a citable descriptor of the AADR.
Affiliation(s)
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Adam Micco
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Matthew Mah
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Harald Ringbauer
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
- Iosif Lazaridis
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Iñigo Olalde
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - BIOMICs Research Group, University of the Basque Country, 01006, Vitoria-Gasteiz, Spain
- Nick Patterson
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Howard Hughes Medical Institute, Boston, MA, 02115, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
9
Brandt DYC, Huber CD, Chiang CWK, Ortega-Del Vecchyo D. The Promise of Inferring the Past Using the Ancestral Recombination Graph. Genome Biol Evol 2024; 16:evae005. [PMID: 38242694 PMCID: PMC10834162 DOI: 10.1093/gbe/evae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 12/11/2023] [Accepted: 12/17/2023] [Indexed: 01/21/2024]
Abstract
The ancestral recombination graph (ARG) is a structure that represents the history of coalescent and recombination events connecting a set of sequences (Hudson RR. In: Futuyma D, Antonovics J, editors. Gene genealogies and the coalescent process. In: Oxford Surveys in Evolutionary Biology; 1991. p. 1 to 44.). The full ARG can be represented as a set of genealogical trees at every locus in the genome, annotated with recombination events that change the topology of the trees between adjacent loci and the mutations that occurred along the branches of those trees (Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavare S, editors. Progress in population genetics and human evolution. Springer; 1997. p. 257 to 270.). Valuable insights can be gained into past evolutionary processes, such as demographic events or the influence of natural selection, by studying the ARG. It is regarded as the "holy grail" of population genetics (Hubisz M, Siepel A. Inference of ancestral recombination graphs using ARGweaver. In: Dutheil JY, editors. Statistical population genomics. New York, NY: Springer US; 2020. p. 231-266.) since it encodes the processes that generate all patterns of allelic and haplotypic variation from which all commonly used summary statistics in population genetic research (e.g. heterozygosity and linkage disequilibrium) can be derived. Many previous evolutionary inferences relied on summary statistics extracted from the genotype matrix. Evolutionary inferences using the ARG represent a significant advancement as the ARG is a representation of the evolutionary history of a sample that shows the past history of recombination, coalescence, and mutation events across a particular sequence. This representation in theory contains as much information, if not more, than the combination of all independent summary statistics that could be derived from the genotype matrix. 
Consistent with this idea, some of the first ARG-based analyses have proven to be more powerful than summary statistic-based analyses (Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321 to 1329.; Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 2019:15(9):e1008384.; Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet. 2020:16(8):e1008895.; Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet. 2022:109(5):812-824.; Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv. 2023.10.10.561787. 2023.; Hejase HA, Mo Z, Campagna L, Siepel A. A deep-learning approach for inference of selective sweeps from the ancestral recombination graph. Mol Biol Evol. 2022:39(1):msab332.; Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv. 2023.04.07.536093. 2023.; Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet. 2023:55(5):768-776.). As such, there has been significant interest in the field to investigate 2 main problems related to the ARG: (i) How can we estimate the ARG based on genomic data, and (ii) how can we extract information of past evolutionary processes from the ARG? 
In this perspective, we highlight 3 topics that pertain to these main issues: The development of computational innovations that enable the estimation of the ARG; remaining challenges in estimating the ARG; and methodological advances for deducing evolutionary forces and mechanisms using the ARG. This perspective serves to introduce the readers to the types of questions that can be explored using the ARG and to highlight some of the most pressing issues that must be addressed in order to make ARG-based inference an indispensable tool for evolutionary research.
Affiliation(s)
- Débora Y C Brandt
  - Department of Genetics Evolution and Environment, University College London, London, UK
- Christian D Huber
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma De México, Querétaro, Querétaro, Mexico
10
Rivas-González I, Schierup MH, Wakeley J, Hobolth A. TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. PLoS Genet 2024; 20:e1010836. [PMID: 38330138 PMCID: PMC10880969 DOI: 10.1371/journal.pgen.1010836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 02/21/2024] [Accepted: 01/22/2024] [Indexed: 02/10/2024]
Abstract
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
Affiliation(s)
- Mikkel H. Schierup
  - Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Massachusetts, United States of America
- Asger Hobolth
  - Department of Mathematics, Aarhus University, Aarhus, Denmark
11
Meyer L, Barry P, Riquet F, Foote A, Der Sarkissian C, Cunha RL, Arbiol C, Cerqueira F, Desmarais E, Bordes A, Bierne N, Guinand B, Gagnaire PA. Divergence and gene flow history at two large chromosomal inversions underlying ecotype differentiation in the long-snouted seahorse. Mol Ecol 2024:e17277. [PMID: 38279695 DOI: 10.1111/mec.17277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/18/2023] [Accepted: 01/04/2024] [Indexed: 01/28/2024]
Abstract
Chromosomal inversions can play an important role in divergence and reproductive isolation by building and maintaining distinct allelic combinations between evolutionary lineages. Alternatively, they can take the form of balanced polymorphisms that segregate within populations until one arrangement becomes fixed. Many questions remain about how inversion polymorphisms arise, how they are maintained over the long term, and ultimately, whether and how they contribute to speciation. The long-snouted seahorse (Hippocampus guttulatus) is genetically subdivided into geographic lineages and marine-lagoon ecotypes, with shared structural variation underlying lineage and ecotype divergence. Here, we aim to characterize structural variants and to reconstruct their history and suspected role in ecotype formation. We generated a near chromosome-level genome assembly and described genome-wide patterns of diversity and divergence through the analysis of 112 whole-genome sequences from Atlantic, Mediterranean, and Black Sea populations. By also analysing linked-read sequencing data, we found evidence for two chromosomal inversions that were several megabases in length and showed contrasting allele frequency patterns between lineages and ecotypes across the species range. We reveal that these inversions represent ancient intraspecific polymorphisms, one likely being maintained by divergent selection and the other by pseudo-overdominance. A possible selective coupling between the two inversions was further supported by the absence of specific haplotype combinations and a putative functional interaction between the two inversions in reproduction. Lastly, we detected gene flux eroding divergence between inverted alleles at varying levels for the two inversions, with a likely impact on their dynamics and contribution to divergence and speciation.
Affiliation(s)
- Laura Meyer
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Pierre Barry
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos Universidade do Porto, Vairão, Portugal
- Andrew Foote
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
- Clio Der Sarkissian
  - Centre for Anthropobiology and Genomics of Toulouse, CNRS, University of Toulouse Paul Sabatier, Toulouse, France
- Regina L Cunha
  - Centre of Marine Sciences-CCMAR, University of Algarve, Faro, Portugal
- Erick Desmarais
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Anaïs Bordes
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Nicolas Bierne
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Bruno Guinand
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
12
Cousins T, Tabin D, Patterson N, Reich D, Durvasula A. Accurate inference of population history in the presence of background selection. bioRxiv 2024:2024.01.18.576291. [PMID: 38313273 PMCID: PMC10838404 DOI: 10.1101/2024.01.18.576291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
All published methods for learning about demographic history make the simplifying assumption that the genome evolves neutrally, and do not seek to account for the effects of natural selection on patterns of variation. This is a major concern, as ample work has demonstrated the pervasive effects of natural selection and in particular background selection (BGS) on patterns of genetic variation in diverse species. Simulations and theoretical work have shown that methods to infer changes in effective population size over time (Ne(t)) become increasingly inaccurate as the strength of linked selection increases. Here, we introduce an extension to the Pairwise Sequentially Markovian Coalescent (PSMC) algorithm, PSMC+, which explicitly co-models demographic history and natural selection. We benchmark our method using forward-in-time simulations with BGS and find that our approach improves the accuracy of effective population size inference. Leveraging a high resolution map of BGS in humans, we infer considerable changes in the magnitude of inferred effective population size relative to previous reports. Finally, we separately infer Ne(t) on the X chromosome and on the autosomes in diverse great apes without making a correction for selection, and find that the inferred ratio fluctuates substantially through time in a way that differs across species, showing that uncorrected selection may be an important driver of signals of genetic difference on the X chromosome and autosomes.
Affiliation(s)
- Trevor Cousins
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Daniel Tabin
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
- Arun Durvasula
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
|
13
|
Gagnon L, Moreau C, Laprise C, Vézina H, Girard SL. Deciphering the genetic structure of the Quebec founder population using genealogies. Eur J Hum Genet 2024; 32:91-97. [PMID: 37016017 PMCID: PMC10772069 DOI: 10.1038/s41431-023-01356-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 03/07/2023] [Accepted: 03/22/2023] [Indexed: 04/06/2023] Open
Abstract
Using genealogy to study the demographic history of a population makes it possible to overcome the models and assumptions often used in population genetics. The Quebec founder population is one of the few populations in the world having access to the complete genealogy of the last 400 years. The goal of this study is to follow the evolution of the Quebec population structure over time from the beginning of European colonization until the present day. To do so, we calculated the kinship coefficients of all pairs of ancestors in the ascending genealogy of 665 subjects from eight regional and ethnocultural groups per 25-year period. We show that the Quebec population structure appeared progressively in the St. Lawrence valley as early as 1750 with the distinction of the Saguenay and Gaspesian groups. At that time, the ancestors of two groups, the Sagueneans and the Acadians from the Gaspé Peninsula, experienced a marked increase in kinship and inbreeding levels, which shaped the structure and led to the contemporary population structure. Interestingly, this structure arose before the colonization of the Saguenay region and at the very beginning of the Gaspé Peninsula settlement. The resulting regional founder effects in these groups led to differences in the present-day identity-by-descent sharing, with the Gaspé and North Shore groups sharing more large segments and the Sagueneans more short segments. This is also reflected by the distribution of the number of most recent common ancestors at different generations and their genetic contribution to the studied subjects.
Affiliation(s)
- Laurence Gagnon
  - Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Claudia Moreau
  - Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Catherine Laprise
  - Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Centre Intégré Universitaire en Santé et Services Sociaux du Saguenay-Lac-Saint-Jean, Saguenay, Québec, G7H 7K9, Canada
- Hélène Vézina
  - Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Département des Sciences Humaines et Sociales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
  - Projet BALSAC, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada
- Simon L Girard
  - Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada.
  - Centre Intersectoriel en Santé Durable (CISD), Université du Québec à Chicoutimi, Saguenay, Québec, G7H 2B1, Canada.
  - Centre de Recherche CERVO, Université Laval, Québec, Québec, G1V 0A6, Canada.
|
14
|
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
  - Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA.
  - Department of Genetics, Harvard Medical School, Boston, MA, USA.
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA.
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA.
|
15
|
Lewanski AL, Grundler MC, Bradburd GS. The era of the ARG: An introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics. PLoS Genet 2024; 20:e1011110. [PMID: 38236805 PMCID: PMC10796009 DOI: 10.1371/journal.pgen.1011110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024] Open
Abstract
In the presence of recombination, the evolutionary relationships between a set of sampled genomes cannot be described by a single genealogical tree. Instead, the genomes are related by a complex, interwoven collection of genealogies formalized in a structure called an ancestral recombination graph (ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is replete with valuable information for addressing diverse questions in evolutionary biology. Despite its potential utility, technological and methodological limitations, along with a lack of approachable literature, have severely restricted awareness and application of ARGs in evolution research. Excitingly, recent progress in ARG reconstruction and simulation has made ARG-based approaches feasible for many questions and systems. In this review, we provide an accessible introduction and exploration of ARGs, survey recent methodological breakthroughs, and describe the potential for ARGs to further existing goals and open avenues of inquiry that were previously inaccessible in evolutionary genomics. Through this discussion, we aim to more widely disseminate the promise of ARGs in evolutionary genomics and encourage the broader development and adoption of ARG-based inference.
Affiliation(s)
- Alexander L. Lewanski
  - Department of Integrative Biology, Michigan State University, East Lansing, Michigan, United States of America
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, Michigan, United States of America
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael C. Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Gideon S. Bradburd
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, Michigan, United States of America
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
|
16
|
Gilbertson EN, Brand CM, McArthur E, Rinker DC, Kuang S, Pollard KS, Capra JA. Machine learning reveals the diversity of human 3D chromatin contact patterns. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.22.573104. [PMID: 38187606 PMCID: PMC10769343 DOI: 10.1101/2023.12.22.573104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Understanding variation in chromatin contact patterns across human populations is critical for interpreting non-coding variants and their ultimate effects on gene expression and phenotypes. However, experimental determination of chromatin contacts at a population scale is prohibitively expensive. To overcome this challenge, we develop and validate a machine learning method to quantify the diversity of 3D chromatin contacts at 2 kilobase resolution from genome sequence alone. We then apply this approach to thousands of diverse modern humans and the inferred human-archaic hominin ancestral genome. While patterns of 3D contact divergence genome-wide are qualitatively similar to patterns of sequence divergence, we find that 3D divergence in local 1-megabase genomic windows does not follow sequence divergence. In particular, we identify 392 windows with significantly greater 3D divergence than expected from sequence. Moreover, 26% of genomic windows have rare 3D contact variation observed in a small number of individuals. Using in silico mutagenesis, we find that most sequence changes do not result in changes to 3D chromatin contacts. However, in windows with substantial 3D divergence, just one or a few variants can lead to divergent 3D chromatin contacts without the individuals carrying those variants having high sequence divergence. In summary, inferring 3D chromatin contact maps across human populations reveals diverse contact patterns. We anticipate that these genetically diverse maps of 3D chromatin contact will provide a reference for future work on the function and evolution of 3D chromatin contact variation across human populations.
Affiliation(s)
- Erin N Gilbertson
  - Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Colin M Brand
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
- Evonne McArthur
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN
  - Department of Medicine, University of Washington, Seattle, WA
- David C Rinker
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN
- Shuzhen Kuang
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Katherine S Pollard
  - Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Chan Zuckerberg Biohub SF, San Francisco, CA
- John A Capra
  - Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
|
17
|
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
|
18
|
Gatzke-Kopp L, Keil A, Fabiani M. Diversity and representation. Psychophysiology 2023; 60:e14431. [PMID: 37840332 DOI: 10.1111/psyp.14431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 08/23/2023] [Indexed: 10/17/2023]
Affiliation(s)
- Lisa Gatzke-Kopp
  - Human Development and Family Studies, The Pennsylvania State University, University Park, Pennsylvania, USA
- Andreas Keil
  - Department of Psychology, University of Florida, Gainesville, Florida, USA
- Monica Fabiani
  - Department of Psychology, University of Illinois Urbana-Champaign, Champaign, Illinois, USA
  - Beckman Institute for Advanced Sciences and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
|
19
|
Lewanski AL, Grundler MC, Bradburd GS. The era of the ARG: an empiricist's guide to ancestral recombination graphs. ARXIV 2023:arXiv:2310.12070v1. [PMID: 37904740 PMCID: PMC10614969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
In the presence of recombination, the evolutionary relationships between a set of sampled genomes cannot be described by a single genealogical tree. Instead, the genomes are related by a complex, interwoven collection of genealogies formalized in a structure called an ancestral recombination graph (ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is replete with valuable information for addressing diverse questions in evolutionary biology. Despite its potential utility, technological and methodological limitations, along with a lack of approachable literature, have severely restricted awareness and application of ARGs in empirical evolution research. Excitingly, recent progress in ARG reconstruction and simulation has made ARG-based approaches feasible for many questions and systems. In this review, we provide an accessible introduction and exploration of ARGs, survey recent methodological breakthroughs, and describe the potential for ARGs to further existing goals and open avenues of inquiry that were previously inaccessible in evolutionary genomics. Through this discussion, we aim to more widely disseminate the promise of ARGs in evolutionary genomics and encourage the broader development and adoption of ARG-based inference.
Affiliation(s)
- Alexander L Lewanski
  - Department of Integrative Biology, Michigan State University, East Lansing, MI, US
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, MI, US
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, US
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, US
- Michael C Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, US
- Gideon S Bradburd
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, MI, US
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, US
|
20
|
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
|
21
|
Shpak M, Ghanavi HR, Lange JD, Pool JE, Stensmyr MC. Genomes from historical Drosophila melanogaster specimens illuminate adaptive and demographic changes across more than 200 years of evolution. PLoS Biol 2023; 21:e3002333. [PMID: 37824452 PMCID: PMC10569592 DOI: 10.1371/journal.pbio.3002333] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/03/2023] [Accepted: 09/11/2023] [Indexed: 10/14/2023] Open
Abstract
The ability to perform genomic sequencing on long-dead organisms is opening new frontiers in evolutionary research. These opportunities are especially notable in the case of museum collections, from which countless documented specimens may now be suitable for genomic analysis, if data of sufficient quality can be obtained. Here, we report 25 newly sequenced genomes from museum specimens of the model organism Drosophila melanogaster, including the oldest extant specimens of this species. By comparing historical samples ranging from the early 1800s to 1933 against modern-day genomes, we document evolution across thousands of generations, including time periods that encompass the species' initial occupation of northern Europe and an era of rapidly increasing human activity. We also find that the Lund, Sweden population underwent local genetic differentiation during the early 1800s to 1933 interval (potentially due to drift in a small population) but then became more similar to other European populations thereafter (potentially due to increased migration). Within each century-scale time period, our temporal sampling allows us to document compelling candidates for recent natural selection. In some cases, we gain insights regarding previously implicated selection candidates, such as ChKov1, for which our inferred timing of selection favors the hypothesis of antiviral resistance over insecticide resistance. Other candidates are novel, such as the circadian-related gene Ahcy, which yields a selection signal that rivals that of the DDT resistance gene Cyp6g1. These insights deepen our understanding of recent evolution in a model system, and highlight the potential of future museomic studies.
Affiliation(s)
- Max Shpak
  - Laboratory of Genetics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Jeremy D. Lange
  - Laboratory of Genetics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- John E. Pool
  - Laboratory of Genetics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Marcus C. Stensmyr
  - Department of Biology, Lund University, Lund, Scania, Sweden
  - Max Planck Center on Next Generation Insect Chemical Ecology, Lund, Sweden
|
22
|
Salehi Nowbandegani P, Wohns AW, Ballard JL, Lander ES, Bloemendal A, Neale BM, O'Connor LJ. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet 2023; 55:1494-1502. [PMID: 37640881 DOI: 10.1038/s41588-023-01487-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2022] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
Affiliation(s)
- Pouria Salehi Nowbandegani
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Anthony Wilder Wohns
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  - Stanford University School of Medicine, Stanford, CA, USA.
- Jenna L Ballard
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
- Eric S Lander
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, MIT, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alex Bloemendal
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin M Neale
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luke J O'Connor
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
23
|
Harney É, Micheletti S, Bruwelheide KS, Freyman WA, Bryc K, Akbari A, Jewett E, Comer E, Louis Gates H, Heywood L, Thornton J, Curry R, Ancona Esselmann S, Barca KG, Sedig J, Sirak K, Olalde I, Adamski N, Bernardos R, Broomandkhoshbacht N, Ferry M, Qiu L, Stewardson K, Workman JN, Zalzala F, Mallick S, Micco A, Mah M, Zhang Z, Rohland N, Mountain JL, Owsley DW, Reich D. The genetic legacy of African Americans from Catoctin Furnace. Science 2023; 381:eade4995. [PMID: 37535739 PMCID: PMC10958645 DOI: 10.1126/science.ade4995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 06/20/2023] [Indexed: 08/05/2023]
Abstract
Few African Americans have been able to trace family lineages back to ancestors who died before the 1870 United States Census, the first in which all Black people were listed by name. We analyzed 27 individuals from Maryland's Catoctin Furnace African American Cemetery (1774-1850), identifying 41,799 genetic relatives among consenting research participants in 23andMe, Inc.'s genetic database. One of the highest concentrations of close relatives is in Maryland, suggesting that descendants of the Catoctin individuals remain in the area. We find that many of the Catoctin individuals derived African ancestry from the Wolof or Kongo groups and European ancestry from Great Britain and Ireland. This study demonstrates the power of joint analysis of historical DNA and large datasets generated through direct-to-consumer ancestry testing.
Affiliation(s)
- Éadaoin Harney
- 23andMe, Inc.; Sunnyvale, CA 94043, USA
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Karin S. Bruwelheide
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution; Washington DC 20560, USA
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Elizabeth Comer
- Catoctin Furnace Historical Society; Thurmont, MD, 21788, USA
- Henry Louis Gates
- Hutchins Center for African and African American Research, Harvard University; Cambridge, MA 02138, USA
- Linda Heywood
- Department of History/African American Studies, Boston University; Brookline, MA 02446, USA
- John Thornton
- Department of History/African American Studies, Boston University; Brookline, MA 02446, USA
- Roslyn Curry
- 23andMe, Inc.; Sunnyvale, CA 94043, USA
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Kathryn G. Barca
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution; Washington DC 20560, USA
- Jakob Sedig
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Kendra Sirak
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Iñigo Olalde
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- BIOMICs Research Group, Department of Zoology and Animal Cell Biology, University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain
- Ikerbasque—Basque Foundation of Science, Bilbao, Spain
- Nicole Adamski
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Rebecca Bernardos
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Nasreen Broomandkhoshbacht
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Matthew Ferry
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Lijun Qiu
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Kristin Stewardson
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- J. Noah Workman
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Fatma Zalzala
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Adam Micco
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Matthew Mah
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Zhao Zhang
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Nadin Rohland
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Douglas W. Owsley
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution; Washington DC 20560, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University; Cambridge, MA, 02138, USA
- Department of Genetics, Harvard Medical School; Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School; Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
|
24
|
Ragsdale AP, Thornton KR. Multiple Sources of Uncertainty Confound Inference of Historical Human Generation Times. Mol Biol Evol 2023; 40:msad160. [PMID: 37450583 PMCID: PMC10404577 DOI: 10.1093/molbev/msad160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023] Open
Abstract
Wang et al. (2023) recently proposed an approach to infer the history of human generation intervals from changes in mutation profiles over time. As the relative proportions of different mutation types depend on the ages of parents, binning variants by the time they arose allows for the inference of changes in average paternal and maternal generation intervals. Applying this approach to published allele age estimates, Wang et al. (2023) inferred long-lasting sex differences in average generation times and surprisingly found that ancestral generation times of West African populations remained substantially higher than those of Eurasian populations extending tens of thousands of generations into the past. Here, we argue that the results and interpretations in Wang et al. (2023) are primarily driven by noise and biases in input data and a lack of validation using independent approaches for estimating allele ages. With the recent development of methods to reconstruct genome-wide gene genealogies, coalescence times, and allele ages, we caution that downstream analyses may be strongly influenced by uncharacterized biases in their output.
Affiliation(s)
- Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
|
25
|
Kandel AW, Sommer C, Kanaeva Z, Bolus M, Bruch AA, Groth C, Haidle MN, Hertler C, Heß J, Malina M, Märker M, Hochschild V, Mosbrugger V, Schrenk F, Conard NJ. The ROCEEH Out of Africa Database (ROAD): A large-scale research database serves as an indispensable tool for human evolutionary studies. PLoS One 2023; 18:e0289513. [PMID: 37527270 PMCID: PMC10393170 DOI: 10.1371/journal.pone.0289513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 07/19/2023] [Indexed: 08/03/2023] Open
Abstract
Large scale databases are critical for helping scientists decipher long-term patterns in human evolution. This paper describes the conception and development of such a research database and illustrates how big data can be harnessed to formulate new ideas about the past. The Role of Culture in Early Expansions of Humans (ROCEEH) is a transdisciplinary research center whose aim is to study the origins of culture and the multifaceted aspects of human expansions across Africa and Eurasia over the last three million years. To support its research, the ROCEEH team developed an online tool named the ROCEEH Out of Africa Database (ROAD) and implemented its web-based applications. ROAD integrates geographical data as well as archaeological, paleoanthropological, paleontological and paleobotanical content within a robust chronological framework. In fact, a unique feature of ROAD is its ability to dynamically link scientific data both spatially and temporally, thereby allowing its reuse in ways that were not originally conceived. The data stem from published sources spanning the last 150 years, including those generated by the research team. Descriptions of these data rely on the development of a standardized vocabulary and profit from online explanations of each table and attribute. By synthesizing legacy data, ROAD facilitates the reuse of heritage data in novel ways. Database queries yield structured information in a variety of interoperable formats. By visualizing data on maps, users can explore this vast dataset and develop their own theories. By downloading data, users can conduct further quantitative analyses, for example with Geographic Information Systems, modeling programs and artificial intelligence. In this paper, we demonstrate the innovative nature of ROAD and show how it helps scientists studying human evolution to access datasets from different fields, thereby connecting the social and natural sciences. 
Because it permits the reuse of "old" data in new ways, ROAD is now an indispensable tool for researchers of human evolution and paleogeography.
Affiliation(s)
- Andrew W Kandel
- The Role of Culture in Early Expansions of Humans, Heidelberg Academy of Sciences and Humanities, Tübingen, Germany
- Christian Sommer
- The Role of Culture in Early Expansions of Humans, Heidelberg Academy of Sciences and Humanities, Tübingen, Germany
- Zara Kanaeva
- The Role of Culture in Early Expansions of Humans, Heidelberg Academy of Sciences and Humanities, Tübingen, Germany
- Michael Bolus
- The Role of Culture in Early Expansions of Humans, Heidelberg Academy of Sciences and Humanities, Tübingen, Germany
- Department of Geosciences, Working Group Early Prehistory and Quaternary Ecology, University of Tübingen, Tübingen, Germany
- Angela A Bruch
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Claudia Groth
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Miriam N Haidle
- Department of Geosciences, Working Group Early Prehistory and Quaternary Ecology, University of Tübingen, Tübingen, Germany
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Christine Hertler
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Julia Heß
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Maria Malina
- The Role of Culture in Early Expansions of Humans, Heidelberg Academy of Sciences and Humanities, Tübingen, Germany
- Michael Märker
- Department of Earth and Environmental Sciences, University of Pavia, Pavia, Italy
- Working Group on "Soil Erosion and Feedbacks", Leibniz Centre for Agricultural Landscape Research (ZALF), Müncheberg, Germany
- Volker Hochschild
- Institute of Geography, Department of Geosciences, University of Tübingen, Tübingen, Germany
- Volker Mosbrugger
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Friedemann Schrenk
- The Role of Culture in Early Expansions of Humans, Senckenberg Forschungsinstitut, Heidelberg Academy of Sciences and Humanities, Frankfurt/Main, Germany
- Nicholas J Conard
- Department of Geosciences, Working Group Early Prehistory and Quaternary Ecology, University of Tübingen, Tübingen, Germany
|
26
|
Hernández CL. Mitochondrial DNA in Human Diversity and Health: From the Golden Age to the Omics Era. Genes (Basel) 2023; 14:1534. [PMID: 37628587 PMCID: PMC10453943 DOI: 10.3390/genes14081534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 07/21/2023] [Accepted: 07/24/2023] [Indexed: 08/27/2023] Open
Abstract
Mitochondrial DNA (mtDNA) is a small fraction of our hereditary material. However, this molecule has had an overwhelming presence in scientific research for decades until the arrival of high-throughput studies. Several appealing properties justify the application of mtDNA to understand how human populations are-from a genetic perspective-and how individuals exhibit phenotypes of biomedical importance. Here, I review the basics of mitochondrial studies with a focus on the dawn of the field, analysis methods and the connection between two sides of mitochondrial genetics: anthropological and biomedical. The particularities of mtDNA, with respect to inheritance pattern, evolutionary rate and dependence on the nuclear genome, explain the challenges of associating mtDNA composition and diseases. Finally, I consider the relevance of this single locus in the context of omics research. The present work may serve as a tribute to a tool that has provided important insights into the past and present of humankind.
Affiliation(s)
- Candela L Hernández
- Department of Biodiversity, Ecology and Evolution, Faculty of Biological Sciences, Complutense University of Madrid, 28040 Madrid, Spain
|
27
|
Anderson-Trocmé L, Nelson D, Zabad S, Diaz-Papkovich A, Kryukov I, Baya N, Touvier M, Jeffery B, Dina C, Vézina H, Kelleher J, Gravel S. On the genes, genealogies, and geographies of Quebec. Science 2023; 380:849-855. [PMID: 37228217 DOI: 10.1126/science.add5300] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2022] [Accepted: 04/24/2023] [Indexed: 05/27/2023]
Abstract
Population genetic models only provide coarse representations of real-world ancestry. We used a pedigree compiled from 4 million parish records and genotype data from 2276 French and 20,451 French Canadian individuals to finely model and trace French Canadian ancestry through space and time. The loss of ancestral French population structure and the appearance of spatial and regional structure highlights a wide range of population expansion models. Geographic features shaped migrations, and we find enrichments for migration, genetic, and genealogical relatedness patterns within river networks across regions of Quebec. Finally, we provide a freely accessible simulated whole-genome sequence dataset with spatiotemporal metadata for 1,426,749 individuals reflecting intricate French Canadian population structure. Such realistic population-scale simulations provide opportunities to investigate population genetics at an unprecedented resolution.
Affiliation(s)
- Luke Anderson-Trocmé
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Genome Centre, Montreal, QC, Canada
- Dominic Nelson
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Genome Centre, Montreal, QC, Canada
- Shadi Zabad
- School of Computer Science, McGill University, Montreal, QC, Canada
- Alex Diaz-Papkovich
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Ivan Kryukov
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Genome Centre, Montreal, QC, Canada
- Nikolas Baya
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Mathilde Touvier
- Sorbonne Paris Nord University, INSERM U1153, INRAE U1125, CNAM, Nutritional Epidemiology Research Team (EREN), Epidemiology and Statistics Research Center, University Paris Cité (CRESS), Bobigny, France
- Ben Jeffery
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Christian Dina
- Nantes Université, CNRS, INSERM, l'institut du thorax, Nantes, France
- Hélène Vézina
- BALSAC Project, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Genome Centre, Montreal, QC, Canada
|
28
|
Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet 2023; 55:768-776. [PMID: 37127670 PMCID: PMC10181934 DOI: 10.1038/s41588-023-01379-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/10/2021] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Affiliation(s)
- Brian C Zhang
- Department of Statistics, University of Oxford, Oxford, UK
- Arjun Biddanda
- Department of Statistics, University of Oxford, Oxford, UK
- Árni Freyr Gunnarsson
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
|
29
|
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as Identical by descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA data (aDNA) due to typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments for human aDNA data implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgan for aDNA data with at least either 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives between 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of two approximately fifth-degree relatives who were buried 1410km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago. 
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe-ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe, signaling a strong bottleneck and a recent biological connection on the order of only few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
|
30
|
Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas N, Nyambo T, Mpoloka SW, Mokone GG, Njamnshi A, Folkunang C, Meskel DW, Belay G, Song YS, Tishkoff SA. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell 2023; 186:923-939.e14. [PMID: 36868214 PMCID: PMC10568978 DOI: 10.1016/j.cell.2023.01.042] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 01/30/2023] [Indexed: 03/05/2023]
Abstract
We conduct high coverage (>30×) whole-genome sequencing of 180 individuals from 12 indigenous African populations. We identify millions of unreported variants, many predicted to be functionally important. We observe that the ancestors of southern African San and central African rainforest hunter-gatherers (RHG) diverged from other populations >200 kya and maintained a large effective population size. We observe evidence for ancient population structure in Africa and for multiple introgression events from "ghost" populations with highly diverged genetic lineages. Although currently geographically isolated, we observe evidence for gene flow between eastern and southern Khoesan-speaking hunter-gatherer populations lasting until ∼12 kya. We identify signatures of local adaptation for traits related to skin color, immune response, height, and metabolic processes. We identify a positively selected variant in the lightly pigmented San that influences pigmentation in vitro by regulating the enhancer activity and gene expression of PDPK1.
Affiliation(s)
- Shaohua Fan
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China; Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jeffrey P Spence
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Yuanqing Feng
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew E B Hansen
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Marcia H Beltrame
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Alessia Ranciaro
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jibril Hirbo
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- William Beggs
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Neil Thomas
- Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
- Thomas Nyambo
- Department of Biochemistry, Kampala International University in Tanzania, P.O. Box 9790, Dar es Salaam, Tanzania
- Sununguko Wata Mpoloka
- Department of Biological Sciences, Faculty of Science, University of Botswana Gaborone, Private Bag UB 0022, Gaborone, Botswana
- Gaonyadiwe George Mokone
- Department of Biomedical Sciences, Faculty of Medicine, University of Botswana Gaborone, Private Bag UB 0022, Gaborone, Botswana
- Alfred Njamnshi
- Department of Neurology, Central Hospital Yaoundé; Brain Research Africa Initiative (BRAIN), Neuroscience Lab, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, P.O. Box 337, Yaoundé, Cameroon
- Charles Folkunang
- Department of Pharmacotoxicology and Pharmacokinetics, Faculty of Medicine and Biomedical Sciences, The University of Yaoundé I, P.O. Box 337, Yaoundé, Cameroon
- Dawit Wolde Meskel
- Department of Microbial Cellular and Molecular Biology, Addis Ababa University, P.O. Box 1176, Addis Ababa, Ethiopia
- Gurja Belay
- Department of Microbial Cellular and Molecular Biology, Addis Ababa University, P.O. Box 1176, Addis Ababa, Ethiopia
- Yun S Song
- Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Sarah A Tishkoff
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
31
|
Browning BL, Browning SR. Statistical phasing of 150,119 sequenced genomes in the UK Biobank. Am J Hum Genet 2023; 110:161-165. [PMID: 36450278 PMCID: PMC9892698 DOI: 10.1016/j.ajhg.2022.11.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/03/2022] [Accepted: 11/08/2022] [Indexed: 12/03/2022] Open
Abstract
The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.
Affiliation(s)
- Brian L Browning
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
32
|
Davranoglou LR, Embirikos L. Toad zoonyms mirror the linguistic and demographic history of Greece. PLoS One 2023; 18:e0283136. [PMID: 36989260 PMCID: PMC10057758 DOI: 10.1371/journal.pone.0283136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Accepted: 03/02/2023] [Indexed: 03/30/2023] Open
Abstract
The common toad (Bufo bufo) has been the subject of many folk tales and superstitions in Western Europe, and as a result, it is characterised by numerous common names (zoonyms). However, the zoonyms of the toad and its associated traditions have remained unexplored in the Balkans, one of Europe's linguistic hotspots. In the present study, it was attempted to fill this knowledge gap by focusing on Greece, where more than 7,700 individuals were interviewed both in the field and through online platforms, in order to document toad zoonyms from all varieties and dialects of Greek, as well as local non-Greek languages such as Arvanitika, South Slavic dialects, and Vlach. It was found that the academically unattested zoonyms of the toad provide an unmatched and previously unexplored linguistic and ethnographic tool, as they reflect the linguistic, demographic, and historical processes that shaped modern Greece. This is particularly pertinent in the 21st century, when a majority of the country's dialects and languages are in danger of imminent extinction, and some have already gone silent. Overall, the present study shows the significance of recording zoonyms of indigenous and threatened languages as excellent linguistic and ethnographic tools that safeguard our planet's ethnolinguistic diversity and enhance our understanding of how pre-industrial communities interacted with their local fauna. Furthermore, in contrast to all other European countries, which only possess one or only a few zoonyms for the toad, the Greek world boasts an unmatched 37 zoonyms, which attest to its role as a linguistic hotspot.
Affiliation(s)
- Leonidas Embirikos
- Oxford University Museum of Natural History, University of Oxford, Oxford, United Kingdom
33
Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination. PLoS Comput Biol 2022; 18:e1010422. [PMID: 35984849 PMCID: PMC9447913 DOI: 10.1371/journal.pcbi.1010422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/06/2022] [Accepted: 07/21/2022] [Indexed: 11/19/2022] Open
Abstract
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and so different histories of movement between populations. Recombination therefore poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, although it is also a potential benefit in that different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer from ARGs how the migration history of sampled individuals varies across the genome, and improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR model to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
Phylogeographic methods are widely used to reconstruct the historical movement of individuals between different populations. When applied to infectious pathogens, these methods are often used to reconstruct the origin or source of novel pathogen lineages. Most existing phylogeographic methods reconstruct movement based on a single phylogenetic tree, which is assumed to reflect the genetic ancestry of all sampled individuals. However, in populations undergoing recombination, genetic material can be exchanged between lineages such that individuals may inherit different regions of their genome from different ancestors. In this case, phylogenetic relationships among individuals can only be captured by a reticulated network rather than any single tree. Ancestral Recombination Graphs (ARGs) provide one way of capturing these reticulate relationships, and we develop new models that allow for demographic inference of historical population sizes, recombination rates and migration rates between subpopulations from ARGs. By accounting for recombination, our models not only allow for accurate demographic inference, but can take full advantage of the additional information contained in ARGs about how ancestry varies across genomes to more precisely reconstruct the movement of genetic material between populations.
34
Brucato N, André M, Hudjashov G, Mondal M, Cox MP, Leavesley M, Ricaut FX. Chronology of natural selection in Oceanian genomes. iScience 2022; 25:104583. [PMID: 35880026 PMCID: PMC9308150 DOI: 10.1016/j.isci.2022.104583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 05/11/2022] [Accepted: 06/07/2022] [Indexed: 11/30/2022] Open
Abstract
As human populations left Asia to first settle in Oceania around 50,000 years ago, they entered a territory ecologically separated from the Old World for millions of years. We analyzed genomic data of 239 modern Oceanian individuals to detect and date signals of selection specific to this region. Combining both relative and absolute dating approaches, we identified a strong selection pattern between 52,000 and 54,000 years ago in the genomes of descendants of the first settlers of Sahul. This strikingly corresponds to the dates of initial settlement as inferred from archaeological evidence. Loci under selection during this period, some showing enrichment in Denisovan ancestry, overlap genes involved in the immune response and diet, especially a plant-based diet. Pathogens and natural resources, especially from endemic plants, therefore appear to have acted as strong selective pressures on the genomes of the first settlers of Sahul.
- 239 human genomes from both sides of the Wallacean ecogeographical barriers
- Signals of selection are dated to between 54,000 and 52,000 years ago in modern Oceanian genomes
- Genes related to immunity and diet were under strong selection
- Denisovan introgressions contributed to the genetic adaptations present in Oceanians
Affiliation(s)
- Nicolas Brucato
- Laboratoire Évolution et Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 cedex 9 Toulouse, France
- Mathilde André
- Laboratoire Évolution et Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 cedex 9 Toulouse, France; Institute of Genomics, University of Tartu, Tartu, 51010 Tartumaa, Estonia
- Georgi Hudjashov
- Institute of Genomics, University of Tartu, Tartu, 51010 Tartumaa, Estonia
- Mayukh Mondal
- Institute of Genomics, University of Tartu, Tartu, 51010 Tartumaa, Estonia
- Murray P Cox
- School of Natural Sciences, Massey University, Palmerston North 4442, New Zealand
- Matthew Leavesley
- Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, National Capital District 134, Papua New Guinea; College of Arts, Society and Education, James Cook University, P.O. Box 6811, Cairns, QLD 4870, Australia; ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, NSW 2522, Australia
- François-Xavier Ricaut
- Laboratoire Évolution et Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 cedex 9 Toulouse, France
35
Rowe TB, Stafford TW, Fisher DC, Enghild JJ, Quigg JM, Ketcham RA, Sagebiel JC, Hanna R, Colbert MW. Human Occupation of the North American Colorado Plateau ∼37,000 Years Ago. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.903795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Calibrating human population dispersals across Earth’s surface is fundamental to assessing rates and timing of anthropogenic impacts and distinguishing ecological phenomena influenced by humans from those that were not. Here, we describe the Hartley mammoth locality, which dates to 38,900–36,250 cal BP by AMS 14C analysis of hydroxyproline from bone collagen. We accept the standard view that elaborate stone technology of the Eurasian Upper Paleolithic was introduced into the Americas by arrival of the Native American clade ∼16,000 cal BP. It follows that if older cultural sites exist in the Americas, they might only be diagnosed using nuanced taphonomic approaches. We employed computed tomography (CT and μCT) and other state-of-the-art methods that had not previously been applied to investigating ancient American sites. This revealed multiple lines of taphonomic evidence suggesting that two mammoths were butchered using expedient lithic and bone technology, along with evidence diagnostic of controlled (domestic) fire. That this may be an ancient cultural site is corroborated by independent genetic evidence of two founding populations for humans in the Americas, which has already raised the possibility of a dispersal into the Americas by people of East Asian ancestry that preceded the Native American clade by millennia. The Hartley mammoth locality thus provides a new deep point of chronologic reference for occupation of the Americas and the attainment by humans of a near-global distribution.
36
Temporal mapping of derived high-frequency gene variants supports the mosaic nature of the evolution of Homo sapiens. Sci Rep 2022; 12:9937. [PMID: 35705575 PMCID: PMC9200848 DOI: 10.1038/s41598-022-13589-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 05/25/2022] [Indexed: 11/25/2022] Open
Abstract
Large-scale estimations of the time of emergence of variants are essential to examine hypotheses concerning human evolution with precision. Using an open repository of genetic variant age estimations, we offer here a temporal evaluation of various evolutionarily relevant datasets, such as Homo sapiens-specific variants, high-frequency variants found in genetic windows under positive selection, introgressed variants from extinct human species, as well as putative regulatory variants specific to various brain regions. We find a recurrent bimodal distribution of high-frequency variants, but also evidence for specific enrichments of gene categories in distinct time windows, pointing to different periods of phenotypic changes, resulting in a mosaic. With a temporal classification of genetic mutations in hand, we then applied a machine learning tool to predict what genes have changed more in certain time windows, and which tissues these genes may have impacted more. Overall, we provide a fine-grained temporal mapping of derived variants in Homo sapiens that helps to illuminate the intricate evolutionary history of our species.
37
Brandt DYC, Wei X, Deng Y, Vaughn AH, Nielsen R. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics 2022; 221:iyac044. [PMID: 35333304 PMCID: PMC9071567 DOI: 10.1093/genetics/iyac044] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/14/2021] [Accepted: 03/08/2022] [Indexed: 11/12/2022] Open
Abstract
The ancestral recombination graph is a structure that describes the joint genealogies of sampled DNA sequences along the genome. Recent computational methods have made impressive progress toward scalably estimating whole-genome genealogies. In addition to inferring the ancestral recombination graph, some of these methods can also provide ancestral recombination graphs sampled from a defined posterior distribution. Obtaining good samples of ancestral recombination graphs is crucial for quantifying statistical uncertainty and for estimating population genetic parameters such as effective population size, mutation rate, and allele age. Here, we use standard neutral coalescent simulations to benchmark the estimates of pairwise coalescence times from 3 popular ancestral recombination graph inference programs: ARGweaver, Relate, and tsinfer+tsdate. We assess (1) how the inferred times compare with the true coalescence times at each locus; (2) how the distribution of coalescence times across all loci compares with the expected exponential distribution; and (3) whether the sampled coalescence times have the properties expected of a valid posterior distribution. We find that inferred coalescence times at each locus are most accurate in ARGweaver, and often more accurate in Relate than in tsinfer+tsdate. However, all 3 methods tend to overestimate small coalescence times and underestimate large ones. Lastly, the posterior distribution of ARGweaver is closer to the expected posterior distribution than Relate's, but this higher accuracy comes at a substantial cost in scalability. The best choice of method will depend on the number and length of input sequences and on the goal of downstream analyses, and we provide guidelines for best practices.
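As a rough illustration of comparison (2) in the abstract above, the sketch below assumes a constant-size diploid population of N individuals, for which pairwise coalescence times under the standard neutral coalescent are exponentially distributed with mean 2N generations. The value N = 10,000, the function name, and the `shrunk_times` list are hypothetical: the latter is an artificial stand-in for estimates that overestimate small times and underestimate large ones, not output from ARGweaver, Relate, or tsinfer+tsdate.

```python
import math
import random


def ks_statistic_exponential(times, mean):
    """Kolmogorov-Smirnov distance between `times` and an exponential with the given mean."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x / mean)  # exponential CDF at x
        # Compare the CDF with the empirical step function just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


rng = random.Random(0)
N = 10_000              # hypothetical diploid effective population size
mean_tmrca = 2 * N      # expected pairwise coalescence time, in generations

# "True" pairwise coalescence times drawn from the expected exponential distribution.
true_times = [rng.expovariate(1.0 / mean_tmrca) for _ in range(5000)]

# Artificially biased estimates: small times pushed up, large times pulled down.
shrunk_times = [0.5 * t + 0.5 * mean_tmrca for t in true_times]

d_true = ks_statistic_exponential(true_times, mean_tmrca)
d_shrunk = ks_statistic_exponential(shrunk_times, mean_tmrca)
print(f"KS distance, true times:   {d_true:.3f}")
print(f"KS distance, shrunk times: {d_shrunk:.3f}")
```

A larger distance for the shrunk times reflects the kind of departure from the exponential expectation that the abstract describes; a real benchmark would replace both lists with times extracted from simulated and inferred genealogies.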
Affiliation(s)
- Débora Y. C. Brandt
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Xinzhu Wei
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Yun Deng
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Andrew H Vaughn
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Rasmus Nielsen
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Department of Statistics, University of California Berkeley, Berkeley, CA 94720, USA
- GLOBE Institute, University of Copenhagen, Copenhagen K 1350, Denmark
38
Fan C, Mancuso N, Chiang CW. A genealogical estimate of genetic relationships. Am J Hum Genet 2022; 109:812-824. [PMID: 35417677 DOI: 10.1016/j.ajhg.2022.03.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/24/2021] [Accepted: 03/25/2022] [Indexed: 12/23/2022] Open
Abstract
The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard to calculate the GRM treats linked markers as independent and does not explicitly model the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework, namely the expected GRM (eGRM), to infer the expected relatedness between pairs of individuals given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed with an ARG inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimate the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to SNP array genotypes from a population sample from Northern and Eastern Finland, we find that clustering analysis with the eGRM reveals population structure driven by subpopulations that would not be apparent via the canonical GRM and that temporally the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust tree-centric estimate of relatedness with wide application to genetic studies.
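For context on the "canonical GRM" that the abstract above contrasts with the eGRM, the following is a minimal sketch of the usual marker-based calculation: each marker's genotype counts are standardized by the sample allele frequency, and the products are averaged over markers, so every marker contributes as if it were independent. The function name and the toy genotype matrix are hypothetical, and the sketch does not implement the eGRM itself, which requires an ARG.

```python
import math


def canonical_grm(genotypes):
    """Return the n x n GRM from an n x m matrix of 0/1/2 genotype counts."""
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / (2 * n)  # sample allele frequency at marker j
        if p == 0.0 or p == 1.0:
            continue  # skip monomorphic markers, which cannot be standardized
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            standardized[i].append((column[i] - 2 * p) / sd)
    used = len(standardized[0])  # number of polymorphic markers kept (assumed > 0)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
         for k in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals (rows) by 4 biallelic markers (columns).
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [1, 2, 1, 0],
]
grm = canonical_grm(toy_genotypes)
for row in grm:
    print(" ".join(f"{value:6.3f}" for value in row))
```

Because the average runs over markers with equal weight, tightly linked markers that share one genealogical history are counted repeatedly, which is the limitation the abstract attributes to the current standard.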
39
Tang L. Human genealogical histories. Nat Methods 2022; 19:400. [PMID: 35396479 DOI: 10.1038/s41592-022-01471-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
40
Rees J, Andrés A. Inferring human evolutionary history. Science 2022; 375:817-818. [PMID: 35201893 DOI: 10.1126/science.abo0498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Unified genetic genealogy improves our understanding of how humans evolved.
Affiliation(s)
- Jasmin Rees
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London, UK; Genetics and Genomic Medicine Programme, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aida Andrés
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London, UK; Genetics and Genomic Medicine Programme, Great Ormond Street Institute of Child Health, University College London, London, UK