1
Lu DS, Peris D, Sønstebø JH, James TY, Rieseberg LH, Maurice S, Kauserud H, Ravinet M, Skrede I. Reticulate evolution and rapid development of reproductive barriers upon secondary contact in a forest fungus. Curr Biol 2024; 34:4513-4525.e6. [PMID: 39317194 DOI: 10.1016/j.cub.2024.08.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/12/2024] [Accepted: 08/27/2024] [Indexed: 09/26/2024]
Abstract
Reproductive barriers between sister species of the mushroom-forming fungi tend to be stronger in sympatry, leading to speculation on whether they are being reinforced by selection against hybrids. We have used population genomic analyses together with in vitro crosses of a global sample of the wood decay fungus Trichaptum abietinum to investigate reproductive barriers within this species complex and the processes that have shaped them. Our phylogeographic analyses show that T. abietinum is delimited into six major genetic groups: one in Asia, two in Europe, and three in North America. The groups present in Europe are interfertile and admixed, whereas our crosses show that the North American groups are reproductively isolated. In Asia, a more complex pattern appears, with partial intersterility between subgroups that likely originated independently and more recently than the reproductive barriers in North America. We found pre-mating barriers in T. abietinum to be moderately correlated with genomic divergence, whereas mean growth reduction of the mated hybrids showed a strong correlation with increasing genomic divergence. Genome-wide association analyses identified candidate genes with programmed cell death annotations, which are known to be involved in intersterility in distantly related fungi, although their link here remains unproven. Our demographic modeling and phylogenetic network analyses fit a scenario where reproductive barriers in Trichaptum abietinum could have been reinforced upon secondary contact between groups that diverged in allopatry during the Pleistocene glacial cycles. Our combination of experimental and genomic approaches demonstrates how T. abietinum is a tractable system for studying speciation mechanisms.
Affiliation(s)
- Dabao Sun Lu
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway
- David Peris
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway; Department of Biotechnology, Institute of Agrochemistry and Food Biotechnology (IATA), CSIC, Carrer del Catedrático Agustín Escardino 7, 46980 Paterna, Valencia, Spain
- Jørn Henrik Sønstebø
  - Department of Natural Sciences and Environmental Health, University of South-Eastern Norway, Gullbringvegen 36, 3800 Bø, Norway
- Timothy Y James
  - Department of Ecology and Evolutionary Biology, University of Michigan, 105 North University Ave Biological Sciences Building, Ann Arbor, MI 48109-1085, USA
- Loren H Rieseberg
  - Department of Botany and Biodiversity Research Centre, The University of British Columbia, 3156-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada
- Sundy Maurice
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway
- Håvard Kauserud
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway
- Mark Ravinet
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway; School of Life Sciences, University of Nottingham, East Dr., Nottingham NG7 2TQ, UK
- Inger Skrede
  - Department of Biosciences, University of Oslo, Blindernveien 31, 0371 Oslo, Norway
2
Whitehouse LS, Ray DD, Schrider DR. Tree sequences as a general-purpose tool for population genetic inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.20.581288. [PMID: 39185244 PMCID: PMC11343121 DOI: 10.1101/2024.02.20.581288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
As population genetic data increase in size, new methods have been developed to store genetic information efficiently, such as tree sequences. These data structures are efficient in both computation and storage, but they are not interchangeable with the data structures used by many existing population genetic inference methods, such as convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) that learns directly from tree sequence topology and node data, allowing neural networks to be applied without the intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks, including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that a GCN can learn directly from tree sequences and perform well on these common population genetic inference tasks, with accuracies roughly matching or even exceeding those of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work providing a foundation for population genetic inference moving forward.
3
Temple SD, Waples RK, Browning SR. Modeling recent positive selection using identity-by-descent segments. Am J Hum Genet 2024:S0002-9297(24)00333-1. [PMID: 39362217 DOI: 10.1016/j.ajhg.2024.08.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/29/2024] [Accepted: 08/30/2024] [Indexed: 10/05/2024] Open
Abstract
Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
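The scanning step in this abstract, flagging loci whose IBD rate lies several standard deviations above the genome-wide median, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `flag_excess_ibd`, the use of a plain population standard deviation, and the toy rates are all assumptions made for illustration.

```python
from statistics import median, pstdev


def flag_excess_ibd(ibd_rates, num_sd=4.0):
    """Return indices of windows whose IBD rate exceeds median + num_sd * SD.

    Hypothetical helper: a real scan would likely use a more robust
    spread estimate and account for correlation between nearby windows.
    """
    center = median(ibd_rates)
    spread = pstdev(ibd_rates)
    threshold = center + num_sd * spread
    return [i for i, rate in enumerate(ibd_rates) if rate > threshold]


# Toy genome-wide IBD rates (made-up numbers): 99 background windows
# and one window with a strongly elevated rate at index 99.
toy_rates = [1.0] * 50 + [1.2] * 49 + [10.0]
hits = flag_excess_ibd(toy_rates)
```

Because the outlier itself inflates the standard deviation, a plain SD makes the threshold conservative; that is one reason to treat this only as a sketch of the idea.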
Affiliation(s)
- Seth D Temple
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Ryan K Waples
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
4
Schraiber JG, Spence JP, Edge MD. Estimation of demography and mutation rates from one million haploid genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.18.613708. [PMID: 39345369 PMCID: PMC11429810 DOI: 10.1101/2024.09.18.613708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
As genetic sequencing costs have plummeted, datasets with sizes previously unthinkable have begun to appear. Such datasets present new opportunities to learn about evolutionary history, particularly via rare alleles that record the very recent past. However, beyond the computational challenges inherent in the analysis of many large-scale datasets, large population-genetic datasets present theoretical problems. In particular, the majority of population-genetic tools require the assumption that each mutant allele in the sample is the result of a single mutation (the "infinite sites" assumption), which is violated in large samples. Here, we present DR EVIL, a method for estimating mutation rates and recent demographic history from very large samples. DR EVIL avoids the infinite-sites assumption by using a diffusion approximation to a branching-process model with recurrent mutation. The branching-process approach limits the method to rare alleles, but, along with recent results, renders tractable likelihoods with recurrent mutation. We show that DR EVIL performs well in simulations and apply it to rare-variant data from a million haploid samples, identifying a signal of mutation-rate heterogeneity within commonly analyzed classes and predicting that in modern sample sizes, most rare variants at sites with high mutation rates represent the descendants of multiple mutation events.
5
Akbari A, Barton AR, Gazal S, Li Z, Kariminejad M, Perry A, Zeng Y, Mittnik A, Patterson N, Mah M, Zhou X, Price AL, Lander ES, Pinhasi R, Rohland N, Mallick S, Reich D. Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.14.613021. [PMID: 39314480 PMCID: PMC11419161 DOI: 10.1101/2024.09.14.613021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
We present a method for detecting evidence of natural selection in ancient DNA time-series data that leverages an opportunity not utilized in previous scans: testing for a consistent trend in allele frequency change over time. By applying this to 8433 West Eurasians who lived over the past 14000 years and 6510 contemporary people, we find an order of magnitude more genome-wide significant signals than previous studies: 347 independent loci with >99% probability of selection. Previous work showed that classic hard sweeps driving advantageous mutations to fixation have been rare over the broad span of human evolution, but in the last ten millennia, many hundreds of alleles have been affected by strong directional selection. Discoveries include an increase from ~0% to ~20% in 4000 years for the major risk factor for celiac disease at HLA-DQB1; a rise from ~0% to ~8% in 6000 years of blood type B; and fluctuating selection at the TYK2 tuberculosis risk allele rising from ~2% to ~9% from ~5500 to ~3000 years ago before dropping to ~3%. We identify instances of coordinated selection on alleles affecting the same trait, with the polygenic score today predictive of body fat percentage decreasing by around a standard deviation over ten millennia, consistent with the "Thrifty Gene" hypothesis that a genetic predisposition to store energy during food scarcity became disadvantageous after farming. We also identify selection for combinations of alleles that are today associated with lighter skin color, lower risk for schizophrenia and bipolar disease, slower health decline, and increased measures related to cognitive performance (scores on intelligence tests, household income, and years of schooling). These traits are measured in modern industrialized societies, so what phenotypes were adaptive in the past is unclear. 
We estimate selection coefficients at 9.9 million variants, enabling study of how Darwinian forces couple to allelic effects and shape the genetic architecture of complex traits.
Affiliation(s)
- Ali Akbari
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alison R Barton
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Zheng Li
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Annabel Perry
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yating Zeng
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Alissa Mittnik
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Mah
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Xiang Zhou
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Alkes L Price
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ron Pinhasi
  - Department of Biology, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Nadin Rohland
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
6
Sellinger T, Johannes F, Tellier A. Improved inference of population histories by integrating genomic and epigenomic data. eLife 2024; 12:RP89470. [PMID: 39264367 PMCID: PMC11392530 DOI: 10.7554/elife.89470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2024] Open
Abstract
With the availability of high-quality full-genome polymorphism (SNP) data, it becomes feasible to study the past demographic and selective history of populations in exquisite detail. However, such inferences still suffer from a lack of statistical resolution for recent events (for example, bottlenecks) and/or for populations with small nucleotide diversity. Additional heritable (epi)genetic markers, such as indels, transposable elements, microsatellites, or cytosine methylation, may provide further, yet untapped, information on the recent past population history. We extend the Sequential Markovian Coalescent (SMC) framework to jointly use SNPs and other hyper-mutable markers. We are able to (1) improve the accuracy of demographic inference in recent times, (2) uncover past demographic events hidden to SNP-based inference methods, and (3) infer the hyper-mutable marker mutation rates under a finite site model. As a proof of principle, we focus on demographic inference in Arabidopsis thaliana using DNA methylation diversity data from 10 European natural accessions. We demonstrate that segregating single methylated polymorphisms (SMPs) satisfy the modeling assumptions of the SMC framework, while differentially methylated regions (DMRs) are not suitable as their length exceeds that of the genomic distance between two recombination events. Combining SNPs and SMPs while accounting for site- and region-level epimutation processes, we provide new estimates of the glacial age bottleneck and post-glacial population expansion of the European A. thaliana population. Our SMC framework readily accounts for a wide range of heritable genomic markers, thus paving the way for next-generation inference of evolutionary history by combining information from several genetic and epigenetic markers.
Affiliation(s)
- Thibaut Sellinger
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
  - Department of Environment and Biodiversity, Paris Lodron University of Salzburg, Salzburg, Austria
- Frank Johannes
  - Professorship for Plant Epigenomics, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Aurélien Tellier
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
7
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. Genetics 2024; 228:iyae100. [PMID: 39013109 PMCID: PMC11373519 DOI: 10.1093/genetics/iyae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 06/05/2024] [Indexed: 07/18/2024] Open
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
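The idea of defining an ARG through genomes and their intervals of genetic inheritance can be sketched in Python as a list of edges, each recording which genome inherits from which over a stretch of sequence. The `Edge` fields, the `local_parents` helper, and the toy data below are illustrative assumptions, not the formalism or software described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: float   # inclusive start of the inherited interval
    right: float  # exclusive end of the inherited interval
    parent: int   # ID of the ancestral genome
    child: int    # ID of the genome that inherits from the parent


def local_parents(edges, position):
    """Map each child genome to its parent at one genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


# Toy ARG over a genome [0, 10): genomes 0, 1, 2 are samples and
# genomes 3, 4 are ancestors. Inheritance changes at position 5.
toy_edges = [
    Edge(0, 10, 3, 1),
    Edge(0, 5, 3, 0),
    Edge(5, 10, 3, 2),
    Edge(0, 10, 4, 3),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 4, 0),
]

left_tree = local_parents(toy_edges, 2.0)   # samples 0 and 1 pair under genome 3
right_tree = local_parents(toy_edges, 7.0)  # samples 1 and 2 pair under genome 3
```

Querying the same edge list at two positions yields two different genealogies, which is the sense in which the trees vary along the genome without any explicit recombination event being stored.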
Affiliation(s)
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Anastasia Ignatieva
  - School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8TA, UK
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Jere Koskela
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle NE1 7RU, UK
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
- Anthony W Wohns
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5101, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
8
DeHaas D, Pan Z, Wei X. Genotype Representation Graphs: Enabling Efficient Analysis of Biobank-Scale Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.23.590800. [PMID: 38712040 PMCID: PMC11071416 DOI: 10.1101/2024.04.23.590800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Computational analysis of a large number of genomes requires a data structure that can represent the dataset compactly while also enabling efficient operations on variants and samples. Current practice is to store large-scale genetic polymorphism data using tabular data structures and file formats, where rows and columns represent samples and genetic variants. However, encoding genetic data in such formats has become unsustainable. For example, the UK Biobank polymorphism data of 200,000 phased whole genomes has exceeded 350 terabytes (TB) in Variant Call Format (VCF), cumbersome and inefficient to work with. To mitigate the computational burden, we introduce the Genotype Representation Graph (GRG), an extremely compact data structure to losslessly present phased whole-genome polymorphisms. A GRG is a fully connected hierarchical graph that exploits variant-sharing across samples, leveraging ideas inspired by Ancestral Recombination Graphs. Capturing variant-sharing in a multitree structure compresses biobank-scale human data to the point where it can fit in a typical server's RAM (5-26 gigabytes (GB) per chromosome), and enables graph-traversal algorithms to trivially reuse computed values, both of which can significantly reduce computation time. We have developed a command-line tool and a library usable via both C++ and Python for constructing and processing GRG files which scales to a million whole genomes. It takes 160GB disk space to encode the information in 200,000 UK Biobank phased whole genomes as a GRG, more than 13 times smaller than the size of compressed VCF. We show that summaries of genetic variants such as allele frequency and association effect can be computed on GRG via graph traversal that runs significantly faster than all tested alternatives, including vcf.gz, PLINK BED, tree sequence, XSI, and Savvy. Furthermore, GRG is particularly suitable for doing repeated calculations and interactive data analysis. 
We anticipate that GRG-based algorithms will improve the scalability of various types of computation and generally lower the cost of analyzing large genomic datasets.
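The claim that allele frequencies can be computed by graph traversal while reusing computed values can be illustrated with a small Python sketch. It assumes a multitree in which each node reaches any given sample by at most one path, so per-child counts can simply be summed; the `sample_counts` function, the node IDs, and the toy graph are hypothetical and do not reflect the GRG library's actual API.

```python
def sample_counts(children, sample_ids):
    """Count the samples reachable beneath every node, caching each result.

    Assumes a multitree (at most one path from a node to any sample), so
    summing the children's counts never counts a sample twice.
    """
    counts = {}

    def visit(node):
        if node in counts:          # reuse a value computed earlier
            return counts[node]
        if node in sample_ids:
            total = 1               # a sample node counts itself
        else:
            total = sum(visit(child) for child in children.get(node, ()))
        counts[node] = total
        return total

    for node in children:
        visit(node)
    return counts


# Toy graph: samples 0-3; internal nodes 4-7. Node 4 is shared by
# nodes 6 and 7, so its count is computed once and reused.
toy_samples = {0, 1, 2, 3}
toy_children = {4: [0, 1], 5: [2], 6: [4, 5], 7: [4, 3]}
toy_counts = sample_counts(toy_children, toy_samples)

# If a variant were attached to node 6, its allele frequency in this
# toy example would be the count beneath node 6 over the sample size.
freq_node_6 = toy_counts[6] / len(toy_samples)
```

The recursion is fine for a toy graph; a graph with millions of nodes would call for an iterative traversal in topological order instead.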
Affiliation(s)
- Drew DeHaas
  - Department of Computational Biology, Cornell University, Ithaca, NY
- Ziqing Pan
  - Department of Computational Biology, Cornell University, Ithaca, NY
- Xinzhu Wei
  - Department of Computational Biology, Cornell University, Ithaca, NY
9
Grundler MC, Terhorst J, Bradburd GS. A geographic history of human genetic ancestry. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.27.586858. [PMID: 38585733 PMCID: PMC10996620 DOI: 10.1101/2024.03.27.586858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. In humans, traditional approaches for describing population genetic variation often rely on discrete genetic ancestry labels, which, despite their utility, can obscure the complex, multi-faceted nature of human genetic history. These labels risk oversimplifying ancestry by ignoring its temporal depth and geographic continuity, and may therefore conflate notions of race, ethnicity, geography, and genetic ancestry. Here, we present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We use this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatial-temporal context of genetic ancestry to describing human genetic variation and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.
Affiliation(s)
- Michael C Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Gideon S Bradburd
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
10
Vaughn AH, Nielsen R. Fast and Accurate Estimation of Selection Coefficients and Allele Histories from Ancient and Modern DNA. Mol Biol Evol 2024; 41:msae156. [PMID: 39078618 PMCID: PMC11321360 DOI: 10.1093/molbev/msae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 07/02/2024] [Accepted: 07/10/2024] [Indexed: 07/31/2024] Open
Abstract
We here present CLUES2, a full-likelihood method to infer natural selection from sequence data that is an extension of the method CLUES. We make several substantial improvements to the CLUES method that greatly increase both its applicability and its speed. We add the ability to use ancestral recombination graphs on ancient data as emissions to the underlying hidden Markov model, which enables CLUES2 to use both temporal and linkage information to make estimates of selection coefficients. We also fully implement the ability to estimate distinct selection coefficients in different epochs, which allows for the analysis of changes in selective pressures through time, as well as selection with dominance. In addition, we greatly increase the computational efficiency of CLUES2 over CLUES using several approximations to the forward-backward algorithms and develop a new way to reconstruct historic allele frequencies by integrating over the uncertainty in the estimation of the selection coefficients. We illustrate the accuracy of CLUES2 through extensive simulations and validate the importance sampling framework for integrating over the uncertainty in the inference of gene trees. We also show that CLUES2 is well-calibrated by showing that under the null hypothesis, the distribution of log-likelihood ratios follows a χ² distribution with the appropriate degrees of freedom. We run CLUES2 on a set of recently published ancient human data from Western Eurasia and test for evidence of changing selection coefficients through time. We find significant evidence of changing selective pressures in several genes correlated with the introduction of agriculture to Europe and the ensuing dietary and demographic shifts of that time. In particular, our analysis supports previous hypotheses of strong selection on lactase persistence during periods of ancient famines and attenuated selection in more modern periods.
Affiliation(s)
- Andrew H Vaughn
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Rasmus Nielsen
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, CA 94720, USA
  - Center for GeoGenetics, University of Copenhagen, Copenhagen DK-1350, Denmark
11
Kamitaki N, Hujoel MLA, Mukamel RE, Gebara E, McCarroll SA, Loh PR. A sequence of SVA retrotransposon insertions in ASIP shaped human pigmentation. Nat Genet 2024; 56:1583-1591. [PMID: 39048794 PMCID: PMC11319198 DOI: 10.1038/s41588-024-01841-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 06/21/2024] [Indexed: 07/27/2024]
Abstract
Retrotransposons comprise about 45% of the human genome [1], but their contributions to human trait variation and evolution are only beginning to be explored [2,3]. Here, we find that a sequence of SVA retrotransposon insertions in an early intron of the ASIP (agouti signaling protein) gene has probably shaped human pigmentation several times. In the UK Biobank (n = 169,641), a recent 3.3-kb SVA insertion polymorphism associated strongly with lighter skin pigmentation (0.22 [0.21-0.23] s.d.; P = 2.8 × 10⁻³⁵¹) and increased skin cancer risk (odds ratio = 1.23 [1.18-1.27]; P = 1.3 × 10⁻²⁸), appearing to underlie one of the strongest common genetic influences on these phenotypes within European populations [4-6]. ASIP expression in skin displayed the same association pattern, with the SVA insertion allele exhibiting 2.2-fold (1.9-2.6) increased expression. This effect had an unusual apparent mechanism: an earlier, nonpolymorphic, human-specific SVA retrotransposon 3.9 kb upstream appeared to have caused ASIP hypofunction by nonproductive splicing, which the new (polymorphic) SVA insertion largely eliminated. Extended haplotype homozygosity indicated that the insertion allele has risen to allele frequencies up to 11% in European populations over the past several thousand years. These results indicate that a sequence of retrotransposon insertions contributed to a species-wide increase, then a local decrease, of human pigmentation.
Affiliation(s)
- Nolan Kamitaki
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Margaux L A Hujoel
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ronen E Mukamel
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Edward Gebara
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Steven A McCarroll
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
12
|
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
13
|
Wong TKF, Cherryh C, Rodrigo AG, Hahn MW, Minh BQ, Lanfear R. MAST: Phylogenetic Inference with Mixtures Across Sites and Trees. Syst Biol 2024; 73:375-391. [PMID: 38421146 PMCID: PMC11282360 DOI: 10.1093/sysbio/syae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 12/18/2023] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
Affiliation(s)
- Thomas K F Wong
- School of Computing, Australian National University, Canberra, ACT 2601, Australia
| | - Caitlin Cherryh
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
| | - Allen G Rodrigo
- School of Biological Sciences, University of Auckland, Auckland 1142, New Zealand
| | - Matthew W Hahn
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA
| | - Bui Quang Minh
- School of Computing, Australian National University, Canberra, ACT 2601, Australia
| | - Robert Lanfear
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
| |
|
14
|
Guo B, Takala-Harrison S, O’Connor TD. Benchmarking and Optimization of Methods for the Detection of Identity-By-Descent in High-Recombining Plasmodium falciparum Genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.04.592538. [PMID: 38746392 PMCID: PMC11092787 DOI: 10.1101/2024.05.04.592538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Genomic surveillance is crucial for identifying at-risk populations for targeted malaria control and elimination. Identity-by-descent (IBD) is increasingly being used in Plasmodium population genomics to estimate genetic relatedness, effective population size (Ne), population structure, and signals of positive selection. Despite its potential, a thorough evaluation of IBD segment detection tools for species with high recombination rates, such as P. falciparum, remains absent. Here, we perform comprehensive benchmarking of IBD callers - probabilistic (hmmIBD, isoRelate), identity-by-state-based (hap-IBD, phased IBD) and others (Refined IBD) - using population genetic simulations tailored for high recombination, and IBD quality metrics at both the IBD segment level and the IBD-based downstream inference level. Our results demonstrate that low marker density per genetic unit, related to high recombination relative to mutation, significantly compromises the accuracy of detected IBD segments. In genomes with high recombination rates resembling P. falciparum, most IBD callers exhibit high false negative rates for shorter IBD segments, which can be partially mitigated through optimization of IBD caller parameters, especially those related to marker density. Notably, IBD detected with optimized parameters allows for more accurate capture of selection signals and population structure; IBD-based Ne inference is very sensitive to IBD detection errors, with IBD called from hmmIBD uniquely providing less biased estimates of Ne in this context. Validation with empirical data from the MalariaGEN Pf7 database, representing different transmission settings, corroborates these findings. We conclude that context-specific evaluation and parameter optimization are essential for accurate IBD detection in high-recombining species and recommend hmmIBD for quality-sensitive analysis, such as estimation of Ne in these species.
Our optimization and high-level benchmarking methods not only improve IBD segment detection in high-recombining genomes but also enhance overall genomic analysis, paving the way for more accurate genomic surveillance and targeted intervention strategies for malaria.
Affiliation(s)
- Bing Guo
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Shannon Takala-Harrison
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Timothy D. O’Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| |
|
15
|
Marsh JI, Johri P. Biases in ARG-Based Inference of Historical Population Size in Populations Experiencing Selection. Mol Biol Evol 2024; 41:msae118. [PMID: 38874402 PMCID: PMC11245712 DOI: 10.1093/molbev/msae118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 06/05/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] Open
Abstract
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying, and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, although it could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.
Affiliation(s)
- Jacob I Marsh
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Parul Johri
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Integrative Program for Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA
| |
|
16
|
Ma X, Lu Y, Xu S. Adaptive Evolution of Two Distinct Adaptive Haplotypes of Neanderthal Origin at the Immunoglobulin Heavy-chain Locus in East Asian and European Populations. Mol Biol Evol 2024; 41:msae147. [PMID: 39011558 DOI: 10.1093/molbev/msae147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 07/05/2024] [Accepted: 07/11/2024] [Indexed: 07/17/2024] Open
Abstract
Immunoglobulins (Igs) have a crucial role in humoral immunity. Two recent studies have reported a high-frequency Neanderthal-introgressed haplotype throughout Eurasia and a high-frequency Neanderthal-introgressed haplotype specific to southern East Asia at the immunoglobulin heavy-chain (IGH) gene locus on chromosome 14q32.33. Surprisingly, we found the previously reported high-frequency Neanderthal-introgressed haplotype does not exist throughout Eurasia. Instead, our study identified two distinct high-frequency haplotypes of putative Neanderthal origin in East Asia and Europe, although they shared introgressed alleles. Notably, the alleles of putative Neanderthal origin reduced the expression of IGHG1 and increased the expression of IGHG2 and IGHG3 in various tissues. These putatively introgressed alleles also affected the production of IgG1 upon antigen stimulation and increased the risk of systemic lupus erythematosus. Additionally, the greatest genetic differentiation across the whole genome between southern and northern East Asians was observed for the East Asian haplotype of putative Neanderthal origin. The frequency decreased from southern to northern East Asia and correlated positively with the genome-wide proportion of southern East Asian ancestry, indicating that this putative positive selection likely occurred in the common ancestor of southern East Asian populations before the admixture with northern East Asian populations.
Affiliation(s)
- Xixian Ma
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Yan Lu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
| |
|
17
|
Sun Z, Pan L, Tian A, Chen P. Critically-ill COVID-19 susceptibility gene CCR3 shows natural selection in sub-Saharan Africans. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2024; 121:105594. [PMID: 38636619 DOI: 10.1016/j.meegid.2024.105594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 03/28/2024] [Accepted: 04/15/2024] [Indexed: 04/20/2024]
Abstract
The prevalence of COVID-19 critical illness varies across ethnicities, with recent studies suggesting that genetic factors may contribute to this variation. The aim of this study was to investigate natural selection signals of genes associated with critically ill COVID-19 in sub-Saharan Africans. Severe COVID-19 SNPs were obtained from the HGI website. Selection signals were assessed in 661 sub-Saharan Africans from the 1000 Genomes Project using integrated haplotype score (iHS), cross-population extended haplotype homozygosity (XP-EHH), and fixation index (Fst). Allele frequency trajectory analysis of ancient DNA samples was used to validate the existence of selection in sub-Saharan Africans. We also used Mendelian randomization to decipher the correlation between natural selection and critically ill COVID-19. We found that CCR3 exhibited significant natural selection signals in sub-Saharan Africans. Within the CCR3 gene, rs17217831-A showed both high iHS (standardized iHS = 2) and high XP-EHH (standardized XP-EHH = 2.5) in sub-Saharan Africans. The allele frequency trajectory of CCR3 rs17217831-A revealed natural selection occurring within the past 1,500 years. Natural selection resulted in increased CCR3 expression in sub-Saharan Africans. Mendelian randomization provided evidence that increased blood CCR3 expression and eosinophil counts lowered the risk of critically ill COVID-19. Our findings suggest that sub-Saharan Africans are resistant to critically ill COVID-19 due to natural selection and identify CCR3 as a potential novel therapeutic target.
Affiliation(s)
- Zewen Sun
- Department of Genetics, College of Basic Medical Sciences, Jilin University, Changchun, Jilin, China
| | - Lin Pan
- Department of Genetics, College of Basic Medical Sciences, Jilin University, Changchun, Jilin, China; The First Hospital of Jilin University, Changchun, Jilin 130021, China
| | - Aowen Tian
- Department of Pathology, College of Basic Medical Sciences, Jilin University, Changchun, Jilin, China; Key Laboratory of Pathobiology, Ministry of Education, Jilin University, Changchun, Jilin, China
| | - Peng Chen
- Department of Genetics, College of Basic Medical Sciences, Jilin University, Changchun, Jilin, China; Department of Pathology, College of Basic Medical Sciences, Jilin University, Changchun, Jilin, China; Key Laboratory of Pathobiology, Ministry of Education, Jilin University, Changchun, Jilin, China.
| |
|
18
|
Rueda-M N, Pardo-Diaz C, Montejo-Kovacevich G, McMillan WO, Kozak KM, Arias CF, Ready J, McCarthy S, Durbin R, Jiggins CD, Meier JI, Salazar C. Genomic evidence reveals three W-autosome fusions in Heliconius butterflies. PLoS Genet 2024; 20:e1011318. [PMID: 39024186 PMCID: PMC11257349 DOI: 10.1371/journal.pgen.1011318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 05/24/2024] [Indexed: 07/20/2024] Open
Abstract
Sex chromosomes are evolutionarily labile in many animals and sometimes fuse with autosomes, creating so-called neo-sex chromosomes. Fusions between sex chromosomes and autosomes have been proposed to reduce sexual conflict and to promote adaptation and reproductive isolation among species. Recently, advances in genomics have fuelled the discovery of such fusions across the tree of life. Here, we discovered multiple fusions leading to neo-sex chromosomes in the sapho subclade of the classical adaptive radiation of Heliconius butterflies. Heliconius butterflies generally have 21 chromosomes with very high synteny. However, the five Heliconius species in the sapho subclade show large variation in chromosome number ranging from 21 to 60. We find that the W chromosome is fused with chromosome 4 in all of them. Two sister species pairs show subsequent fusions between the W and chromosomes 9 or 14, respectively. These fusions between autosomes and sex chromosomes make Heliconius butterflies an ideal system for studying the role of neo-sex chromosomes in adaptive radiations and the degeneration of sex chromosomes over time. Our findings emphasize the capability of short-read resequencing to detect genomic signatures of fusion events between sex chromosomes and autosomes even when sex chromosomes are not explicitly assembled.
Affiliation(s)
- Nicol Rueda-M
- Biology Program, Faculty of Natural Sciences, Universidad del Rosario, Bogotá, Colombia
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Carolina Pardo-Diaz
- Biology Program, Faculty of Natural Sciences, Universidad del Rosario, Bogotá, Colombia
| | | | | | - Krzysztof M. Kozak
- Smithsonian Tropical Research Institute, Panama City, Panama
- Museum of Vertebrate Zoology, Berkeley, California, United States of America
| | - Carlos F. Arias
- Smithsonian Tropical Research Institute, Panama City, Panama
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, Washington DC, United States of America
| | - Jonathan Ready
- Institute for Biological Sciences, Federal University of Pará - UFPA, Belém, Brazil
- Centre for Advanced Studies of Biodiversity - CEABIO, Belém, Brazil
| | - Shane McCarthy
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Richard Durbin
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Chris D. Jiggins
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| | - Joana I. Meier
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, United Kingdom
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| | - Camilo Salazar
- Biology Program, Faculty of Natural Sciences, Universidad del Rosario, Bogotá, Colombia
| |
|
19
|
Talenti A, Wilkinson T, Cook EA, Hemmink JD, Paxton E, Mutinda M, Ngulu SD, Jayaraman S, Bishop RP, Obara I, Hourlier T, Garcia Giron C, Martin FJ, Labuschagne M, Atimnedi P, Nanteza A, Keyyu JD, Mramba F, Caron A, Cornelis D, Chardonnet P, Fyumagwa R, Lembo T, Auty HK, Michaux J, Smitz N, Toye P, Robert C, Prendergast JGD, Morrison LJ. Continent-wide genomic analysis of the African buffalo (Syncerus caffer). Commun Biol 2024; 7:792. [PMID: 38951693 PMCID: PMC11217449 DOI: 10.1038/s42003-024-06481-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 06/21/2024] [Indexed: 07/03/2024] Open
Abstract
The African buffalo (Syncerus caffer) is a wild bovid with a historical distribution across much of sub-Saharan Africa. Genomic analysis can provide insights into the evolutionary history of the species, and the key selective pressures shaping populations, including assessment of population level differentiation, population fragmentation, and population genetic structure. In this study we generated the highest quality de novo genome assembly (2.65 Gb, scaffold N50 69.17 Mb) of African buffalo to date, and sequenced a further 195 genomes from across the species distribution. Principal component and admixture analyses provided little support for the currently described four subspecies. Estimating Effective Migration Surfaces analysis suggested that geographical barriers have played a significant role in shaping gene flow and the population structure. Estimated effective population sizes indicated a substantial drop occurring in all populations 5,000-10,000 years ago, coinciding with the increase in human populations. Finally, signatures of selection were enriched for key genes associated with the immune response, suggesting that infectious disease exerts a substantial selective pressure upon the African buffalo. These findings have important implications for understanding bovid evolution, buffalo conservation and population management.
Affiliation(s)
- Andrea Talenti
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom
| | - Toby Wilkinson
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom
| | - Elizabeth A Cook
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Kenya, P.O. Box 30709, Nairobi, 00100, Kenya
| | - Johanneke D Hemmink
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Kenya, P.O. Box 30709, Nairobi, 00100, Kenya
| | - Edith Paxton
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
| | - Matthew Mutinda
- Kenya Wildlife Service, P.O. Box 40241, Nairobi, 00100, Kenya
| | | | - Siddharth Jayaraman
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
| | - Richard P Bishop
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
| | - Isaiah Obara
- Institute for Parasitology and Tropical Veterinary Medicine, Freie Universität Berlin, Robert-von-Ostertag-Str. 7-13, 14163, Berlin, Germany
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | | | | | - Anne Nanteza
- College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, Kampala, Uganda
| | - Julius D Keyyu
- Tanzania Wildlife Research Institute, Box 661, Arusha, Tanzania
| | - Furaha Mramba
- Vector and Vector-Borne Diseases Institute, Tanga, Tanzania
| | - Alexandre Caron
- ASTRE, University of Montpellier (UMR), CIRAD, 34090, Montpellier, France
- CIRAD, UMR ASTRE, RP-PCP, Maputo, 01009, Mozambique
- Faculdade Veterinaria, Universidade Eduardo Mondlan, Maputo, Mozambique
| | - Daniel Cornelis
- CIRAD, Forêts et Sociétés, 34398, Montpellier, France
- Forêts et Sociétés, University of Montpellier, CIRAD, 34090, Montpellier, France
| | | | - Robert Fyumagwa
- Tanzania Wildlife Research Institute, Box 661, Arusha, Tanzania
| | - Tiziana Lembo
- School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Harriet K Auty
- School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Johan Michaux
- Laboratoire de Génétique de la Conservation, Institut de Botanique (Bat. 22), Université de Liège (Sart Tilman), Chemin de la Vallée 4, B4000, Liège, Belgium
| | - Nathalie Smitz
- Royal Museum for Central Africa (BopCo), Leuvensesteenweg 13, 3080, Tervuren, Belgium
| | - Philip Toye
- International Livestock Research Institute, P.O. Box 30709, Nairobi, 00100, Kenya
- Centre for Tropical Livestock Genetics and Health (CTLGH), ILRI Kenya, P.O. Box 30709, Nairobi, 00100, Kenya
| | - Christelle Robert
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh, EH4 2XU, United Kingdom
| | - James G D Prendergast
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom
| | - Liam J Morrison
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, United Kingdom.
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Roslin, EH25 9RG, United Kingdom.
| |
|
20
|
Yermakovich D, André M, Brucato N, Kariwiga J, Leavesley M, Pankratov V, Mondal M, Ricaut FX, Dannemann M. Denisovan admixture facilitated environmental adaptation in Papua New Guinean populations. Proc Natl Acad Sci U S A 2024; 121:e2405889121. [PMID: 38889149 PMCID: PMC11214076 DOI: 10.1073/pnas.2405889121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 05/16/2024] [Indexed: 06/20/2024] Open
Abstract
Neandertals and Denisovans, having inhabited distinct regions in Eurasia and possibly Oceania for over 200,000 y, experienced ample time to adapt to diverse environmental challenges these regions presented. Among present-day human populations, Papua New Guineans (PNG) stand out as one of the few carrying substantial amounts of both Neandertal and Denisovan DNA, a result of past admixture events with these archaic human groups. This study investigates the distribution of introgressed Denisovan and Neandertal DNA within two distinct PNG populations, residing in the highlands of Mt Wilhelm and the lowlands of Daru Island. These locations exhibit unique environmental features, some of which may parallel the challenges that archaic humans once confronted and adapted to. Our results show that PNG highlanders carry higher levels of Denisovan DNA compared to PNG lowlanders. Among the Denisovan-like haplotypes with higher frequencies in highlander populations, those exhibiting the greatest frequency difference compared to lowlander populations also demonstrate more pronounced differences in population frequencies than frequency-matched nonarchaic variants. Two of the five most highly differentiated of those haplotypes reside in genomic areas linked to brain development genes. Conversely, Denisovan-like haplotypes more frequent in lowlanders overlap with genes associated with immune response processes. Our findings suggest that Denisovan DNA has provided genetic variation associated with brain biology and immune response to PNG genomes, some of which might have facilitated adaptive processes to environmental challenges.
Affiliation(s)
- Danat Yermakovich
  - Center of Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Mathilde André
  - Center of Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Nicolas Brucato
  - Centre de Recherche sur la Biodiversité et l'Environnement, Université de Toulouse, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Toulouse Institut National Polytechnique, Université Toulouse 3–Paul Sabatier, cedex 9, Toulouse 31062, France
- Jason Kariwiga
  - Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea
  - School of Social Science, University of Queensland, St. Lucia, QLD 4072, Australia
- Matthew Leavesley
  - Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea
  - The Australian Research Council Centre of Excellence for Australian Biodiversity and Heritage & College of Arts, Society and Education, James Cook University, Cairns, QLD 4870, Australia
- Vasili Pankratov
  - Center of Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Mayukh Mondal
  - Center of Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
  - Institute of Clinical Molecular Biology, Christian-Albrechts-Universität zu Kiel, Kiel 24118, Germany
- François-Xavier Ricaut
  - Centre de Recherche sur la Biodiversité et l'Environnement, Université de Toulouse, Centre National de la Recherche Scientifique, Institut de Recherche pour le Développement, Toulouse Institut National Polytechnique, Université Toulouse 3–Paul Sabatier, cedex 9, Toulouse 31062, France
- Michael Dannemann
  - Center of Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
|
21
|
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
|
22
|
Stankey CT, Bourges C, Haag LM, Turner-Stokes T, Piedade AP, Palmer-Jones C, Papa I, Silva Dos Santos M, Zhang Q, Cameron AJ, Legrini A, Zhang T, Wood CS, New FN, Randzavola LO, Speidel L, Brown AC, Hall A, Saffioti F, Parkes EC, Edwards W, Direskeneli H, Grayson PC, Jiang L, Merkel PA, Saruhan-Direskeneli G, Sawalha AH, Tombetti E, Quaglia A, Thorburn D, Knight JC, Rochford AP, Murray CD, Divakar P, Green M, Nye E, MacRae JI, Jamieson NB, Skoglund P, Cader MZ, Wallace C, Thomas DC, Lee JC. A disease-associated gene desert directs macrophage inflammation through ETS2. Nature 2024; 630:447-456. [PMID: 38839969 PMCID: PMC11168933 DOI: 10.1038/s41586-024-07501-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/01/2024] [Indexed: 06/07/2024]
Abstract
Increasing rates of autoimmune and inflammatory disease present a burgeoning threat to human health. This is compounded by the limited efficacy of available treatments and high failure rates during drug development, highlighting an urgent need to better understand disease mechanisms. Here we show how functional genomics could address this challenge. By investigating an intergenic haplotype on chr21q22 (which has been independently linked to inflammatory bowel disease, ankylosing spondylitis, primary sclerosing cholangitis and Takayasu's arteritis), we identify that the causal gene, ETS2, is a central regulator of human inflammatory macrophages and delineate the shared disease mechanism that amplifies ETS2 expression. Genes regulated by ETS2 were prominently expressed in diseased tissues and more enriched for inflammatory bowel disease GWAS hits than most previously described pathways. Overexpressing ETS2 in resting macrophages reproduced the inflammatory state observed in chr21q22-associated diseases, with upregulation of multiple drug targets, including TNF and IL-23. Using a database of cellular signatures, we identified drugs that might modulate this pathway and validated the potent anti-inflammatory activity of one class of small molecules in vitro and ex vivo. Together, this illustrates the power of functional genomics, applied directly in primary human cells, to identify immune-mediated disease mechanisms and potential therapeutic opportunities.
Affiliation(s)
- C T Stankey
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
  - Department of Immunology and Inflammation, Imperial College London, London, UK
  - Washington University School of Medicine, St Louis, MO, USA
- C Bourges
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- L M Haag
  - Division of Gastroenterology, Infectious Diseases and Rheumatology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- T Turner-Stokes
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
  - Department of Immunology and Inflammation, Imperial College London, London, UK
- A P Piedade
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- C Palmer-Jones
  - Department of Gastroenterology, Royal Free Hospital, London, UK
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
- I Papa
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- Q Zhang
  - Genomics of Inflammation and Immunity Group, Human Genetics Programme, Wellcome Sanger Institute, Hinxton, UK
- A J Cameron
  - Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
- A Legrini
  - Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
- T Zhang
  - Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
- C S Wood
  - Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
- F N New
  - NanoString Technologies, Seattle, WA, USA
- L O Randzavola
  - Department of Immunology and Inflammation, Imperial College London, London, UK
- L Speidel
  - Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
  - Genetics Institute, University College London, London, UK
- A C Brown
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- A Hall
  - The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
  - Department of Cellular Pathology, Royal Free Hospital, London, UK
- F Saffioti
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
  - The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
- E C Parkes
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
- W Edwards
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- H Direskeneli
  - Department of Internal Medicine, Division of Rheumatology, Marmara University, Istanbul, Turkey
- P C Grayson
  - Systemic Autoimmunity Branch, NIAMS, National Institutes of Health, Bethesda, MD, USA
- L Jiang
  - Department of Rheumatology, Zhongshan Hospital, Fudan University, Shanghai, China
- P A Merkel
  - Division of Rheumatology, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Division of Epidemiology, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- G Saruhan-Direskeneli
  - Department of Physiology, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- A H Sawalha
  - Division of Rheumatology, Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Lupus Center of Excellence, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA
- E Tombetti
  - Department of Biomedical and Clinical Sciences, Milan University, Milan, Italy
  - Internal Medicine and Rheumatology, ASST FBF-Sacco, Milan, Italy
- A Quaglia
  - Department of Cellular Pathology, Royal Free Hospital, London, UK
  - UCL Cancer Institute, London, UK
- D Thorburn
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
  - The Sheila Sherlock Liver Centre, Royal Free Hospital, London, UK
- J C Knight
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Chinese Academy of Medical Sciences Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR Comprehensive Biomedical Research Centre, Oxford, UK
- A P Rochford
  - Department of Gastroenterology, Royal Free Hospital, London, UK
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
- C D Murray
  - Department of Gastroenterology, Royal Free Hospital, London, UK
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
- P Divakar
  - NanoString Technologies, Seattle, WA, USA
- M Green
  - Experimental Histopathology STP, The Francis Crick Institute, London, UK
- E Nye
  - Experimental Histopathology STP, The Francis Crick Institute, London, UK
- J I MacRae
  - Metabolomics STP, The Francis Crick Institute, London, UK
- N B Jamieson
  - Wolfson Wohl Cancer Centre, School of Cancer Sciences, University of Glasgow, Glasgow, UK
- P Skoglund
  - Ancient Genomics Laboratory, The Francis Crick Institute, London, UK
- M Z Cader
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- C Wallace
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- D C Thomas
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- J C Lee
  - Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London, UK
  - Department of Gastroenterology, Royal Free Hospital, London, UK
  - Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
|
23
|
Peng D, Mulder OJ, Edge MD. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. bioRxiv 2024:2024.05.24.595829. [PMID: 38854009 PMCID: PMC11160635 DOI: 10.1101/2024.05.24.595829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
|
24
|
Goli RC, Chishi KG, Ganguly I, Singh S, Dixit S, Rathi P, Diwakar V, Sree C C, Limbalkar OM, Sukhija N, Kanaka K. Global and Local Ancestry and its Importance: A Review. Curr Genomics 2024; 25:237-260. [PMID: 39156729 PMCID: PMC11327809 DOI: 10.2174/0113892029298909240426094055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/02/2024] [Accepted: 03/11/2024] [Indexed: 08/20/2024]
Abstract
Admixture is an evolutionary mechanism and the fastest way to significantly change the composition of a population. In animal breeding history, genetic admixture has provided short-term advantages by exploiting complementarity and heterosis in several traits, and long-term advantages through genetic diversity. The traditional method of admixture analysis from pedigree records has now largely been replaced by genome-wide marker data, which enable more precise estimates. Among these markers, SNPs have been the popular choice because they are cost-effective, less laborious, and easy to genotype automatically. Markers that can indicate the population of origin of a DNA sample whose source individual is unknown or unwilling to disclose their lineage are called Ancestry-Informative Markers (AIMs). Admixture resolved at the level of individual loci is termed local ancestry; it can be exploited to identify signs of recent selective response and can account for genetic drift. Considering the importance of genetic admixture and local ancestry, this mini-review illustrates both concepts, covering the basics, estimation and identification methods, the tools and software used, and their applications.
Affiliation(s)
- Kiyevi G. Chishi
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Indrajit Ganguly
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Sanjeev Singh
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- S.P. Dixit
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Pallavi Rathi
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Vikas Diwakar
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Chandana Sree C
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Nidhi Sukhija
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
  - Central Tasar Research and Training Institute, Ranchi, 835303, Jharkhand, India
- K.K. Kanaka
  - ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834010, Jharkhand, India
|
25
|
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024]
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
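The last step described in this abstract, Kingman-type coalescence between big families plus a discrete jump at each big family, can be illustrated with a toy survival function for the pairwise coalescence time. This is only a sketch under stated assumptions: time is measured in coalescent units with a unit pairwise rate, and the big-family times `big_times` and per-family coalescence probabilities `big_probs` are hypothetical inputs, not quantities taken from the paper.

```python
import math
from typing import Sequence


def pairwise_survival(t: float,
                      big_times: Sequence[float],
                      big_probs: Sequence[float]) -> float:
    """P(no coalescence by time t) for a sample of size two (toy model).

    Assumptions (not from the paper): between big families the pair
    coalesces at rate 1, as in the Kingman coalescent; a big family at
    time big_times[i] makes the pair coalesce with probability
    big_probs[i], independently of everything else.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if len(big_times) != len(big_probs):
        raise ValueError("big_times and big_probs must have equal length")
    survival = math.exp(-t)  # Kingman part: exponential waiting time
    for time, prob in zip(big_times, big_probs):
        if time <= t:
            survival *= 1.0 - prob  # discrete jump at a big family
    return survival


# Hypothetical pedigree with big families at times 0.5 and 2.0.
times, probs = [0.5, 2.0], [0.25, 0.1]
for t in (0.4, 0.5, 1.0, 2.5):
    print(t, round(pairwise_survival(t, times, probs), 4))
```

With no big families the function reduces to the exponential survival curve of the Kingman coalescent, which is one way to read the abstract's remark that the limit law may or may not be exponential depending on how often big families occur.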
Affiliation(s)
- Wai-Tong Louis Fan
  - Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
  - Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
|
26
|
Alam O, Purugganan MD. Domestication and the evolution of crops: variable syndromes, complex genetic architectures, and ecological entanglements. Plant Cell 2024; 36:1227-1241. [PMID: 38243576 PMCID: PMC11062453 DOI: 10.1093/plcell/koae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/01/2023] [Accepted: 12/14/2023] [Indexed: 01/21/2024]
Abstract
Domestication can be considered a specialized mutualism in which a domesticator exerts control over the reproduction or propagation (fitness) of a domesticated species to gain resources or services. The evolution of crops by human-associated selection provides a powerful set of models to study recent evolutionary adaptations and their genetic bases. Moreover, the domestication and dispersal of crops such as rice, maize, and wheat during the Holocene transformed human social and political organization by serving as the key mechanism by which human societies fed themselves. Here we review major themes and identify emerging questions in three fundamental areas of crop domestication research: domestication phenotypes and syndromes, genetic architecture underlying crop evolution, and the ecology of domestication. Current insights on the domestication syndrome in crops largely come from research on cereal crops such as rice and maize, and recent work indicates distinct domestication phenotypes can arise from different domestication histories. While early studies on the genetics of domestication often identified single large-effect loci underlying major domestication traits, emerging evidence supports polygenic bases for many canonical traits such as shattering and plant architecture. Adaptation in human-constructed environments also influenced ecological traits in domesticates such as resource acquisition rates and interactions with other organisms such as root mycorrhizal fungi and pollinators. Understanding the ecological context of domestication will be key to developing resource-efficient crops and implementing more sustainable land management and cultivation practices.
Affiliation(s)
- Ornob Alam
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Michael D Purugganan
  - Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
  - Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
  - Institute for the Study of the Ancient World, New York University, New York, NY, 10028, USA
|
27
|
Wilson CG, Pieszko T, Nowell RW, Barraclough TG. Recombination in bdelloid rotifer genomes: asexuality, transfer and stress. Trends Genet 2024; 40:422-436. [PMID: 38458877 DOI: 10.1016/j.tig.2024.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 03/10/2024]
Abstract
Bdelloid rotifers constitute a class of microscopic animals living in freshwater habitats worldwide. Several strange features of bdelloids have drawn attention: their ability to tolerate desiccation and other stresses, a lack of reported males across the clade despite centuries of study, and unusually high numbers of horizontally acquired, non-metazoan genes. Genome sequencing is transforming our understanding of their lifestyle and its consequences, while in turn providing wider insights about recombination and genome organisation in animals. Many questions remain, not least how to reconcile apparent genomic signatures of sex with the continued absence of reported males, why bdelloids have so many horizontally acquired genes, and how their remarkable ability to survive stress interacts with recombination and other genomic processes.
Affiliation(s)
- Christopher G Wilson
  - Department of Biology, University of Oxford, 11a Mansfield Road, Oxford OX1 3SZ, UK
- Tymoteusz Pieszko
  - Department of Biology, University of Oxford, 11a Mansfield Road, Oxford OX1 3SZ, UK
- Reuben W Nowell
  - Institute of Ecology and Evolution, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
  - Biological and Environmental Sciences, School of Natural Sciences, University of Stirling, Stirling FK9 4LA, UK
|
28
|
André M, Brucato N, Hudjasov G, Pankratov V, Yermakovich D, Montinaro F, Kreevan R, Kariwiga J, Muke J, Boland A, Deleuze JF, Meyer V, Evans N, Cox MP, Leavesley M, Dannemann M, Org T, Metspalu M, Mondal M, Ricaut FX. Positive selection in the genomes of two Papua New Guinean populations at distinct altitude levels. Nat Commun 2024; 15:3352. [PMID: 38688933 PMCID: PMC11061283 DOI: 10.1038/s41467-024-47735-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 04/08/2024] [Indexed: 05/02/2024]
Abstract
Highlanders and lowlanders of Papua New Guinea have faced distinct environmental stresses, such as hypoxia and environment-specific pathogen exposure, respectively. In this study, we explored the top genomic regions and the candidate driver SNPs for selection in these two populations using newly sequenced whole genomes of 54 highlanders and 74 lowlanders. We identified two candidate SNPs under selection - one in highlanders, associated with red blood cell traits, and another in lowlanders, associated with white blood cell count - both potentially influencing the heart rate of Papua New Guineans in opposite directions. We also observed four candidate driver SNPs that exhibit linkage disequilibrium with an introgressed haplotype, highlighting the need to explore the possibility of adaptive introgression within these populations. This study reveals that the signatures of positive selection in highlanders and lowlanders of Papua New Guinea align closely with the challenges they face, which are specific to their environments.
Affiliation(s)
- Mathilde André
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Nicolas Brucato
  - Centre de Recherche sur la Biodiversité et l'Environnement (CRBE), Université de Toulouse, CNRS, IRD, Toulouse INP, Université Toulouse 3 - Paul Sabatier (UT3), Toulouse, France
- Georgi Hudjasov
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Vasili Pankratov
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Danat Yermakovich
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Francesco Montinaro
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
  - Department of Biosciences, Biotechnology and the Environment, University of Bari, Bari, Italy
- Rita Kreevan
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Jason Kariwiga
  - Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, University 134, PO Box 320, National Capital District, Papua New Guinea
  - School of Social Science, University of Queensland, St Lucia, QLD, Australia
- John Muke
  - Social Research Institute Ltd, Port Moresby, Papua New Guinea
- Anne Boland
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Jean-François Deleuze
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Vincent Meyer
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Nicholas Evans
  - ARC Centre of Excellence for the Dynamics of Language, Coombs Building, Fellows Road, CHL, CAP, Australian National University, Canberra, ACT, Australia
- Murray P Cox
  - School of Natural Sciences, Massey University, Palmerston North, New Zealand
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Matthew Leavesley
  - Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, University 134, PO Box 320, National Capital District, Papua New Guinea
  - College of Arts, Society and Education, James Cook University, P.O. Box 6811, Cairns, QLD, 4870, Australia
  - ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, NSW, 2522, Australia
- Michael Dannemann
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Tõnis Org
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Mait Metspalu
  - Estonian Biocentre, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
- Mayukh Mondal
  - Centre for Genomics, Evolution & Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Tartumaa, Estonia
  - Institute of Clinical Molecular Biology, Christian-Albrechts-Universität zu Kiel, 24118, Kiel, Germany
- François-Xavier Ricaut
  - Centre de Recherche sur la Biodiversité et l'Environnement (CRBE), Université de Toulouse, CNRS, IRD, Toulouse INP, Université Toulouse 3 - Paul Sabatier (UT3), Toulouse, France
|
29
|
Fujiwara K, Kubo S, Endo T, Takada T, Shiroishi T, Suzuki H, Osada N. Inference of selective forces on house mouse genomes during secondary contact in East Asia. Genome Res 2024; 34:366-375. [PMID: 38508692 PMCID: PMC11067880 DOI: 10.1101/gr.278828.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 03/15/2024] [Indexed: 03/22/2024]
Abstract
The house mouse (Mus musculus), which is commensal to humans, has spread globally via human activities, leading to secondary contact between genetically divergent subspecies. This pattern of genetic admixture can provide insights into the selective forces at play in this well-studied model organism. Our analysis of 163 house mouse genomes, with a particular focus on East Asia, revealed substantial admixture between the subspecies castaneus and musculus, particularly in Japan and southern China. We found that, despite different levels of autosomal admixture among regions, all Y Chromosomes in the East Asian samples belonged to the musculus-type haplogroup, potentially explained by genomic conflict under sex-ratio distortion owing to varying copy numbers of ampliconic genes on sex chromosomes, Slx and Sly. Our computer simulations, designed to replicate the observed scenario, show that the preferential fixation of musculus-type Y Chromosomes can be achieved with a slight increase in the male-to-female birth ratio. We also investigated the influence of selection after hybridization of the subspecies castaneus and musculus in Japan. Even though the genetic background of most Japanese samples closely resembles the subspecies musculus, castaneus-like genetic components were overrepresented in certain genomic regions, particularly in immune-related genes. Furthermore, a large genomic block (∼2 Mbp) containing a vomeronasal/olfactory receptor gene cluster predominantly harbored castaneus-type haplotypes in the Japanese samples, highlighting the crucial role of olfaction-based recognition in shaping hybrid genomes.
Affiliation(s)
- Kazumichi Fujiwara
- Mouse Genomics Resource Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
| | - Shunpei Kubo
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
| | - Toshinori Endo
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
| | - Toyoyuki Takada
- Integrated BioResource Information Division, RIKEN BioResource Research Center, Tsukuba 305-0074, Japan
| | | | - Hitoshi Suzuki
- Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
| | - Naoki Osada
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan;
|
30
|
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. bioRxiv 2024:2023.11.03.565466. [PMID: 37961279 PMCID: PMC10635123 DOI: 10.1101/2023.11.03.565466] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
Affiliation(s)
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
| | - Anastasia Ignatieva
- School of Mathematics and Statistics, University of Glasgow, UK
- Department of Statistics, University of Oxford, UK
| | - Jere Koskela
- School of Mathematics, Statistics and Physics, Newcastle University, UK
- Department of Statistics, University of Warwick, UK
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, UK
| | - Anthony W. Wohns
- Broad Institute of MIT and Harvard, Cambridge, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, USA
| | - Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
|
31
|
Chen J, Liu C, Li W, Zhang W, Wang Y, Clark AG, Lu J. From sub-Saharan Africa to China: Evolutionary history and adaptation of Drosophila melanogaster revealed by population genomics. Sci Adv 2024; 10:eadh3425. [PMID: 38630810 PMCID: PMC11023512 DOI: 10.1126/sciadv.adh3425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 03/13/2024] [Indexed: 04/19/2024]
Abstract
Drosophila melanogaster is a widely used model organism for studying environmental adaptation. However, the genetic diversity of populations in Asia is poorly understood, leaving a notable gap in our knowledge of the global evolution and adaptation of this species. We sequenced genomes of 292 D. melanogaster strains from various ecological settings in China and analyzed them along with previously published genome sequences. We have identified six global genetic ancestry groups, despite the presence of widespread genetic admixture. The strains from China represent a unique ancestry group, although detectable differentiation exists among populations within China. We deciphered the global migration and demography of D. melanogaster, and identified widespread signals of adaptation, including genetic changes in response to insecticides. We validated the effects of insecticide resistance variants using population cage trials and deep sequencing. This work highlights the importance of population genomics in understanding the genetic underpinnings of adaptation, an effort that is particularly relevant given the deterioration of ecosystems.
Affiliation(s)
- Junhao Chen
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
| | - Chenlu Liu
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
| | - Weixuan Li
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
| | - Wenxia Zhang
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
| | - Yirong Wang
- College of Biology, Hunan University, Changsha 410082, China
| | - Andrew G. Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Jian Lu
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
|
32
|
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. medRxiv 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect-sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Xinran Wang
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Matthew Carr
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Artem Kim
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Steven Gazal
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
| | - Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
| | - Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
| | - James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
| | - Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nicholas Mancuso
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
|
33
|
de Smith AJ, Wahlster L, Jeon S, Kachuri L, Black S, Langie J, Cato LD, Nakatsuka N, Chan TF, Xia G, Mazumder S, Yang W, Gazal S, Eng C, Hu D, Burchard EG, Ziv E, Metayer C, Mancuso N, Yang JJ, Ma X, Wiemels JL, Yu F, Chiang CWK, Sankaran VG. A noncoding regulatory variant in IKZF1 increases acute lymphoblastic leukemia risk in Hispanic/Latino children. Cell Genom 2024; 4:100526. [PMID: 38537633 PMCID: PMC11019360 DOI: 10.1016/j.xgen.2024.100526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/11/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
Hispanic/Latino children have the highest risk of acute lymphoblastic leukemia (ALL) in the US compared to other racial/ethnic groups, yet the basis of this remains incompletely understood. Through genetic fine-mapping analyses, we identified a new independent childhood ALL risk signal near IKZF1 in self-reported Hispanic/Latino individuals, but not in non-Hispanic White individuals, with an effect size of ∼1.44 (95% confidence interval = 1.33-1.55) and a risk allele frequency of ∼18% in Hispanic/Latino populations and <0.5% in European populations. This risk allele was positively associated with Indigenous American ancestry, showed evidence of selection in human history, and was associated with reduced IKZF1 expression. We identified a putative causal variant in a downstream enhancer that is most active in pro-B cells and interacts with the IKZF1 promoter. This variant disrupts IKZF1 autoregulation at this enhancer and results in reduced enhancer activity in B cell progenitors. Our study reveals a genetic basis for the increased ALL risk in Hispanic/Latino children.
Affiliation(s)
- Adam J de Smith
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA.
| | - Lara Wahlster
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Soyoung Jeon
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Susan Black
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jalen Langie
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Liam D Cato
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | | | - Tsz-Fung Chan
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Guangze Xia
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University, Guangzhou, China
| | - Soumyaa Mazumder
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Wenjian Yang
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Steven Gazal
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Celeste Eng
- Department of Medicine, Institute for Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Bioengineering and Biotherapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Donglei Hu
- Department of Medicine, Institute for Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Esteban González Burchard
- Department of Medicine, Institute for Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Bioengineering and Biotherapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Elad Ziv
- Department of Medicine, Institute for Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Catherine Metayer
- School of Public Health, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Nicholas Mancuso
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Jun J Yang
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Xiaomei Ma
- Yale School of Public Health, New Haven, CT 06520, USA
| | - Joseph L Wiemels
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Fulong Yu
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University, Guangzhou, China
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California Keck School of Medicine, Los Angeles, CA 90033, USA; USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
| | - Vijay G Sankaran
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
34
|
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Adv Sci (Weinh) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Jinyu Chu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Wangjiao Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
| | - Jianlin Han
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- CAAS‐ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
| | - Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
| | - Yunlong Ma
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
|
35
|
Reynolds AZ, Niedbalski SD. Sex-biased gene regulation varies across human populations as a result of adaptive evolution. Am J Biol Anthropol 2024; 183:e24888. [PMID: 38100225 PMCID: PMC11279473 DOI: 10.1002/ajpa.24888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 11/14/2023] [Accepted: 11/28/2023] [Indexed: 03/03/2024]
Abstract
OBJECTIVES: Studies of human sexual dimorphism and gender disparities in health focus on ostensibly universal molecular sex differences, such as sex chromosomes and circulating hormone levels, while ignoring the extraordinary diversity in biology, behavior, and culture acquired by different human populations over their unique evolutionary histories. MATERIALS AND METHODS: Using RNA-Seq data and whole genome sequences from 1000G and HGDP, we investigate variation in sex-biased gene expression across 11 human populations and test whether population-level variation in sex-biased expression may have resulted from adaptive evolution in regions containing sex-specific regulatory variants. RESULTS: We find that sex-biased gene expression in humans is highly variable, mostly population-specific, and demonstrates between-population reversals. Expression quantitative trait locus mapping reveals sex-specific regulatory regions with evidence of recent positive natural selection, suggesting that variation in sex-biased expression may have evolved as an adaptive response to ancestral environments experienced by human populations. DISCUSSION: These results indicate that sex-biased gene expression is more flexible than previously thought and is not generally shared among human populations. Instead, molecular phenotypes associated with sex depend on complex interactions between population-specific molecular evolution and physiological responses to contemporary socioecologies.
Affiliation(s)
- Adam Z. Reynolds
- Department of Anthropology, University of New Mexico, Albuquerque, NM
|
36
|
Martiniano R, Haber M, Almarri MA, Mattiangeli V, Kuijpers MCM, Chamel B, Breslin EM, Littleton J, Almahari S, Aloraifi F, Bradley DG, Lombard P, Durbin R. Ancient genomes illuminate Eastern Arabian population history and adaptation against malaria. Cell Genom 2024; 4:100507. [PMID: 38417441 PMCID: PMC10943591 DOI: 10.1016/j.xgen.2024.100507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 11/01/2023] [Accepted: 01/31/2024] [Indexed: 03/01/2024]
Abstract
The harsh climate of Arabia has posed challenges in generating ancient DNA from the region, hindering the direct examination of ancient genomes for understanding the demographic processes that shaped Arabian populations. In this study, we report whole-genome sequence data obtained from four Tylos-period individuals from Bahrain. Their genetic ancestry can be modeled as a mixture of sources from ancient Anatolia, Levant, and Iran/Caucasus, with variation between individuals suggesting population heterogeneity in Bahrain before the onset of Islam. We identify the G6PD Mediterranean mutation associated with malaria resistance in three out of four ancient Bahraini samples and estimate that it rose in frequency in Eastern Arabia from 5 to 6 kya onward, around the time agriculture appeared in the region. Our study characterizes the genetic composition of ancient Arabians, shedding light on the population history of Bahrain and demonstrating the feasibility of studies of ancient DNA in the region.
Affiliation(s)
- Rui Martiniano
- School of Biological and Environmental Sciences, Liverpool John Moores University, L3 3AF Liverpool, UK.
| | - Marc Haber
- Institute of Cancer and Genomic Sciences, University of Birmingham Dubai, Dubai, United Arab Emirates
| | - Mohamed A Almarri
- Department of Forensic Science and Criminology, Dubai Police GHQ, Dubai, United Arab Emirates; College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
| | | | - Mirte C M Kuijpers
- Department of Ecology, Behavior and Evolution, School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Berenice Chamel
- Institut Français du Proche-Orient (MEAE/CNRS), Beirut, Lebanon
| | - Emily M Breslin
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
| | - Judith Littleton
- School of Social Sciences, University of Auckland, Auckland, New Zealand
| | - Salman Almahari
- Bahrain Authority for Culture and Antiquities, Manama, Kingdom of Bahrain
| | - Fatima Aloraifi
- Mersey and West Lancashire Teaching Hospitals NHS Trust, Whiston Hospital, Warrington Road, Prescot, L35 5DR Liverpool, UK
| | - Daniel G Bradley
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
| | - Pierre Lombard
- Bahrain Authority for Culture and Antiquities, Manama, Kingdom of Bahrain; Archéorient UMR 5133, CNRS, Université Lyon 2, Maison de l'Orient et de la Méditerranée - Jean Pouilloux, Lyon, France
| | - Richard Durbin
- Department of Genetics, University of Cambridge, CB2 3EH Cambridge, UK.
|
37
|
Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. bioRxiv 2024:2024.03.07.583855. [PMID: 38559261 PMCID: PMC10979897 DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4 000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
| | - Jerome Kelleher
- Oxford Big Data Institute, University of Oxford, United Kingdom
| | - Yao-ban Chan
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
| | - David J. Balding
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
| |
Collapse
|
38
|
Przeworski M. 2023 ASHG Scientific Achievement Award. Am J Hum Genet 2024; 111:425-427. [PMID: 38458164 PMCID: PMC10995464 DOI: 10.1016/j.ajhg.2023.12.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 12/12/2023] [Indexed: 03/10/2024]
Abstract
This article is based on the address given by the author at the 2023 meeting of The American Society of Human Genetics (ASHG) in Washington, D.C. A video of the original address can be found at the ASHG website.
Affiliation(s)
- Molly Przeworski
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY, USA.
|
39
|
Herman RW, Clucas G, Younger J, Bates J, Robinson B, Reddy S, Stepanuk J, O'Brien K, Veeramah K, Lynch HJ. Whole genome sequencing reveals stepping-stone dispersal buffered against founder effects in a range expanding seabird. Mol Ecol 2024; 33:e17282. [PMID: 38299701 DOI: 10.1111/mec.17282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 12/20/2023] [Accepted: 01/04/2024] [Indexed: 02/02/2024]
Abstract
Many species are shifting their ranges in response to climate-driven environmental changes, particularly in high-latitude regions. However, the patterns of dispersal and colonization during range shifting events are not always clear. Understanding how populations are connected through space and time can reveal how species navigate a changing environment. Here, we present a fine-scale population genomics study of gentoo penguins (Pygoscelis papua), a presumed site-faithful colonial nesting species that has increased in population size and expanded its range south along the Western Antarctic Peninsula. Using whole genome sequencing, we analysed 129 gentoo penguin individuals across 12 colonies located at or near the southern range edge. Through a detailed examination of fine-scale population structure, admixture, and population divergence, we inferred that gentoo penguins historically dispersed rapidly in a stepping-stone pattern from the South Shetland Islands, leading to the colonization of Anvers Island and then the adjacent mainland Western Antarctic Peninsula. Recent southward expansion along the Western Antarctic Peninsula also followed a stepping-stone dispersal pattern coupled with limited post-divergence gene flow from colonies on Anvers Island. Genetic diversity appeared to be maintained across colonies during the historical dispersal process, and range-edge populations are still growing. This suggests large numbers of migrants may provide a buffer against founder effects at the beginning of colonization events to maintain genetic diversity similar to that of the source populations before migration ceases post-divergence. These results, coupled with a continued increase in effective population size since approximately 500-800 years ago, distinguish gentoo penguins as a robust species that is highly adaptable and resilient to changing climate.
Affiliation(s)
- Rachael W Herman
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA
- Gemma Clucas
  - Cornell Lab of Ornithology, Cornell University, Ithaca, New York, USA
- Jane Younger
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, Tasmania, Australia
- John Bates
  - Negaunee Integrative Research Center, The Field Museum of Natural History, Chicago, Illinois, USA
- Bryce Robinson
  - Cornell Lab of Ornithology, Cornell University, Ithaca, New York, USA
- Sushma Reddy
  - Bell Museum of Natural History and Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, St. Paul, Minnesota, USA
- Julia Stepanuk
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA
- Katie O'Brien
  - Milner Centre for Evolution, University of Bath, Bath, UK
- Krishna Veeramah
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA
- Heather J Lynch
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, USA
  - Institute for Advanced Computational Sciences, Stony Brook University, Stony Brook, New York, USA
|
40
|
Liu S, Luo H, Zhang P, Li Y, Hao D, Zhang S, Song T, Xu T, He S. Adaptive Selection of Cis-regulatory Elements in the Han Chinese. Mol Biol Evol 2024; 41:msae034. [PMID: 38377343 PMCID: PMC10917166 DOI: 10.1093/molbev/msae034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/18/2024] [Accepted: 02/05/2024] [Indexed: 02/22/2024] Open
Abstract
Cis-regulatory elements play an important role in human adaptation to the living environment. However, the lag in population genomic cohort studies and epigenomic studies has hindered research on adaptive selection of cis-regulatory elements in human populations. In this study, we collected 4,013 unrelated individuals and performed a comprehensive analysis of adaptive selection of genome-wide cis-regulatory elements in the Han Chinese. In total, 12.34% of genomic regions are under the influence of adaptive selection, where 1.00% of enhancers and 2.06% of promoters are under positive selection, and 0.06% of enhancers and 0.02% of promoters are under balancing selection. Gene ontology enrichment analysis of these cis-regulatory elements under adaptive selection reveals that many instances of positive selection in the Han Chinese occur in pathways involved in cell-cell adhesion processes, and many instances of balancing selection are related to immune processes. Two classes of adaptive cis-regulatory elements related to cell adhesion were analyzed in depth. The first comprises adaptive enhancers derived from Neanderthal introgression, which lead to a lower hyaluronidase level in skin and confer better UV-radiation resistance on the Han Chinese. The second comprises cis-regulatory elements regulating wound healing, and the results suggest that positive selection inhibits coagulation and promotes angiogenesis and wound healing in the Han Chinese. Finally, we found that many pathogenic alleles, such as risk alleles for type 2 diabetes or schizophrenia, remain in the population due to the hitchhiking effect of positive selection. Our findings will help deepen our understanding of the adaptive evolution of genome regulation in the Han Chinese.
Affiliation(s)
- Shuai Liu
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Huaxia Luo
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Peng Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yanyan Li
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Di Hao
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Sijia Zhang
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Tingrui Song
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tao Xu
  - National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan 250117, Shandong, China
- Shunmin He
  - Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
41
|
Aslett LJM, Christ RR. kalis: a modern implementation of the Li & Stephens model for local ancestry inference in R. BMC Bioinformatics 2024; 25:86. [PMID: 38418970 PMCID: PMC10900616 DOI: 10.1186/s12859-024-05688-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/01/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND: Approximating the recent phylogeny of N phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an N × N distance matrix based on posterior decodings. RESULTS: We provide a high-performance engine that makes these posterior decodings readily accessible with minimal pre-processing via an easy-to-use package, kalis, for the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and developers to build a range of variant-specific ancestral inference pipelines on top. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to enable scaling to hundreds of thousands of genomes. CONCLUSIONS: The resulting distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large-scale genomic datasets.
Affiliation(s)
- Louis J M Aslett
  - Department of Mathematical Sciences, Durham University, Stockton Road, Durham, DH1 3LE, UK
- Ryan R Christ
  - Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT, 06520, USA
|
42
|
Pivirotto A, Peles N, Hey J. Allele age estimators designed for whole genome datasets show only a modest decrease in accuracy when applied to whole exome datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.578465. [PMID: 38370640 PMCID: PMC10871225 DOI: 10.1101/2024.02.01.578465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Personalized genomics in the healthcare system is becoming increasingly accessible as the cost of sequencing decreases. With the increase in the number of genomes, larger numbers of rare variants are being discovered and much work is being done to identify their functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate the time the mutation entered the population. However, allele age estimators such as Relate, Genealogical Estimator of Variant Age, and time of coalescence were developed based on the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model and found that each provides usable estimates of allele age from whole-exome datasets. To test the robustness of these methods, analyses were undertaken to simulate data under a population expansion model and background selection. Relate performs best among the three estimators, with Pearson coefficients of 0.64 and 0.68 (neutral constant and expansion population models, respectively) and a 17 percent and 15 percent drop in accuracy between whole genome and whole exome estimations. Of the three estimators, Relate is best able to parallelize to yield quick results with few resources; however, even Relate is only able to scale to thousands of samples, making it unable to match the hundreds of thousands of samples currently being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods estimate the age of mutations with a modest decrease in performance.
Affiliation(s)
- Alyssa Pivirotto
  - Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- Noah Peles
  - Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- Jody Hey
  - Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
|
43
|
Sinigaglia B, Escudero J, Biagini SA, Garcia-Calleja J, Moreno J, Dobon B, Acosta S, Mondal M, Walsh S, Aguileta G, Vallès M, Forrow S, Martin-Caballero J, Migliano AB, Bertranpetit J, Muñoz FJ, Bosch E. Exploring Adaptive Phenotypes for the Human Calcium-Sensing Receptor Polymorphism R990G. Mol Biol Evol 2024; 41:msae015. [PMID: 38285634 PMCID: PMC10859840 DOI: 10.1093/molbev/msae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 01/23/2024] [Accepted: 01/23/2024] [Indexed: 01/31/2024] Open
Abstract
Rainforest hunter-gatherers from Southeast Asia are characterized by specific morphological features including a particularly dark skin color (D), short stature (S), woolly hair (W), and the presence of steatopygia (S), that is, fat accumulation localized in the hips (the DSWS phenotype). Based on previous evidence in the Andamanese population, we first characterized signatures of adaptive natural selection around the calcium-sensing receptor gene in Southeast Asian rainforest groups presenting the DSWS phenotype and identified the R990G substitution (rs1042636) as a putative adaptive variant for experimental follow-up. Although the calcium-sensing receptor has a critical role in calcium homeostasis by directly regulating parathyroid hormone secretion, it is expressed in different tissues and has been described to be involved in many biological functions. Previous work has also characterized the R990G substitution as an activating polymorphism of the calcium-sensing receptor associated with hypocalcemia. Therefore, we generated a knock-in mouse for this substitution and investigated organismal phenotypes that could have become adaptive in rainforest hunter-gatherers from Southeast Asia. Interestingly, we found that mice homozygous for the derived allele show not only lower serum calcium concentration but also greater body weight and fat accumulation, probably because of enhanced preadipocyte differentiation and lipolysis impairment resulting from the calcium-sensing receptor activation mediated by R990G. We speculate that such differential features in humans could have facilitated the survival of hunter-gatherer groups during periods of nutritional stress in the challenging conditions of the Southeast Asian tropical rainforests.
Affiliation(s)
- Barbara Sinigaglia
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Jorge Escudero
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Simone A Biagini
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Jorge Garcia-Calleja
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Josep Moreno
  - PCB-PRBB Animal Facility Alliance, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Begoña Dobon
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Sandra Acosta
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
  - UB Institute of Neuroscience, Department of Pathology and Experimental Therapeutics, Universitat de Barcelona, Barcelona 08007, Spain
- Mayukh Mondal
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
  - Institute of Clinical Molecular Biology, Christian-Albrechts-Universität zu Kiel, Kiel 24118, Germany
- Sandra Walsh
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Gabriela Aguileta
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Mònica Vallès
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Stephen Forrow
  - Mouse Mutant Core Facility, Institute for Research in Biomedicine (IRB), Barcelona 08028, Spain
- Juan Martin-Caballero
  - PCB-PRBB Animal Facility Alliance, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Andrea Bamberg Migliano
  - Human Evolutionary Ecology Group, Department of Evolutionary Anthropology, University of Zurich, Zurich 8057, Switzerland
- Jaume Bertranpetit
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Francisco J Muñoz
  - Laboratory of Molecular Physiology, Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Elena Bosch
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
|
44
|
Brandt DYC, Huber CD, Chiang CWK, Ortega-Del Vecchyo D. The Promise of Inferring the Past Using the Ancestral Recombination Graph. Genome Biol Evol 2024; 16:evae005. [PMID: 38242694 PMCID: PMC10834162 DOI: 10.1093/gbe/evae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 12/11/2023] [Accepted: 12/17/2023] [Indexed: 01/21/2024] Open
Abstract
The ancestral recombination graph (ARG) is a structure that represents the history of coalescent and recombination events connecting a set of sequences (Hudson RR. Gene genealogies and the coalescent process. In: Futuyma D, Antonovics J, editors. Oxford Surveys in Evolutionary Biology; 1991. p. 1-44.). The full ARG can be represented as a set of genealogical trees at every locus in the genome, annotated with recombination events that change the topology of the trees between adjacent loci and the mutations that occurred along the branches of those trees (Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavare S, editors. Progress in population genetics and human evolution. Springer; 1997. p. 257-270.). Valuable insights can be gained into past evolutionary processes, such as demographic events or the influence of natural selection, by studying the ARG. It is regarded as the "holy grail" of population genetics (Hubisz M, Siepel A. Inference of ancestral recombination graphs using ARGweaver. In: Dutheil JY, editor. Statistical population genomics. New York, NY: Springer US; 2020. p. 231-266.) since it encodes the processes that generate all patterns of allelic and haplotypic variation from which all commonly used summary statistics in population genetic research (e.g. heterozygosity and linkage disequilibrium) can be derived. Many previous evolutionary inferences relied on summary statistics extracted from the genotype matrix. Evolutionary inferences using the ARG represent a significant advancement, as the ARG represents the evolutionary history of a sample and shows the history of recombination, coalescence, and mutation events across a particular sequence. This representation in theory contains as much information as, if not more than, the combination of all independent summary statistics that could be derived from the genotype matrix.
Consistent with this idea, some of the first ARG-based analyses have proven to be more powerful than summary statistic-based analyses (Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321-1329.; Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 2019:15(9):e1008384.; Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet. 2020:16(8):e1008895.; Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet. 2022:109(5):812-824.; Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv. 2023.10.10.561787. 2023.; Hejase HA, Mo Z, Campagna L, Siepel A. A deep-learning approach for inference of selective sweeps from the ancestral recombination graph. Mol Biol Evol. 2022:39(1):msab332.; Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv. 2023.04.07.536093. 2023.; Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet. 2023:55(5):768-776.). As such, there has been significant interest in the field to investigate 2 main problems related to the ARG: (i) How can we estimate the ARG based on genomic data, and (ii) how can we extract information of past evolutionary processes from the ARG?
In this perspective, we highlight 3 topics that pertain to these main issues: The development of computational innovations that enable the estimation of the ARG; remaining challenges in estimating the ARG; and methodological advances for deducing evolutionary forces and mechanisms using the ARG. This perspective serves to introduce the readers to the types of questions that can be explored using the ARG and to highlight some of the most pressing issues that must be addressed in order to make ARG-based inference an indispensable tool for evolutionary research.
Affiliation(s)
- Débora Y C Brandt
  - Department of Genetics Evolution and Environment, University College London, London, UK
- Christian D Huber
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma De México, Querétaro, Querétaro, Mexico
|
45
|
Rivas-González I, Schierup MH, Wakeley J, Hobolth A. TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. PLoS Genet 2024; 20:e1010836. [PMID: 38330138 PMCID: PMC10880969 DOI: 10.1371/journal.pgen.1010836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 02/21/2024] [Accepted: 01/22/2024] [Indexed: 02/10/2024] Open
Abstract
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
Affiliation(s)
- Mikkel H. Schierup
  - Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Massachusetts, United States of America
- Asger Hobolth
  - Department of Mathematics, Aarhus University, Aarhus, Denmark
|
46
|
Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024; 20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, New York, United States of America
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
|
47
|
Ray DD, Flagel L, Schrider DR. IntroUNET: identifying introgressed alleles via semantic segmentation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.02.07.527435. [PMID: 36865105 PMCID: PMC9979274 DOI: 10.1101/2023.02.07.527435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, NY 11101, USA
  - Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
48
|
Cousins T, Tabin D, Patterson N, Reich D, Durvasula A. Accurate inference of population history in the presence of background selection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.18.576291. [PMID: 38313273 PMCID: PMC10838404 DOI: 10.1101/2024.01.18.576291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
All published methods for learning about demographic history make the simplifying assumption that the genome evolves neutrally, and do not seek to account for the effects of natural selection on patterns of variation. This is a major concern, as ample work has demonstrated the pervasive effects of natural selection and in particular background selection (BGS) on patterns of genetic variation in diverse species. Simulations and theoretical work have shown that methods to infer changes in effective population size over time (Ne(t)) become increasingly inaccurate as the strength of linked selection increases. Here, we introduce an extension to the Pairwise Sequentially Markovian Coalescent (PSMC) algorithm, PSMC+, which explicitly co-models demographic history and natural selection. We benchmark our method using forward-in-time simulations with BGS and find that our approach improves the accuracy of effective population size inference. Leveraging a high resolution map of BGS in humans, we infer considerable changes in the magnitude of inferred effective population size relative to previous reports. Finally, we separately infer Ne(t) on the X chromosome and on the autosomes in diverse great apes without making a correction for selection, and find that the inferred ratio fluctuates substantially through time in a way that differs across species, showing that uncorrected selection may be an important driver of signals of genetic difference on the X chromosome and autosomes.
Affiliation(s)
- Trevor Cousins
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Daniel Tabin
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
- Arun Durvasula
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
|
49
|
Stankowski S, Zagrodzka ZB, Garlovsky MD, Pal A, Shipilina D, Castillo DG, Lifchitz H, Le Moan A, Leder E, Reeve J, Johannesson K, Westram AM, Butlin RK. The genetic basis of a recent transition to live-bearing in marine snails. Science 2024; 383:114-119. [PMID: 38175895 DOI: 10.1126/science.adi2982] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/25/2023] [Accepted: 10/25/2023] [Indexed: 01/06/2024]
Abstract
Key innovations are fundamental to biological diversification, but their genetic basis is poorly understood. A recent transition from egg-laying to live-bearing in marine snails (Littorina spp.) provides the opportunity to study the genetic architecture of an innovation that has evolved repeatedly across animals. Individuals do not cluster by reproductive mode in a genome-wide phylogeny, but local genealogical analysis revealed numerous small genomic regions where all live-bearers carry the same core haplotype. Candidate regions show evidence for live-bearer-specific positive selection and are enriched for genes that are differentially expressed between egg-laying and live-bearing reproductive systems. Ages of selective sweeps suggest that live-bearer-specific alleles accumulated over more than 200,000 generations. Our results suggest that new functions evolve through the recruitment of many alleles rather than in a single evolutionary step.
Affiliation(s)
- Sean Stankowski
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
  - Department of Ecology and Evolution, University of Sussex, Brighton BN1 9RH, UK
- Zuzanna B Zagrodzka
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
- Martin D Garlovsky
  - Department of Applied Zoology, Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Arka Pal
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
- Daria Shipilina
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
  - Department of Ecology and Genetics, Program of Evolutionary Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Hila Lifchitz
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
- Alan Le Moan
  - CNRS and Sorbonne Université, Station Biologique de Roscoff, 29680 Roscoff, France
  - Department of Marine Sciences, Tjärnö Marine Laboratory, University of Gothenburg, 452 96 Strömstad, Sweden
- Erica Leder
  - Department of Marine Sciences, Tjärnö Marine Laboratory, University of Gothenburg, 452 96 Strömstad, Sweden
  - Natural History Museum, University of Oslo, 0562 Oslo, Norway
- James Reeve
  - Department of Marine Sciences, Tjärnö Marine Laboratory, University of Gothenburg, 452 96 Strömstad, Sweden
- Kerstin Johannesson
  - Department of Marine Sciences, Tjärnö Marine Laboratory, University of Gothenburg, 452 96 Strömstad, Sweden
- Anja M Westram
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
  - Faculty of Biosciences and Aquaculture, Nord University, N-8049 Bodø, Norway
- Roger K Butlin
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
  - Department of Marine Sciences, Tjärnö Marine Laboratory, University of Gothenburg, 452 96 Strömstad, Sweden
|
50
|
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|