1
Vaughn AH, Nielsen R. Fast and Accurate Estimation of Selection Coefficients and Allele Histories from Ancient and Modern DNA. Mol Biol Evol 2024; 41:msae156. [PMID: 39078618 PMCID: PMC11321360 DOI: 10.1093/molbev/msae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 07/02/2024] [Accepted: 07/10/2024] [Indexed: 07/31/2024] Open
Abstract
We here present CLUES2, a full-likelihood method to infer natural selection from sequence data that is an extension of the method CLUES. We make several substantial improvements to the CLUES method that greatly increase both its applicability and its speed. We add the ability to use ancestral recombination graphs on ancient data as emissions to the underlying hidden Markov model, which enables CLUES2 to use both temporal and linkage information to make estimates of selection coefficients. We also fully implement the ability to estimate distinct selection coefficients in different epochs, which allows for the analysis of changes in selective pressures through time, as well as selection with dominance. In addition, we greatly increase the computational efficiency of CLUES2 over CLUES using several approximations to the forward-backward algorithms and develop a new way to reconstruct historic allele frequencies by integrating over the uncertainty in the estimation of the selection coefficients. We illustrate the accuracy of CLUES2 through extensive simulations and validate the importance sampling framework for integrating over the uncertainty in the inference of gene trees. We also show that CLUES2 is well-calibrated by showing that under the null hypothesis, the distribution of log-likelihood ratios follows a χ² distribution with the appropriate degrees of freedom. We run CLUES2 on a set of recently published ancient human data from Western Eurasia and test for evidence of changing selection coefficients through time. We find significant evidence of changing selective pressures in several genes correlated with the introduction of agriculture to Europe and the ensuing dietary and demographic shifts of that time. In particular, our analysis supports previous hypotheses of strong selection on lactase persistence during periods of ancient famines and attenuated selection in more modern periods.
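As a worked illustration of the calibration statement in this abstract, a log-likelihood ratio can be turned into a p-value by comparing twice its value with a chi-square distribution. The sketch below is not CLUES2 code. The function names `chi2_sf` and `lrt_p_value` are hypothetical, and treating the degrees of freedom as the number of estimated selection coefficients is an assumption made for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 serve as base cases.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt_p_value(loglik_null: float, loglik_alt: float, df: int) -> float:
    """P-value of a likelihood-ratio test; the statistic is twice the log-likelihood difference."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return chi2_sf(max(stat, 0.0), df)  # hypothetical helper defined above
```

Under this assumption, a test with one selection coefficient would use `df=1`, and a test with distinct coefficients in three epochs would use `df=3`.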
Affiliation(s)
- Andrew H Vaughn
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Rasmus Nielsen
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, CA 94720, USA
  - Center for GeoGenetics, University of Copenhagen, Copenhagen DK-1350, Denmark
2
Fine AG, Steinrücken M. A novel expectation-maximization approach to infer general diploid selection from time-series genetic data. bioRxiv 2024:2024.05.10.593575. [PMID: 38798346 PMCID: PMC11118272 DOI: 10.1101/2024.05.10.593575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Detecting and quantifying the strength of selection is a main objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time series genetic data is commonly analyzed using Hidden Markov Models, but in most cases, under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios. Thus, we extend a previously introduced expectation-maximization algorithm for the inference of additive selection coefficients to the case of general diploid selection, in which heterozygote and homozygote fitnesses are parameterized independently. We furthermore introduce a framework to identify bespoke modes of diploid selection from given data, as well as a procedure for aggregating data across linked loci to increase power and robustness. Using extensive simulation studies, we find that our method accurately and efficiently estimates selection coefficients for different modes of diploid selection across a wide range of scenarios; however, power to classify the mode of selection is low unless selection is very strong. We apply our method to ancient DNA samples from Great Britain in the last 4,450 years, and detect evidence for selection in six genomic regions, including the well-characterized LCT locus. Our work is the first genome-wide scan characterizing signals of general diploid selection.
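To make the notion of general diploid selection concrete, the sketch below iterates the standard deterministic allele-frequency recursion with independently parameterized heterozygote and homozygote fitnesses. It is not the authors' expectation-maximization algorithm or hidden Markov model. The fitness parameterization (1, 1 + s_het, 1 + s_hom) and the function names are assumptions chosen for illustration.

```python
def next_frequency(p: float, s_het: float, s_hom: float) -> float:
    """One generation of deterministic selection on the focal allele A at frequency p."""
    q = 1.0 - p
    # Assumed fitnesses: aa = 1, Aa = 1 + s_het, AA = 1 + s_hom.
    w_AA, w_Aa, w_aa = 1.0 + s_hom, 1.0 + s_het, 1.0
    mean_w = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    # Frequency of A among offspring after weighting genotypes by fitness.
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def trajectory(p0: float, s_het: float, s_hom: float, generations: int) -> list:
    """Allele frequencies from generation 0 through `generations`, ignoring drift."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s_het, s_hom))
    return freqs
```

In this parameterization, additive selection corresponds to `s_hom = 2 * s_het`, full dominance to `s_het = s_hom`, a recessive allele to `s_het = 0`, and overdominance to `s_het > max(s_hom, 0)`.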
Affiliation(s)
- Adam G Fine
  - Department of Ecology and Evolution, University of Chicago
  - Graduate Program in Biophysical Sciences, University of Chicago
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago
  - Department of Human Genetics, University of Chicago
3
Harris M, Kim BY, Garud N. Enrichment of hard sweeps on the X chromosome compared to autosomes in six Drosophila species. Genetics 2024; 226:iyae019. [PMID: 38366786 PMCID: PMC10990427 DOI: 10.1093/genetics/iyae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 02/18/2024] Open
Abstract
The X chromosome, being hemizygous in males, is exposed one-third of the time, increasing the visibility of new mutations to natural selection and potentially leading to different evolutionary dynamics than autosomes. Recently, we found an enrichment of hard selective sweeps over soft selective sweeps on the X chromosome relative to the autosomes in a North American population of Drosophila melanogaster. To understand whether this enrichment is a universal feature of evolution on the X chromosome, we analyze diversity patterns across 6 commonly studied Drosophila species. We find an increased proportion of regions with steep reductions in diversity and elevated homozygosity on the X chromosome compared to autosomes. To assess if these signatures are consistent with positive selection, we simulate a wide variety of evolutionary scenarios spanning variations in demography, mutation rate, recombination rate, background selection, hard sweeps, and soft sweeps and find that the diversity patterns observed on the X are most consistent with hard sweeps. Our findings highlight the importance of sex chromosomes in driving evolutionary processes and suggest that hard sweeps have played a significant role in shaping diversity patterns on the X chromosome across multiple Drosophila species.
Affiliation(s)
- Mariana Harris
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Bernard Y Kim
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Nandita Garud
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
4
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Adv Sci (Weinh) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinyu Chu
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Wangjiao Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
- Jianlin Han
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
  - Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
- Yunlong Ma
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
5
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024] Open
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
6
Harris M, Kim B, Garud N. Enrichment of hard sweeps on the X chromosome compared to autosomes in six Drosophila species. bioRxiv 2023:2023.06.21.545888. [PMID: 38106201 PMCID: PMC10723260 DOI: 10.1101/2023.06.21.545888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The X chromosome, being hemizygous in males, is exposed one third of the time, increasing the visibility of new mutations to natural selection and potentially leading to different evolutionary dynamics than autosomes. Recently, we found an enrichment of hard selective sweeps over soft selective sweeps on the X chromosome relative to the autosomes in a North American population of Drosophila melanogaster. To understand whether this enrichment is a universal feature of evolution on the X chromosome, we analyze diversity patterns across six commonly studied Drosophila species. We find an increased proportion of regions with steep reductions in diversity and elevated homozygosity on the X chromosome compared to autosomes. To assess if these signatures are consistent with positive selection, we simulate a wide variety of evolutionary scenarios spanning variations in demography, mutation rate, recombination rate, background selection, hard sweeps, and soft sweeps, and find that the diversity patterns observed on the X are most consistent with hard sweeps. Our findings highlight the importance of sex chromosomes in driving evolutionary processes and suggest that hard sweeps have played a significant role in shaping diversity patterns on the X chromosome across multiple Drosophila species.
Affiliation(s)
- Mariana Harris
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Bernard Kim
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Nandita Garud
  - Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, University of California, Los Angeles, California, United States of America
7
Kambal S, Tijjani A, Ibrahim SAE, Ahmed MKA, Mwacharo JM, Hanotte O. Candidate signatures of positive selection for environmental adaptation in indigenous African cattle: A review. Anim Genet 2023; 54:689-708. [PMID: 37697736 DOI: 10.1111/age.13353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 07/28/2023] [Accepted: 08/19/2023] [Indexed: 09/13/2023]
Abstract
Environmental adaptation traits of indigenous African cattle are increasingly being investigated to respond to the need for sustainable livestock production in the context of unpredictable climatic changes. Several studies have highlighted genomic regions under positive selection probably associated with adaptation to environmental challenges (e.g. heat stress, trypanosomiasis, tick and tick-borne diseases). However, little attention has focused on pinpointing the candidate causative variant(s) controlling the traits. This review compiled information from 22 studies on signatures of positive selection in indigenous African cattle breeds to identify regions under positive selection. We highlight some key candidate genome regions and genes of relevance to the challenges of living in extreme environments (high temperature, high altitude, high infectious disease prevalence). They include candidate genes involved in biological pathways relating to innate and adaptive immunity (e.g. BoLAs, SPAG11, IL1RL2 and GFI1B), heat stress (e.g. HSPs, SOD1 and PRLH) and hypoxia responses (e.g. BDNF and INPP4A). Notably, the highest numbers of candidate regions are found on BTA3, BTA5 and BTA7. They overlap with genes playing roles in several biological functions and pathways. These include but are not limited to growth and feed intake, cell stability, protein stability and sweat gland development. This review may further guide targeted genome studies aiming to assess the importance of candidate causative mutations, within regulatory and protein-coding genome regions, to further understand the biological mechanisms underlying African cattle's unique adaptation.
Affiliation(s)
- Sumaya Kambal
  - Livestock Genetics, International Livestock Research Institute, Addis Ababa, Ethiopia
  - Department of Genetics and Animal Breeding, Faculty of Animal Production, University of Khartoum, Khartoum, Sudan
  - Department of Bioinformatics and Biostatistics, National University, Khartoum, Sudan
- Abdulfatai Tijjani
  - Centre for Tropical Livestock Genetics and Health, International Livestock Research Institute, Addis Ababa, Ethiopia
  - The Jackson Laboratory, Bar Harbor, Maine, USA
- Sabah A E Ibrahim
  - Department of Bioinformatics and Biostatistics, National University, Khartoum, Sudan
- Mohamed-Khair A Ahmed
  - Department of Genetics and Animal Breeding, Faculty of Animal Production, University of Khartoum, Khartoum, Sudan
- Joram M Mwacharo
  - Scotland's Rural College and Centre for Tropical Livestock Genetics and Health, Edinburgh, UK
  - Small Ruminant Genomics, International Centre for Agricultural Research in the Dry Areas, Addis Ababa, Ethiopia
- Olivier Hanotte
  - Livestock Genetics, International Livestock Research Institute, Addis Ababa, Ethiopia
  - Centre for Tropical Livestock Genetics and Health, International Livestock Research Institute, Addis Ababa, Ethiopia
  - School of Life Sciences, University of Nottingham, Nottingham, UK
8
Tanaka T, Hayakawa T, Teshima KM. Power of neutrality tests for detecting natural selection. G3 (Bethesda) 2023; 13:jkad161. [PMID: 37481468 PMCID: PMC10542275 DOI: 10.1093/g3journal/jkad161] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/09/2023] [Revised: 06/09/2023] [Accepted: 07/19/2023] [Indexed: 07/24/2023]
Abstract
Detection of natural selection is one of the main interests in population genetics. Thus, many tests have been developed for detecting natural selection using genomic data. Although it is recognized that the utility of tests depends on several evolutionary factors, such as the timing of selection, strength of selection, frequency of selected alleles, demographic events, and initial frequency of selected allele when selection started acting (softness of selection), the relationships between such evolutionary factors and the power of tests are not yet entirely clear. In this study, we investigated the power of 4 tests: Tajima's D, Fay and Wu's H, relative extended haplotype homozygosity (rEHH), and integrated haplotype score (iHS), under ranges of evolutionary parameters and demographic models to quantitatively expand the understanding of approaches for detecting selection. The results show that each test detects selection within a limited parameter range, and there are still wide ranges of parameters for which none of these tests work effectively. In addition, the parameter space in which each test shows the highest power overlaps the empirical results of previous research. These results indicate that our present perspective of adaptation is limited to only a part of actual adaptation.
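For readers unfamiliar with the first of these tests, the sketch below computes Tajima's D from per-site allele counts using the standard Tajima (1989) constants. It is not taken from the study, which evaluated power by simulation. The function name `tajimas_d` and the input format (one allele count per segregating site) are assumptions made for illustration.

```python
import math


def tajimas_d(allele_counts, n: int) -> float:
    """Tajima's D from the count of one allele at each site in a sample of n chromosomes."""
    if n < 4:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("need at least 4 sampled chromosomes")
    counts = [k for k in allele_counts if 0 < k < n]  # keep segregating sites only
    S = len(counts)
    if S == 0:
        return float("nan")
    # Mean pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants, as expected after a recent sweep, gives negative values, while an excess of intermediate-frequency variants gives positive values.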
Affiliation(s)
- Tomotaka Tanaka
  - Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Toshiyuki Hayakawa
  - Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
  - Faculty of Arts and Science, Kyushu University, Fukuoka 819-0395, Japan
- Kosuke M Teshima
  - Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
9
Bock DG, Cai Z, Elphinstone C, González-Segovia E, Hirabayashi K, Huang K, Keais GL, Kim A, Owens GL, Rieseberg LH. Genomics of plant speciation. Plant Commun 2023; 4:100599. [PMID: 37050879 PMCID: PMC10504567 DOI: 10.1016/j.xplc.2023.100599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 06/19/2023]
Abstract
Studies of plants have been instrumental for revealing how new species originate. For several decades, botanical research has complemented and, in some cases, challenged concepts on speciation developed via the study of other organisms while also revealing additional ways in which species can form. Now, the ability to sequence genomes at an unprecedented pace and scale has allowed biologists to settle decades-long debates and tackle other emerging challenges in speciation research. Here, we review these recent genome-enabled developments in plant speciation. We discuss complications related to identification of reproductive isolation (RI) loci using analyses of the landscape of genomic divergence and highlight the important role that structural variants have in speciation, as increasingly revealed by new sequencing technologies. Further, we review how genomics has advanced what we know of some routes to new species formation, like hybridization or whole-genome duplication, while casting doubt on others, like population bottlenecks and genetic drift. While genomics can fast-track identification of genes and mutations that confer RI, we emphasize that follow-up molecular and field experiments remain critical. Nonetheless, genomics has clarified the outsized role of ancient variants rather than new mutations, particularly early during speciation. We conclude by highlighting promising avenues of future study. These include expanding what we know so far about the role of epigenetic and structural changes during speciation, broadening the scope and taxonomic breadth of plant speciation genomics studies, and synthesizing information from extensive genomic data that have already been generated by the plant speciation community.
Affiliation(s)
- Dan G Bock
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Zhe Cai
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Cassandra Elphinstone
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Eric González-Segovia
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Kaichi Huang
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Graeme L Keais
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Amy Kim
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Gregory L Owens
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Loren H Rieseberg
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada.
10
Iwasaki RL, Satta Y. Spatial and temporal diversity of positive selection on shared haplotypes at the PSCA locus among worldwide human populations. Heredity (Edinb) 2023; 131:156-169. [PMID: 37353592 PMCID: PMC10382566 DOI: 10.1038/s41437-023-00631-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 06/25/2023] Open
Abstract
Selection on standing genetic variation is important for rapid local genetic adaptation when the environment changes. We report that, for the prostate stem cell antigen (PSCA) gene, different populations have different target haplotypes, even though haplotypes are shared among populations. The C-C-A haplotype, whereby the first C is located at rs2294008 of PSCA and is a low-risk allele for gastric cancer, has become a target of positive selection in Asia. Conversely, the C-A-G haplotype carrying the same C allele has become a selection target mainly in Africa. However, Asian and African populations share both haplotypes, consistent with the haplotype divergence time (170 kya) prior to the out-of-Africa dispersal. The frequency of C-C-A/C-A-G is 0.344/0.278 in Asia and 0.209/0.416 in Africa. Two-dimensional site frequency spectrum analysis revealed that the extent of intra-allelic variability of the target haplotype is extremely small in each local population, suggesting that C-C-A or C-A-G is under ongoing hard sweeps in local populations. From the time to the most recent common ancestor (TMRCA) of selected haplotypes, the onset times of positive selection were recent (3-55 kya), concurrent with population subdivision from a common ancestor. Additionally, estimated selection coefficients from ABC analysis were up to ~3%, similar to those at other loci under recent positive selection. Phylogeny of local populations and TMRCA of selected haplotypes revealed that spatial and temporal switching of positive selection targets is a unique and novel feature of ongoing selection at PSCA. This switching may reflect the potential of rapid adaptability to distinct environments.
Affiliation(s)
- Risa L Iwasaki
  - Department of Evolutionary Studies of Biosystems, School of Advanced Science, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa, 240-0193, Japan
  - Research Center for Integrative Evolutionary Science, SOKENDAI, Hayama, Kanagawa, 240-0193, Japan
- Yoko Satta
  - Department of Evolutionary Studies of Biosystems, School of Advanced Science, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa, 240-0193, Japan.
  - Research Center for Integrative Evolutionary Science, SOKENDAI, Hayama, Kanagawa, 240-0193, Japan.
11
Zhang X, Kim B, Singh A, Sankararaman S, Durvasula A, Lohmueller KE. MaLAdapt Reveals Novel Targets of Adaptive Introgression From Neanderthals and Denisovans in Worldwide Human Populations. Mol Biol Evol 2023; 40:msad001. [PMID: 36617238 PMCID: PMC9887621 DOI: 10.1093/molbev/msad001] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/20/2022] [Revised: 12/25/2022] [Accepted: 12/28/2022] [Indexed: 01/09/2023] Open
Abstract
Adaptive introgression (AI) facilitates local adaptation in a wide range of species. Many state-of-the-art methods detect AI with ad-hoc approaches that identify summary statistic outliers or intersect scans for positive selection with scans for introgressed genomic regions. Although widely used, approaches intersecting outliers are vulnerable to a high false-negative rate as the power of different methods varies, especially for complex introgression events. Moreover, population genetic processes unrelated to AI, such as background selection or heterosis, may create similar genomic signals to AI, compromising the reliability of methods that rely on neutral null distributions. In recent years, machine learning (ML) methods have been increasingly applied to population genetic questions. Here, we present an ML-based method called MaLAdapt for identifying AI loci from genome-wide sequencing data. Using an Extra-Trees Classifier algorithm, our method combines information from a large number of biologically meaningful summary statistics to capture a powerful composite signature of AI across the genome. In contrast to existing methods, MaLAdapt is especially well-powered to detect AI with mild beneficial effects, including selection on standing archaic variation, and is robust to non-AI selective sweeps, heterosis from deleterious mutations, and demographic misspecification. Furthermore, MaLAdapt outperforms existing methods for detecting AI based on the analysis of simulated data and the validation of empirical signals through visual inspection of haplotype patterns. We apply MaLAdapt to the 1000 Genomes Project human genomic data and discover novel AI candidate regions in non-African populations, including genes that are enriched in functionally important biological pathways regulating metabolism and immune responses.
Affiliation(s)
- Xinjun Zhang
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA
- Bernard Kim
  - Department of Biology, Stanford University, Palo Alto, CA
- Armaan Singh
  - Department of Computer Science, UCLA, Los Angeles, CA
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA
  - Department of Computational Medicine, UCLA, Los Angeles, CA
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA
- Arun Durvasula
  - Department of Genetics, Harvard Medical School, Boston, MA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA
12
Muktupavela RA, Petr M, Ségurel L, Korneliussen T, Novembre J, Racimo F. Modeling the spatiotemporal spread of beneficial alleles using ancient genomes. eLife 2022; 11:e73767. [PMID: 36537881 PMCID: PMC9767474 DOI: 10.7554/elife.73767] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Accepted: 11/21/2022] [Indexed: 12/24/2022] Open
Abstract
Ancient genome sequencing technologies now provide the opportunity to study natural selection in unprecedented detail. Rather than making inferences from indirect footprints left by selection in present-day genomes, we can directly observe whether a given allele was present or absent in a particular region of the world at almost any period of human history within the last 10,000 years. Methods for studying selection using ancient genomes often rely on partitioning individuals into discrete time periods or regions of the world. However, a complete understanding of natural selection requires more nuanced statistical methods which can explicitly model allele frequency changes in a continuum across space and time. Here we introduce a method for inferring the spread of a beneficial allele across a landscape using two-dimensional partial differential equations. Unlike previous approaches, our framework can handle time-stamped ancient samples, as well as genotype likelihoods and pseudohaploid sequences from low-coverage genomes. We apply the method to a panel of published ancient West Eurasian genomes to produce dynamic maps showcasing the inferred spread of candidate beneficial alleles over time and space. We also provide estimates for the strength of selection and diffusion rate for each of these alleles. Finally, we highlight possible avenues of improvement for accurately tracing the spread of beneficial alleles in more complex scenarios.
Affiliation(s)
- Rasa A Muktupavela
  - Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
- Martin Petr
  - Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
- Laure Ségurel
  - UMR5558 Biométrie et Biologie Evolutive, CNRS - Université Lyon 1, Villeurbanne, France
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, United States
- Fernando Racimo
  - Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
|
13
|
Souilmi Y, Tobler R, Johar A, Williams M, Grey ST, Schmidt J, Teixeira JC, Rohrlach A, Tuke J, Johnson O, Gower G, Turney C, Cox M, Cooper A, Huber CD. Admixture has obscured signals of historical hard sweeps in humans. Nat Ecol Evol 2022; 6:2003-2015. [PMID: 36316412 PMCID: PMC9715430 DOI: 10.1038/s41559-022-01914-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/21/2021] [Accepted: 09/16/2022] [Indexed: 11/06/2022]
Abstract
The role of natural selection in shaping biological diversity is an area of intense interest in modern biology. To date, studies of positive selection have primarily relied on genomic datasets from contemporary populations, which are susceptible to confounding factors associated with complex and often unknown aspects of population history. In particular, admixture between diverged populations can distort or hide prior selection events in modern genomes, though this process is not explicitly accounted for in most selection studies despite its apparent ubiquity in humans and other species. Through analyses of ancient and modern human genomes, we show that previously reported Holocene-era admixture has masked more than 50 historic hard sweeps in modern European genomes. Our results imply that this canonical mode of selection has probably been underappreciated in the evolutionary history of humans and suggest that our current understanding of the tempo and mode of selection in natural populations may be inaccurate.
Affiliation(s)
- Yassine Souilmi
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Raymond Tobler
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Evolution of Cultural Diversity Initiative, Australian National University, Canberra, Australian Capital Territory, Australia
- Angad Johar
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Matthew Williams
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Shane T Grey
  - Transplantation Immunology Group, Immunology Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW, Darlinghurst, New South Wales, Australia
- Joshua Schmidt
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- João C Teixeira
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Adam Rohrlach
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Jonathan Tuke
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - School of Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Olivia Johnson
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Graham Gower
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Chris Turney
  - Chronos 14Carbon-Cycle Facility and Earth and Sustainability Science Research Centre, University of New South Wales, Sydney, New South Wales, Australia
- Murray Cox
  - Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Alan Cooper
  - South Australian Museum, Adelaide, South Australia, Australia
  - BlueSky Genetics, Ashton, South Australia, Australia
- Christian D Huber
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Biology, Penn State University, University Park, PA, USA
|
14
|
McQuillan MA, Ranciaro A, Hansen MEB, Fan S, Beggs W, Belay G, Woldemeskel D, Tishkoff SA. Signatures of Convergent Evolution and Natural Selection at the Alcohol Dehydrogenase Gene Region are Correlated with Agriculture in Ethnically Diverse Africans. Mol Biol Evol 2022; 39:msac183. [PMID: 36026493 PMCID: PMC9547508 DOI: 10.1093/molbev/msac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023] Open
Abstract
The alcohol dehydrogenase (ADH) family of genes encodes enzymes that catalyze the metabolism of ethanol into acetaldehyde. Nucleotide variation in ADH genes can affect the catalytic properties of these enzymes and is associated with a variety of traits, including alcoholism and cancer. Some ADH variants, including the ADH1B*48His (rs1229984) mutation in the ADH1B gene, reduce the risk of alcoholism and are under positive selection in multiple human populations. The advent of Neolithic agriculture and associated increase in fermented foods and beverages is hypothesized to have been a selective force acting on such variants. However, this hypothesis has not been tested in populations outside of Asia. Here, we use genome-wide selection scans to show that the ADH gene region is enriched for variants showing strong signals of positive selection in multiple Afroasiatic-speaking, agriculturalist populations from Ethiopia, and that this signal is unique among sub-Saharan Africans. We also observe strong selection signals at putatively functional variants in nearby lipid metabolism genes, which may influence evolutionary dynamics at the ADH region. Finally, we show that haplotypes carrying these selected variants were introduced into Northeast Africa from a West-Eurasian source within the last ∼2,000 years and experienced positive selection following admixture. These selection signals are not evident in nearby, genetically similar populations that practice hunting/gathering or pastoralist subsistence lifestyles, supporting the hypothesis that the emergence of agriculture shapes patterns of selection at ADH genes. Together, these results enhance our understanding of how adaptations to diverse environments and diets have influenced the African genomic landscape.
Affiliation(s)
- Alessia Ranciaro
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA
- Shaohua Fan
  - Human Phenome Institute, School of Life Sciences, Fudan University, Shanghai, China
- William Beggs
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA
- Gurja Belay
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Dawit Woldemeskel
  - Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Sarah A Tishkoff
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA
|
15
|
Tang J, Huang M, He S, Zeng J, Zhu H. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep 2022; 40:111351. [PMID: 36103812 DOI: 10.1016/j.celrep.2022.111351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 06/13/2022] [Accepted: 08/23/2022] [Indexed: 11/03/2022] Open
Abstract
Favored mutations in the human genome may make the carriers adapt to changing environments and lifestyles but also susceptible to specific diseases. The scale and details of the trade-off between adaptive evolution and disease susceptibility are unclear because most favored mutations in different populations remain unidentified. As no statistical test can discriminate favored mutations from nearby hitchhiking neutral ones, we report a deep-learning network (DeepFavored) to integrate multiple statistical tests and divide identifying favored mutations into two subtasks. We identify favored mutations in three human populations and analyze the correlation between favored/hitchhiking mutations and genome-wide association study (GWAS) sites. Both favored and hitchhiking neutral mutations are enriched in GWAS sites with population-specific features, and the enrichment and population specificity are prominent in genes in specific Gene Ontology (GO) terms. These provide evidence for extensive and population-specific trade-offs between adaptive evolution and disease susceptibility. The unveiled scale helps understand and investigate differences and diseases of humans.
Affiliation(s)
- Ji Tang
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Maosheng Huang
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China
- Sha He
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Junxiang Zeng
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Hao Zhu
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
|
16
|
Somovilla P, Rodríguez-Moreno A, Arribas M, Manrubia S, Lázaro E. Standing Genetic Diversity and Transmission Bottleneck Size Drive Adaptation in Bacteriophage Qβ. Int J Mol Sci 2022; 23:ijms23168876. [PMID: 36012143 PMCID: PMC9408265 DOI: 10.3390/ijms23168876] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/21/2022] [Revised: 08/03/2022] [Accepted: 08/07/2022] [Indexed: 01/15/2023] Open
Abstract
A critical issue to understanding how populations adapt to new selective pressures is the relative contribution of the initial standing genetic diversity versus that generated de novo. RNA viruses are an excellent model to study this question, as they form highly heterogeneous populations whose genetic diversity can be modulated by factors such as the number of generations, the size of population bottlenecks, or exposure to new environment conditions. In this work, we propagated at nonoptimal temperature (43 °C) two bacteriophage Qβ populations differing in their degree of heterogeneity. Deep sequencing analysis showed that, prior to the temperature change, the most heterogeneous population contained some low-frequency mutations that had previously been detected in the consensus sequences of other Qβ populations adapted to 43 °C. Evolved populations with origin in this ancestor reached similar growth rates, but the adaptive pathways depended on the frequency of these standing mutations and the transmission bottleneck size. In contrast, the growth rate achieved by populations with origin in the less heterogeneous ancestor did depend on the transmission bottleneck size. The conclusion is that viral diversification in a particular environment may lead to the emergence of mutants capable of accelerating adaptation when the environment changes.
Affiliation(s)
- Pilar Somovilla
  - Centro de Astrobiología (CAB), CSIC-INTA, Ctra. de Torrejón Km 4, Torrejón de Ardoz, 28850 Madrid, Spain
- Alicia Rodríguez-Moreno
  - Centro de Astrobiología (CAB), CSIC-INTA, Ctra. de Torrejón Km 4, Torrejón de Ardoz, 28850 Madrid, Spain
- María Arribas
  - Centro de Astrobiología (CAB), CSIC-INTA, Ctra. de Torrejón Km 4, Torrejón de Ardoz, 28850 Madrid, Spain
- Susanna Manrubia
  - Centro Nacional de Biotecnología (CNB-CSIC), c/Darwin 3, 28049 Madrid, Spain
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Ester Lázaro
  - Centro de Astrobiología (CAB), CSIC-INTA, Ctra. de Torrejón Km 4, Torrejón de Ardoz, 28850 Madrid, Spain
|
17
|
Fu Q. Insights into evolutionary dynamics of East Asians through Ancient DNA. CHINESE SCIENCE BULLETIN-CHINESE 2022. [DOI: 10.1360/tb-2022-0569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
18
|
Labuda D, Harding T, Milot E, Vézina H. The effective family size of immigrant founders predicts their long-term demographic outcome: From Québec settlers to their 20th-century descendants. PLoS One 2022; 17:e0266079. [PMID: 35507549 PMCID: PMC9067642 DOI: 10.1371/journal.pone.0266079] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/05/2021] [Accepted: 03/14/2022] [Indexed: 11/19/2022] Open
Abstract
Population history reconstruction, using extant genetic diversity data, routinely relies on simple demographic models to project the past through ascending genealogical-tree branches. Because genealogy and genetics are intimately related, we traced descending genealogies of the Québec founders to pursue their fate and to assess their contribution to the present-day population. Focusing on the female and male founder lines, we observed important sex-biased immigration in the early colony years and documented a remarkable impact of these early immigrants on the genetic make-up of 20th-century Québec. We estimated the immigrants’ survival ratio as a proportion of lineages found in the 1931–60 Québec to their number introduced within the immigration period. We assessed the effective family size, EFS, of all immigrant parents and their Québec-born descendants. The survival ratio of the earliest immigrants was the highest and declined over centuries in association with the immigrants’ EFS. Parents with high EFS left plentiful married descendants, putting EFS as the most important variable determining the parental demographic success throughout time for generations ahead. EFS of immigrant founders appears to predict their long-term demographic and, consequently, their genetic outcome. Genealogically inferred immigrants’ "autosomal" genetic contribution to 1931–60 Québec from consecutive immigration periods follow the same yearly pattern as the corresponding maternal and paternal lines. Québec genealogical data offer much broader information on the ancestral diversity distribution than genetic scrutiny of a limited population sample. Genealogically inferred population history could assist studies of evolutionary factors shaping population structure and provide tools to target specific health interventions.
Affiliation(s)
- Damian Labuda
  - Centre de Recherche, CHU Sainte-Justine, Université de Montréal, Montreal, Québec, Canada
  - Département de Pédiatrie, Université de Montréal, Montreal, Québec, Canada
- Tommy Harding
  - Centre de Recherche, CHU Sainte-Justine, Université de Montréal, Montreal, Québec, Canada
  - Département de chimie, biochimie et physique, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada
- Emmanuel Milot
  - Département de chimie, biochimie et physique, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada
- Hélène Vézina
  - Projet BALSAC, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
|
19
|
Cuadros-Espinoza S, Laval G, Quintana-Murci L, Patin E. The genomic signatures of natural selection in admixed human populations. Am J Hum Genet 2022; 109:710-726. [PMID: 35259336 DOI: 10.1016/j.ajhg.2022.02.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/15/2021] [Accepted: 02/14/2022] [Indexed: 12/15/2022] Open
Abstract
Admixture has been a pervasive phenomenon in human history, extensively shaping the patterns of population genetic diversity. There is increasing evidence to suggest that admixture can also facilitate genetic adaptation to local environments, i.e., admixed populations acquire beneficial mutations from source populations, a process that we refer to as "adaptive admixture." However, the role of adaptive admixture in human evolution and the power to detect it remain poorly characterized. Here, we use extensive computer simulations to evaluate the power of several neutrality statistics to detect natural selection in the admixed population, assuming multiple admixture scenarios. We show that statistics based on admixture proportions, Fadm and LAD, show high power to detect mutations that are beneficial in the admixed population, whereas other statistics, including iHS and FST, falsely detect neutral mutations that have been selected in the source populations only. By combining Fadm and LAD into a single, powerful statistic, we scanned the genomes of 15 worldwide, admixed populations for signatures of adaptive admixture. We confirm that lactase persistence and resistance to malaria have been under adaptive admixture in West Africans and in Malagasy, North Africans, and South Asians, respectively. Our approach also uncovers other cases of adaptive admixture, including APOL1 in Fulani nomads and PKN2 in East Indonesians, involved in resistance to infection and metabolism, respectively. Collectively, our study provides evidence that adaptive admixture has occurred in human populations whose genetic history is characterized by periods of isolation and spatial expansions resulting in increased gene flow.
|
20
|
Schaschl H, Göllner T, Morris DL. Positive selection acts on regulatory genetic variants in populations of European ancestry that affect ALDH2 gene expression. Sci Rep 2022; 12:4563. [PMID: 35296751 PMCID: PMC8927298 DOI: 10.1038/s41598-022-08588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 03/09/2022] [Indexed: 11/09/2022] Open
Abstract
ALDH2 is a key enzyme in alcohol metabolism that protects cells from acetaldehyde toxicity. Using iHS, iSAFE and FST statistics, we identified regulatory acting variants affecting ALDH2 gene expression under positive selection in populations of European ancestry. Several SNPs (rs3184504, rs4766578, rs10774625, rs597808, rs653178, rs847892, rs2013002) that function as eQTLs for ALDH2 in various tissues showed evidence of strong positive selection. Very large pairwise FST values indicated high genetic differentiation at these loci between populations of European ancestry and populations of other global ancestries. Estimating the timing of positive selection on the beneficial alleles suggests that these variants were recently adapted approximately 3000-3700 years ago. The derived beneficial alleles are in complete linkage disequilibrium with the derived ALDH2 promoter variant rs886205, which is associated with higher transcriptional activity. The SNPs rs4766578 and rs847892 are located in binding sequences for the transcription factor HNF4A, which is an important regulatory element of ALDH2 gene expression. In contrast to the missense variant ALDH2 rs671 (ALDH2*2), which is common only in East Asian populations and is associated with greatly reduced enzyme activity and alcohol intolerance, the beneficial alleles of the regulatory variants identified in this study are associated with increased expression of ALDH2. This suggests adaptation of Europeans to higher alcohol consumption.
Affiliation(s)
- Helmut Schaschl
  - Department of Evolutionary Anthropology, Faculty of Life Sciences, University of Vienna, Djerassiplatz 1, 1030, Vienna, Austria
- Tobias Göllner
  - Department of Evolutionary Anthropology, Faculty of Life Sciences, University of Vienna, Djerassiplatz 1, 1030, Vienna, Austria
- David L Morris
  - Department of Medical and Molecular Genetics, Faculty of Life Sciences and Medicine, King's College London, Great Maze Pond, London, SE1 9RT, UK
|
21
|
Muleta KT, Felderhoff T, Winans N, Walstead R, Charles JR, Armstrong JS, Mamidi S, Plott C, Vogel JP, Lemaux PG, Mockler TC, Grimwood J, Schmutz J, Pressoir G, Morris GP. The recent evolutionary rescue of a staple crop depended on over half a century of global germplasm exchange. SCIENCE ADVANCES 2022; 8:eabj4633. [PMID: 35138897 PMCID: PMC8827733 DOI: 10.1126/sciadv.abj4633] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/14/2023]
Abstract
Rapid environmental change can lead to population extinction or evolutionary rescue. The global staple crop sorghum (Sorghum bicolor) has recently been threatened by a global outbreak of an aggressive new biotype of sugarcane aphid (SCA; Melanaphis sacchari). We characterized genomic signatures of adaptation in a Haitian breeding population that had rapidly adapted to SCA infestation, conducting evolutionary population genomics analyses on 296 Haitian lines versus 767 global accessions. Genome scans and geographic analyses suggest that SCA adaptation has been conferred by a globally rare East African allele of RMES1, which spread to breeding programs in Africa, Asia, and the Americas. De novo genome sequencing revealed potential causative variants at RMES1. Markers developed from the RMES1 sweep predicted resistance in eight independent commercial and public breeding programs. These findings demonstrate the value of evolutionary genomics to develop adaptive trait technology and highlight the benefits of global germplasm exchange to facilitate evolutionary rescue.
Affiliation(s)
- Kebede T. Muleta
  - Department of Agronomy, Kansas State University, Manhattan, KS 66502, USA
- Terry Felderhoff
  - Department of Agronomy, Kansas State University, Manhattan, KS 66502, USA
- Noah Winans
  - Department of Agronomy, Kansas State University, Manhattan, KS 66502, USA
- Rachel Walstead
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jean Rigaud Charles
  - Chibas and Faculty of Agriculture and Environmental Sciences, Quisqueya University, Port-au-Prince, Haiti
- J. Scott Armstrong
  - U.S. Department of Agriculture, Agricultural Research Service, Wheat, Peanut and Other Field Crops Research Unit, 1301 North Western Rd., Stillwater, OK 74075, USA
- Sujan Mamidi
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Chris Plott
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- John P. Vogel
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Peggy G. Lemaux
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Todd C. Mockler
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Jane Grimwood
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jeremy Schmutz
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Gael Pressoir
  - Chibas and Faculty of Agriculture and Environmental Sciences, Quisqueya University, Port-au-Prince, Haiti
- Geoffrey P. Morris
  - Department of Agronomy, Kansas State University, Manhattan, KS 66502, USA
  - Department of Soil and Crop Science, Colorado State University, Fort Collins, CO 80526, USA
|
22
|
Hejase HA, Mo Z, Campagna L, Siepel A. A Deep-Learning Approach for Inference of Selective Sweeps from the Ancestral Recombination Graph. Mol Biol Evol 2022; 39:msab332. [PMID: 34888675 PMCID: PMC8789311 DOI: 10.1093/molbev/msab332] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023] Open
Abstract
Detecting signals of selection from genomic data is a central problem in population genetics. Coupling the rich information in the ancestral recombination graph (ARG) with a powerful and scalable deep-learning framework, we developed a novel method to detect and quantify positive selection: Selection Inference using the Ancestral recombination graph (SIA). Built on a Long Short-Term Memory (LSTM) architecture, a particular type of a Recurrent Neural Network (RNN), SIA can be trained to explicitly infer a full range of selection coefficients, as well as the allele frequency trajectory and time of selection onset. We benchmarked SIA extensively on simulations under a European human demographic model, and found that it performs as well or better as some of the best available methods, including state-of-the-art machine-learning and ARG-based methods. In addition, we used SIA to estimate selection coefficients at several loci associated with human phenotypes of interest. SIA detected novel signals of selection particular to the European (CEU) population at the MC1R and ABCC11 loci. In addition, it recapitulated signals of selection at the LCT locus and several pigmentation-related genes. Finally, we reanalyzed polymorphism data of a collection of recently radiated southern capuchino seedeater taxa in the genus Sporophila to quantify the strength of selection and improved the power of our previous methods to detect partial soft sweeps. Overall, SIA uses deep learning to leverage the ARG and thereby provides new insight into how selective sweeps shape genomic diversity.
Affiliation(s)
- Hussein A Hejase
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ziyi Mo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Leonardo Campagna
  - Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, NY, USA
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
23
|
Mathieson I, Terhorst J. Direct detection of natural selection in Bronze Age Britain. Genome Res 2022; 32:2057-2067. [PMID: 36316157 PMCID: PMC9808619 DOI: 10.1101/gr.276862.122] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/24/2022] [Accepted: 08/29/2022] [Indexed: 11/04/2022]
Abstract
We developed a novel method for efficiently estimating time-varying selection coefficients from genome-wide ancient DNA data. In simulations, our method accurately recovers selective trajectories and is robust to misspecification of population size. We applied it to a large data set of ancient and present-day human genomes from Britain and identified seven loci with genome-wide significant evidence of selection in the past 4500 yr. Almost all of them can be related to increased vitamin D or calcium levels, suggesting strong selective pressure on these or related phenotypes. However, the strength of selection on individual loci varied substantially over time, suggesting that cultural or environmental factors moderated the genetic response. Of 28 complex anthropometric and metabolic traits, skin pigmentation was the only one with significant evidence of polygenic selection, further underscoring the importance of phenotypes related to vitamin D. Our approach illustrates the power of ancient DNA to characterize selection in human populations and illuminates the recent evolutionary history of Britain.
Affiliation(s)
- Iain Mathieson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
24
|
Laval G, Patin E, Boutillier P, Quintana-Murci L. Sporadic occurrence of recent selective sweeps from standing variation in humans as revealed by an approximate Bayesian computation approach. Genetics 2021; 219:6377789. [PMID: 34849862 DOI: 10.1093/genetics/iyab161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/20/2021] [Accepted: 09/01/2021] [Indexed: 12/14/2022] Open
Abstract
During their dispersals over the last 100,000 years, modern humans have been exposed to a large variety of environments, resulting in genetic adaptation. While genome-wide scans for the footprints of positive Darwinian selection have increased knowledge of genes and functions potentially involved in human local adaptation, they have globally produced evidence of a limited contribution of selective sweeps in humans. Conversely, studies based on machine learning algorithms suggest that recent sweeps from standing variation are widespread in humans, an observation that has been recently questioned. Here, we sought to formally quantify the number of recent selective sweeps in humans, by leveraging approximate Bayesian computation and whole-genome sequence data. Our computer simulations revealed suitable ABC estimations, regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. Under a model of recent selection from standing variation, we inferred that an average of 68 (from 56 to 79) and 140 (from 94 to 198) sweeps occurred over the last 100,000 years of human history, in African and Eurasian populations, respectively. The former estimation is compatible with human adaptation rates estimated since divergence with chimps, and reveals numbers of sweeps per generation per site in the range of values estimated in Drosophila. Our results confirm the rarity of selective sweeps in humans and show a low contribution of sweeps from standing variation to recent human adaptation.
Affiliation(s)
- Guillaume Laval
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Pierre Boutillier
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Lluis Quintana-Murci
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
  - Human Genomics and Evolution, Collège de France, 75005 Paris, France
|
25
|
Semagn K, Iqbal M, Alachiotis N, N'Diaye A, Pozniak C, Spaner D. Genetic diversity and selective sweeps in historical and modern Canadian spring wheat cultivars using the 90K SNP array. Sci Rep 2021; 11:23773. [PMID: 34893626 PMCID: PMC8664822 DOI: 10.1038/s41598-021-02666-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/22/2021] [Accepted: 11/22/2021] [Indexed: 12/14/2022] Open
Abstract
Previous molecular characterization studies conducted in Canadian wheat cultivars shed some light on the impact of plant breeding on genetic diversity, but the number of varieties and markers used was small. Here, we used 28,798 markers of the wheat 90K single nucleotide polymorphisms to (a) assess the extent of genetic diversity, relationship, population structure, and divergence among 174 historical and modern Canadian spring wheat varieties registered from 1905 to 2018 and 22 unregistered lines (hereinafter referred to as cultivars), and (b) identify genomic regions that had undergone selection. About 91% of the pairs of cultivars differed by 20-40% of the scored alleles, but only 7% of the pairs had kinship coefficients of < 0.250, suggesting the presence of a high proportion of redundancy in allelic composition. Although the 196 cultivars represented eight wheat classes, our results from phylogenetic, principal component, and the model-based population structure analyses revealed three groups, with no clear structure among most wheat classes, breeding programs, and breeding periods. FST statistics computed among different categorical variables showed little genetic differentiation (< 0.05) among breeding periods and breeding programs, but a diverse level of genetic differentiation among wheat classes and predicted groups. Diversity indices were the highest and lowest among cultivars registered from 1970 to 1980 and from 2011 to 2018, respectively. Using two outlier detection methods, we identified from 524 to 2314 SNPs and 41 selective sweeps of which some are close to genes with known phenotype, including plant height, photoperiodism, vernalization, gluten strength, and disease resistance.
Affiliation(s)
- Kassa Semagn
  - Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Muhammad Iqbal
  - Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Nikolaos Alachiotis
  - Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 3230, Enschede, OV, The Netherlands
- Amidou N'Diaye
  - Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Curtis Pozniak
  - Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Dean Spaner
  - Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
|
26
|
Luqman H, Widmer A, Fior S, Wegmann D. Identifying loci under selection via explicit demographic models. Mol Ecol Resour 2021; 21:2719-2737. [PMID: 33964107 PMCID: PMC8596768 DOI: 10.1111/1755-0998.13415] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/20/2020] [Revised: 04/03/2021] [Accepted: 04/28/2021] [Indexed: 01/28/2023]
Abstract
Adaptive genetic variation is a function of both selective and neutral forces. To accurately identify adaptive loci, it is thus critical to account for demographic history. Theory suggests that signatures of selection can be inferred using the coalescent, following the premise that genealogies of selected loci deviate from neutral expectations. Here, we build on this theory to develop an analytical framework to identify loci under selection via explicit demographic models (LSD). Under this framework, signatures of selection are inferred through deviations in demographic parameters, rather than through summary statistics directly, and demographic history is accounted for explicitly. Leveraging the property of demographic models to incorporate directionality, we show that LSD can provide information on the environment in which selection acts on a population. This can prove useful in elucidating the selective processes underlying local adaptation, by characterizing genetic trade-offs and extending the concepts of antagonistic pleiotropy and conditional neutrality from ecological theory to practical application in genomic data. We implement LSD via approximate Bayesian computation and demonstrate, via simulations, that LSD (a) has high power to identify selected loci across a large range of demographic-selection regimes, (b) outperforms commonly applied genome-scan methods under complex demographies and (c) accurately infers the directionality of selection for identified candidates. Using the same simulations, we further characterize the behaviour of isolation-with-migration models conducive to the study of local adaptation under regimes of selection. Finally, we demonstrate an application of LSD by detecting loci and characterizing genetic trade-offs underlying flower colour in Antirrhinum majus.
Affiliation(s)
- Hirzi Luqman
  - Institute of Integrative Biology, ETH Zurich, Zürich, Switzerland
- Alex Widmer
  - Institute of Integrative Biology, ETH Zurich, Zürich, Switzerland
- Simone Fior
  - Institute of Integrative Biology, ETH Zurich, Zürich, Switzerland
- Daniel Wegmann
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
|
27
|
Abstract
Recent studies suggest that admixture with archaic hominins played an important role in facilitating biological adaptations to new environments. For example, interbreeding with Denisovans facilitated the adaptation to high-altitude environments on the Tibetan Plateau. Specifically, the EPAS1 gene, a transcription factor that regulates the response to hypoxia, exhibits strong signatures of both positive selection and introgression from Denisovans in Tibetan individuals. Interestingly, despite being geographically closer to the Denisova Cave, East Asian populations do not harbor as much Denisovan ancestry as populations from Melanesia. Recently, two studies have suggested two independent waves of Denisovan admixture into East Asians, one of which is shared with South Asians and Oceanians. Here, we leverage data from EPAS1 in 78 Tibetan individuals to interrogate which of these two introgression events introduced the EPAS1 beneficial sequence into the ancestral population of Tibetans, and we use the distribution of introgressed segment lengths at this locus to infer the timing of the introgression and selection event. We find that the introgression event unique to East Asians most likely introduced the beneficial haplotype into the ancestral population of Tibetans around 48,700 (16,000-59,500) y ago, and selection started around 9,000 (2,500-42,000) y ago. Our estimates suggest that one of the most convincing examples of adaptive introgression is in fact selection acting on standing archaic variation.
|
28
|
The deep population history of northern East Asia from the Late Pleistocene to the Holocene. Cell 2021; 184:3256-3266.e13. [PMID: 34048699 DOI: 10.1016/j.cell.2021.04.040] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 10/16/2020] [Revised: 01/20/2021] [Accepted: 04/23/2021] [Indexed: 11/22/2022]
Abstract
Northern East Asia was inhabited by modern humans as early as 40 thousand years ago (ka), as demonstrated by the Tianyuan individual. Using genome-wide data obtained from 25 individuals dated to 33.6-3.4 ka from the Amur region, we show that Tianyuan-related ancestry was widespread in northern East Asia before the Last Glacial Maximum (LGM). At the close of the LGM stadial, the earliest northern East Asian appeared in the Amur region, and this population is basal to ancient northern East Asians. Human populations in the Amur region have maintained genetic continuity from 14 ka, and these early inhabitants represent the closest East Asian source known for Ancient Paleo-Siberians. We also observed that EDAR V370A was likely to have been elevated to high frequency after the LGM, suggesting the possible timing for its selection. This study provides a deep look into the population dynamics of northern East Asia.
|
29
|
Andras JP, Fields PD, Du Pasquier L, Fredericksen M, Ebert D. Genome-Wide Association Analysis Identifies a Genetic Basis of Infectivity in a Model Bacterial Pathogen. Mol Biol Evol 2021; 37:3439-3452. [PMID: 32658956 PMCID: PMC7743900 DOI: 10.1093/molbev/msaa173] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/05/2020] [Revised: 06/22/2020] [Accepted: 07/08/2020] [Indexed: 12/22/2022] Open
Abstract
Knowledge of the genetic architecture of pathogen infectivity and host resistance is essential for a mechanistic understanding of coevolutionary processes, yet the genetic basis of these interacting traits remains unknown for most host-pathogen systems. We used a comparative genomic approach to explore the genetic basis of infectivity in Pasteuria ramosa, a Gram-positive bacterial pathogen of planktonic crustaceans that has been established as a model for studies of Red Queen host-pathogen coevolution. We sequenced the genomes of a geographically, phenotypically, and genetically diverse collection of P. ramosa strains and performed a genome-wide association study to identify genetic correlates of infection phenotype. We found multiple polymorphisms within a single gene, Pcl7, that correlate perfectly with one common and widespread infection phenotype. We then confirmed this perfect association via Sanger sequencing in a large and diverse sample set of P. ramosa clones. Pcl7 codes for a collagen-like protein, a class of adhesion proteins known or suspected to be involved in the infection mechanisms of a number of important bacterial pathogens. Consistent with expectations under Red Queen coevolution, sequence variation of Pcl7 shows evidence of balancing selection, including extraordinarily high diversity and absence of geographic structure. Based on structural homology with a collagen-like protein of Bacillus anthracis, we propose a hypothesis for the structure of Pcl7 and the physical location of the phenotype-associated polymorphisms. Our results offer strong evidence for a gene governing infectivity and provide a molecular basis for further study of Red Queen dynamics in this model host-pathogen system.
Affiliation(s)
- Jason P Andras
  - Department of Biological Sciences, Mount Holyoke College, South Hadley, MA
- Peter D Fields
  - Division of Zoology, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Louis Du Pasquier
  - Division of Zoology, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Maridel Fredericksen
  - Division of Zoology, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Dieter Ebert
  - Division of Zoology, Department of Environmental Sciences, University of Basel, Basel, Switzerland
|
30
|
Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2021; 37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022] Open
Abstract
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
Affiliation(s)
- Alexandre M Harris
  - Department of Biology, Pennsylvania State University, University Park, PA
  - Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
- Michael DeGiorgio
  - Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
|
31
|
Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021; 21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
Affiliation(s)
- Ulas Isildak
  - Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
- Alessandro Stella
  - Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Matteo Fumagalli
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, London, UK
|
32
|
Garud NR, Messer PW, Petrov DA. Detection of hard and soft selective sweeps from Drosophila melanogaster population genomic data. PLoS Genet 2021; 17:e1009373. [PMID: 33635910 PMCID: PMC7946363 DOI: 10.1371/journal.pgen.1009373] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 05/01/2020] [Revised: 03/10/2021] [Accepted: 01/17/2021] [Indexed: 12/12/2022] Open
Abstract
Whether hard sweeps or soft sweeps dominate adaptation has been a matter of much debate. Recently, we developed haplotype homozygosity statistics that (i) can detect both hard and soft sweeps with similar power and (ii) can classify the detected sweeps as hard or soft. The application of our method to population genomic data from a natural population of Drosophila melanogaster (DGRP) allowed us to rediscover three known cases of adaptation at the loci Ace, Cyp6g1, and CHKov1 known to be driven by soft sweeps, and detected additional candidate loci for recent and strong sweeps. Surprisingly, all of the top 50 candidates showed patterns much more consistent with soft rather than hard sweeps. Recently, Harris et al. 2018 criticized this work, suggesting that all the candidate loci detected by our haplotype statistics, including the positive controls, are unlikely to be sweeps at all and that instead these haplotype patterns can be more easily explained by complex neutral demographic models. They also claim that these neutral non-sweeps are likely to be hard instead of soft sweeps. Here, we reanalyze the DGRP data using a range of complex admixture demographic models and reconfirm our original published results suggesting that the majority of recent and strong sweeps in D. melanogaster are first likely to be true sweeps, and second, that they do appear to be soft. Furthermore, we discuss ways to take this work forward given that most demographic models employed in such analyses are necessarily too simple to capture the full demographic complexity, while more realistic models are unlikely to be inferred correctly because they require a large number of free parameters.
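The abstract does not name the haplotype homozygosity statistics, so as an assumption the sketch below uses the commonly cited definitions of H1, H12, and H2/H1 on haplotype frequencies in a single window: H1 is the sum of squared frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the most common haplotype. The function name and the toy samples are hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Compute H1, H12 and H2/H1 from a list of haplotype labels in one window."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Haplotype frequencies, most common first.
    freqs = sorted((count / n for count in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    # H12 treats the two most common haplotypes as a single class.
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    # H2 is homozygosity after removing the most common haplotype.
    h2 = h1 - p1 * p1
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


# One dominant haplotype (hard-sweep-like) versus two common haplotypes (soft-sweep-like).
hard_like = haplotype_homozygosity(["A"] * 8 + ["B", "C"])
soft_like = haplotype_homozygosity(["A"] * 4 + ["B"] * 4 + ["C", "D"])
```

In this toy input both samples give a high H12, while H2/H1 is small for the single dominant haplotype and larger when two haplotypes are common, which is the intuition behind using the pair of statistics to classify sweeps.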
Affiliation(s)
- Nandita R. Garud
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
  - Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Philipp W. Messer
  - Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Dmitri A. Petrov
  - Department of Biology, Stanford University, Stanford, California, United States of America
|
33
|
Werren EA, Garcia O, Bigham AW. Identifying adaptive alleles in the human genome: from selection mapping to functional validation. Hum Genet 2020; 140:241-276. [PMID: 32728809 DOI: 10.1007/s00439-020-02206-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/19/2020] [Accepted: 07/07/2020] [Indexed: 12/19/2022]
Abstract
The suite of phenotypic diversity across geographically distributed human populations is the outcome of genetic drift, gene flow, and natural selection throughout human evolution. Human genetic variation underlying local biological adaptations to selective pressures is incompletely characterized. With the emergence of population genetics modeling of large-scale genomic data derived from diverse populations, scientists are able to map signatures of natural selection in the genome in a process known as selection mapping. Inferred selection signals further can be used to identify candidate functional alleles that underlie putative adaptive phenotypes. Phenotypic association, fine mapping, and functional experiments facilitate the identification of candidate adaptive alleles. Functional investigation of candidate adaptive variation using novel techniques in molecular biology is slowly beginning to unravel how selection signals translate to changes in biology that underlie the phenotypic spectrum of our species. In addition to informing evolutionary hypotheses of adaptation, the discovery and functional annotation of adaptive alleles also may be of clinical significance. While selection mapping efforts in non-European populations are growing, there remains a stark under-representation of diverse human populations in current public genomic databases, of both clinical and non-clinical cohorts. This lack of inclusion limits the study of human biological variation. Identifying and functionally validating candidate adaptive alleles in more global populations is necessary for understanding basic human biology and human disease.
Affiliation(s)
- Elizabeth A Werren
  - Department of Human Genetics, The University of Michigan, Ann Arbor, MI, USA
  - Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Obed Garcia
  - Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Abigail W Bigham
  - Department of Anthropology, University of California Los Angeles, 341 Haines Hall, Los Angeles, CA, 90095, USA
|
34
|
VolcanoFinder: Genomic scans for adaptive introgression. PLoS Genet 2020; 16:e1008867. [PMID: 32555579 PMCID: PMC7326285 DOI: 10.1371/journal.pgen.1008867] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 07/19/2019] [Revised: 06/30/2020] [Accepted: 05/18/2020] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that introgression between closely-related species is an important source of adaptive alleles for a wide range of taxa. Typically, detection of adaptive introgression from genomic data relies on comparative analyses that require sequence data from both the recipient and the donor species. However, in many cases, the donor is unknown or the data is not currently available. Here, we introduce a genome-scan method—VolcanoFinder—to detect recent events of adaptive introgression using polymorphism data from the recipient species only. VolcanoFinder detects adaptive introgression sweeps from the pattern of excess intermediate-frequency polymorphism they produce in the flanking region of the genome, a pattern which appears as a volcano-shape in pairwise genetic diversity. Using coalescent theory, we derive analytical predictions for these patterns. Based on these results, we develop a composite-likelihood test to detect signatures of adaptive introgression relative to the genomic background. Simulation results show that VolcanoFinder has high statistical power to detect these signatures, even for older sweeps and for soft sweeps initiated by multiple migrant haplotypes. Finally, we implement VolcanoFinder to detect archaic introgression in European and sub-Saharan African human populations, and uncovered interesting candidates in both populations, such as TSHR in Europeans and TCHH-RPTN in Africans. We discuss their biological implications and provide guidelines for identifying and circumventing artifactual signals during empirical applications of VolcanoFinder. The process by which beneficial alleles are introduced into a species from a closely-related species is termed adaptive introgression. We present an analytically-tractable model for the effects of adaptive introgression on non-adaptive genetic variation in the genomic region surrounding the beneficial allele. 
The result we describe is a characteristic volcano-shaped pattern of increased variability that arises around the positively-selected site, and we introduce an open-source method VolcanoFinder to detect this signal in genomic data. Importantly, VolcanoFinder is a population-genetic likelihood-based approach, rather than a comparative-genomic approach, and can therefore probe genomic variation data from a single population for footprints of adaptive introgression, even from a priori unknown and possibly extinct donor species.
|
35
|
Rees JS, Castellano S, Andrés AM. The Genomics of Human Local Adaptation. Trends Genet 2020; 36:415-428. [DOI: 10.1016/j.tig.2020.03.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 01/27/2020] [Revised: 03/16/2020] [Accepted: 03/18/2020] [Indexed: 01/23/2023]
|
36
|
Hejase HA, Dukler N, Siepel A. From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection. Trends Genet 2020; 36:243-258. [PMID: 31954511 PMCID: PMC7177178 DOI: 10.1016/j.tig.2019.12.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/28/2019] [Revised: 11/15/2019] [Accepted: 12/11/2019] [Indexed: 01/01/2023]
Abstract
Methods to detect signals of natural selection from genomic data have traditionally emphasized the use of simple summary statistics. Here, we review a new generation of methods that consider combinations of conventional summary statistics and/or richer features derived from inferred gene trees and ancestral recombination graphs (ARGs). We also review recent advances in methods for population genetic simulation and ARG reconstruction. Finally, we describe opportunities for future work on a variety of related topics, including the genetics of speciation, estimation of selection coefficients, and inference of selection on polygenic traits. Together, these emerging methods offer promising new directions in the study of natural selection.
Affiliation(s)
- Hussein A Hejase
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Noah Dukler
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
|
37
|
Hartfield M, Bataillon T. Selective Sweeps Under Dominance and Inbreeding. G3 (BETHESDA, MD.) 2020; 10:1063-1075. [PMID: 31974096 PMCID: PMC7056974 DOI: 10.1534/g3.119.400919] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/18/2019] [Accepted: 01/18/2020] [Indexed: 12/26/2022]
Abstract
A major research goal in evolutionary genetics is to uncover loci experiencing positive selection. One approach involves finding 'selective sweeps' patterns, which can either be 'hard sweeps' formed by de novo mutation, or 'soft sweeps' arising from recurrent mutation or existing standing variation. Existing theory generally assumes outcrossing populations, and it is unclear how dominance affects soft sweeps. We consider how arbitrary dominance and inbreeding via self-fertilization affect hard and soft sweep signatures. With increased self-fertilization, they are maintained over longer map distances due to reduced effective recombination and faster beneficial allele fixation times. Dominance can affect sweep patterns in outcrossers if the derived variant originates from either a single novel allele, or from recurrent mutation. These models highlight the challenges in distinguishing hard and soft sweeps, and propose methods to differentiate between scenarios.
Affiliation(s)
- Matthew Hartfield
  - Department of Ecology and Evolutionary Biology, University of Toronto, Ontario M5S 3B2, Canada
  - Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
  - Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
|
38
|
Nakagome S, Hudson RR, Di Rienzo A. Inferring the model and onset of natural selection under varying population size from the site frequency spectrum and haplotype structure. Proc Biol Sci 2020; 286:20182541. [PMID: 30963935 DOI: 10.1098/rspb.2018.2541] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/27/2023] Open
Abstract
A fundamental question about adaptation in a population is the time of onset of the selective pressure acting on beneficial alleles. Inferring this time, in turn, depends on the selection model. We develop a framework of approximate Bayesian computation (ABC) that enables the use of the full site frequency spectrum and haplotype structure to test the goodness-of-fit of selection models and estimate the timing of selection under varying population size scenarios. We show that our method has sufficient power to distinguish natural selection from neutrality even if relatively old selection increased the frequency of a pre-existing allele from 20% to 50% or from 40% to 80%. Our ABC can accurately estimate the time of onset of selection on a new mutation. However, estimates are prone to bias under the standing variation model, possibly due to the uncertainty in the allele frequency at the onset of selection. We further extend our approach to take advantage of ancient DNA data that provides information on the allele frequency path of the beneficial allele. Applying our ABC, including both modern and ancient human DNA data, to four pigmentation alleles in Europeans, we detected selection on standing variants that occurred after the dispersal from Africa even though models of selection on a new mutation were initially supported for two of these alleles without the ancient data.
Affiliation(s)
- Shigeki Nakagome
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - School of Medicine, Faculty of Health Sciences, Trinity College Dublin, the University of Dublin, Dublin, Ireland
- Richard R Hudson
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Department of Ecology & Evolution, University of Chicago, Chicago, IL, USA
- Anna Di Rienzo
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
|
39
|
Norris ET, Rishishwar L, Chande AT, Conley AB, Ye K, Valderrama-Aguirre A, Jordan IK. Admixture-enabled selection for rapid adaptive evolution in the Americas. Genome Biol 2020; 21:29. [PMID: 32028992 PMCID: PMC7006128 DOI: 10.1186/s13059-020-1946-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/25/2019] [Accepted: 01/24/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Admixture occurs when previously isolated populations come together and exchange genetic material. We hypothesize that admixture can enable rapid adaptive evolution in human populations by introducing novel genetic variants (haplotypes) at intermediate frequencies, and we test this hypothesis through the analysis of whole genome sequences sampled from admixed Latin American populations in Colombia, Mexico, Peru, and Puerto Rico. RESULTS Our screen for admixture-enabled selection relies on the identification of loci that contain more or less ancestry from a given source population than would be expected given the genome-wide ancestry frequencies. We employ a combined evidence approach to evaluate levels of ancestry enrichment at single loci across multiple populations and multiple loci that function together to encode polygenic traits. We find cross-population signals of African ancestry enrichment at the major histocompatibility locus on chromosome 6, consistent with admixture-enabled selection for enhanced adaptive immune response. Several of the human leukocyte antigen genes at this locus, such as HLA-A, HLA-DRB51, and HLA-DRB5, show independent evidence of positive selection prior to admixture, based on extended haplotype homozygosity in African populations. A number of traits related to inflammation, blood metabolites, and both the innate and adaptive immune system show evidence of admixture-enabled polygenic selection in Latin American populations. CONCLUSIONS The results reported here, considered together with the ubiquity of admixture in human evolution, suggest that admixture serves as a fundamental mechanism that drives rapid adaptive evolution in human populations.
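The ancestry-enrichment screen described in this abstract can be illustrated with a minimal sketch. It assumes per-haplotype local ancestry calls are already available as a 0/1 matrix; the function name `ancestry_enrichment_z`, the toy data, and the simple z-score are illustrative assumptions, not the authors' pipeline or their combined-evidence statistic.

```python
import numpy as np

def ancestry_enrichment_z(local_ancestry):
    """Z-score of per-locus ancestry fraction against the genome-wide average.

    local_ancestry: (n_haplotypes, n_loci) array of 0/1 flags marking whether
    each haplotype carries the source-population ancestry at each locus.
    (Hypothetical input format, chosen for illustration.)
    """
    local = np.asarray(local_ancestry, dtype=float)
    per_locus = local.mean(axis=0)      # ancestry fraction at each locus
    genome_wide = per_locus.mean()      # expected fraction under no enrichment
    sd = per_locus.std(ddof=1)          # spread of per-locus fractions
    return (per_locus - genome_wide) / sd

# Toy data: 200 haplotypes, 500 loci, 20% background ancestry,
# with one locus (index 42) artificially enriched to 60%.
rng = np.random.default_rng(0)
toy = rng.binomial(1, 0.2, size=(200, 500))
toy[:, 42] = rng.binomial(1, 0.6, size=200)

z = ancestry_enrichment_z(toy)
top_locus = int(np.argmax(z))
print("most enriched locus:", top_locus)
```

A locus with a large positive z-score carries more of the source ancestry than the genome-wide average predicts; a large negative score indicates depletion. A real analysis would also need to account for drift since admixture and for errors in local ancestry inference, which this sketch ignores.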
Affiliation(s)
- Emily T. Norris
- School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332 USA
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
| | - Lavanya Rishishwar
- School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332 USA
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
| | - Aroon T. Chande
- School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332 USA
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
| | - Andrew B. Conley
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
| | - Kaixiong Ye
- Department of Genetics, University of Georgia, Athens, GA USA
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
| | - Augusto Valderrama-Aguirre
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
- Biomedical Research Institute (COL0082529), Cali, Colombia
- Universidad Santiago de Cali, Cali, Colombia
| | - I. King Jordan
- School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, Atlanta, GA 30332 USA
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca Colombia
| |
|
40
|
Campbell MC, Ashong B, Teng S, Harvey J, Cross CN. Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6. BMC Evol Biol 2019; 19:218. [PMID: 31791241 PMCID: PMC6889576 DOI: 10.1186/s12862-019-1516-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/28/2019] [Accepted: 09/20/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Lymphotoxin-α (LTα), located in the Major Histocompatibility Complex (MHC) class III region on chromosome 6, encodes a cytotoxic protein that mediates a variety of antiviral responses among other biological functions. Furthermore, several genotypes at this gene have been implicated in the onset of a number of complex diseases, including myocardial infarction, autoimmunity, and various types of cancer. However, little is known about levels of nucleotide variation and linkage disequilibrium (LD) in and near LTα, which could also influence phenotypic variance. To address this gap in knowledge, we examined sequence variation across ~10 kilobases (kb), encompassing LTα and the upstream region, in 2039 individuals from the 1000 Genomes Project originating from 21 global populations. RESULTS Here, we observed striking patterns of diversity, including an excess of intermediate-frequency alleles, the maintenance of multiple common haplotypes and a deep coalescence time for variation (dating to >1.0 million years ago), in global populations. While these results are generally consistent with a model of balancing selection, we also uncovered a signature of positive selection in the form of long-range LD on chromosomes with derived alleles primarily in Eurasian populations. To reconcile these findings, which appear to support different models of selection, we argue that selective sweeps (particularly, soft sweeps) of multiple derived alleles in and/or near LTα occurred in non-Africans after their ancestors left Africa. Furthermore, these targets of selection were predicted to alter transcription factor binding site affinity and protein stability, suggesting they play a role in gene function. Our data also showed that a subset of these functional adaptive variants are present in archaic hominin genomes.
CONCLUSIONS Overall, this study identified candidate functional alleles in a biologically-relevant genomic region, and offers new insights into the evolutionary origins of these loci in modern human populations.
Affiliation(s)
- Michael C. Campbell
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
| | - Bryan Ashong
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
| | - Shaolei Teng
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
| | - Jayla Harvey
- Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
| | - Christopher N. Cross
- Department of Anatomy, College of Medicine, Howard University, Washington, DC 20059 USA
| |
|
41
|
Satta Y, Zheng W, Nishiyama KV, Iwasaki RL, Hayakawa T, Fujito NT, Takahata N. Two-dimensional site frequency spectrum for detecting, classifying and dating incomplete selective sweeps. Genes Genet Syst 2019; 94:283-300. [DOI: 10.1266/ggs.19-00012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Affiliation(s)
- Yoko Satta
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Wanjing Zheng
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Kumiko V. Nishiyama
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Risa L. Iwasaki
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| | - Toshiyuki Hayakawa
- Graduate School of Systems Life Sciences and Faculty of Arts and Science, Kyushu University
| | - Naoko T. Fujito
- Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California
| | - Naoyuki Takahata
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
| |
|
42
|
Torada L, Lorenzon L, Beddis A, Isildak U, Pattini L, Mathieson S, Fumagalli M. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics 2019; 20:337. [PMID: 31757205 PMCID: PMC6873651 DOI: 10.1186/s12859-019-2927-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/21/2019] [Accepted: 05/31/2019] [Indexed: 12/25/2022]
Abstract
BACKGROUND The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An evolutionary framework offers an alternative to classic association studies for determining such genetic bases. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks to population genomic data for the detection and quantification of natural selection. RESULTS ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how the misspecification of the demographic model used to produce training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
CONCLUSIONS While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
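The image representation described above (stacked aligned haplotypes, sorted by row and column) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `haplotypes_to_sorted_image`, the specific sort keys (column allele frequency, row haplotype count), and the 0/255 encoding are hypothetical choices made here, not ImaGene's actual API or defaults.

```python
import numpy as np

def haplotypes_to_sorted_image(haps):
    """Turn a binary haplotype matrix into a sorted grayscale image.

    haps: (n_haplotypes, n_sites) array with 0 = ancestral, 1 = derived.
    Columns are ordered by decreasing derived-allele frequency; rows are
    ordered so that the most common haplotypes come first and identical
    haplotypes sit next to each other.
    """
    haps = np.asarray(haps, dtype=np.uint8)

    # Sort columns by derived-allele frequency (highest first).
    col_order = np.argsort(-haps.mean(axis=0), kind="stable")
    haps = haps[:, col_order]

    # Count how often each distinct haplotype (row) occurs.
    _, inverse, counts = np.unique(
        haps, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.ravel(inverse)

    # Primary key: haplotype count (descending); secondary key: haplotype id,
    # which keeps identical rows adjacent when counts tie.
    row_order = np.lexsort((inverse, -counts[inverse]))

    # Encode derived alleles as white (255) and ancestral as black (0).
    return haps[row_order] * 255

toy_haps = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
img = haplotypes_to_sorted_image(toy_haps)
print(img)
```

Sorting removes the arbitrary ordering of sampled chromosomes, so a convolutional network sees a consistent layout across simulations. The resulting array could then be fed to any image classifier.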
Affiliation(s)
- Luis Torada
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
| | - Lucrezia Lorenzon
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
| | - Alice Beddis
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
| | - Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, METU Üniversiteler Mah. Dumlupınar Blv. No:1, Ankara, 06800 Çankaya Turkey
| | - Linda Pattini
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
| | - Sara Mathieson
- Department of Computer Science, Swarthmore College, 500 College Ave, Swarthmore, 19081 PA USA
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
| |
|
43
|
Wegary D, Teklewold A, Prasanna BM, Ertiro BT, Alachiotis N, Negera D, Awas G, Abakemal D, Ogugo V, Gowda M, Semagn K. Molecular diversity and selective sweeps in maize inbred lines adapted to African highlands. Sci Rep 2019; 9:13490. [PMID: 31530852 PMCID: PMC6748982 DOI: 10.1038/s41598-019-49861-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/01/2019] [Accepted: 08/28/2019] [Indexed: 11/08/2022]
Abstract
Little is known about maize germplasm adapted to the African highland agro-ecologies. In this study, we analyzed high-density genotyping by sequencing (GBS) data of 298 African highland adapted maize inbred lines to (i) assess the extent of genetic purity, genetic relatedness, and population structure, and (ii) identify genomic regions that have undergone selection (selective sweeps) in response to adaptation to highland environments. Nearly 91% of the pairs of inbred lines differed by 30-36% of the scored alleles, but only 32% of the pairs of inbred lines had a relative kinship coefficient <0.050, which suggests the presence of substantial redundancy in allelic composition that may be due to repeated use of fewer genetic backgrounds (source germplasm) during line development. Results from different genetic relatedness and population structure analyses revealed three different groups, which generally agree with pedigree information and breeding history, but less so with heterotic groups and endosperm modification. We identified 944 single nucleotide polymorphism (SNP) markers that fell within 22 selective sweeps harboring 265 protein-coding candidate genes, some of which had known functions. Details of the candidate genes with known functions and differences in nucleotide diversity among groups predicted based on multivariate methods are discussed.
Affiliation(s)
- Dagne Wegary
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
| | - Adefris Teklewold
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia.
| | - Boddupalli M Prasanna
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
| | - Berhanu T Ertiro
- Bako National Maize Research Center, Ethiopian Institute of Agricultural Research (EIAR), Addis Ababa, Ethiopia
| | - Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
| | - Demewez Negera
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
| | - Geremew Awas
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
| | - Demissew Abakemal
- Ambo Agricultural Research Center, P.O. Box 37, West Shoa, Ambo, Ethiopia
| | - Veronica Ogugo
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
| | - Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
| | - Kassa Semagn
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya.
- Africa Rice Center (AfricaRice), M'bé Research Station, 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
| |
|
44
|
Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet 2019; 15:e1008384. [PMID: 31518343 PMCID: PMC6760815 DOI: 10.1371/journal.pgen.1008384] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/29/2019] [Revised: 09/25/2019] [Accepted: 08/26/2019] [Indexed: 12/24/2022]
Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele's age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
Affiliation(s)
- Aaron J. Stern
- Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Peter R. Wilton
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
| | - Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
| |
|
45
|
Lee KM, Coop G. Population genomics perspectives on convergent adaptation. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180236. [PMID: 31154979 PMCID: PMC6560269 DOI: 10.1098/rstb.2018.0236] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Accepted: 12/21/2018] [Indexed: 01/12/2023]
Abstract
Convergent adaptation is the independent evolution of similar traits conferring a fitness advantage in two or more lineages. Cases of convergent adaptation inform our ideas about the ecological and molecular basis of adaptation. In judging the degree to which putative cases of convergent adaptation provide an independent replication of the process of adaptation, it is necessary to establish the degree to which the evolutionary change is unexpected under null models and to show that selection has repeatedly, independently driven these changes. Here, we discuss the issues that arise from these questions particularly for closely related populations, where gene flow and standing variation add additional layers of complexity. We outline a conceptual framework to guide intuition as to the extent to which evolutionary change represents the independent gain of information owing to selection and show that this is a measure of how surprised we should be by convergence. Additionally, we summarize the ways population and quantitative genetics and genomics may help us address questions related to convergent adaptation, as well as open new questions and avenues of research. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Kristin M. Lee
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - Graham Coop
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| |
|
46
|
Hufford MB, Berny Mier Y Teran JC, Gepts P. Crop Biodiversity: An Unfinished Magnum Opus of Nature. Annu Rev Plant Biol 2019; 70:727-751. [PMID: 31035827 DOI: 10.1146/annurev-arplant-042817-040240] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 06/09/2023]
Abstract
Crop biodiversity is one of the major inventions of humanity through the process of domestication. It is also an essential resource for crop improvement to adapt agriculture to ever-changing conditions like global climate change and consumer preferences. Domestication and the subsequent evolution under cultivation have profoundly shaped the genetic architecture of this biodiversity. In this review, we highlight recent advances in our understanding of crop biodiversity. Topics include the reduction of genetic diversity during domestication and counteracting factors, a discussion of the relationship between parallel phenotypic and genotypic evolution, the role of plasticity in genotype × environment interactions, and the important role subsistence farmers play in actively maintaining crop biodiversity and in participatory breeding. Linking genotype and phenotype remains the holy grail of crop biodiversity studies.
Affiliation(s)
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011-1020, USA
| | | | - Paul Gepts
- Department of Plant Sciences, University of California, Davis, California 95616-8780, USA
| |
|
47
|
Ndjiondjop MN, Alachiotis N, Pavlidis P, Goungoulou A, Kpeki SB, Zhao D, Semagn K. Comparisons of molecular diversity indices, selective sweeps and population structure of African rice with its wild progenitor and Asian rice. Theor Appl Genet 2019; 132:1145-1158. [PMID: 30578434 PMCID: PMC6449321 DOI: 10.1007/s00122-018-3268-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/11/2018] [Accepted: 12/11/2018] [Indexed: 05/20/2023]
Abstract
The extent of molecular diversity parameters across three rice species was compared using a large germplasm collection genotyped with genomewide SNPs and SNPs that fell within selective sweep regions. Previous studies conducted on a limited number of accessions have reported very low genetic variation in African rice (Oryza glaberrima Steud.) as compared to its wild progenitor (O. barthii A. Chev.) and to Asian rice (O. sativa L.). Here, we characterized a large collection of African rice and compared its molecular diversity indices and population structure with the two other species using genomewide single nucleotide polymorphisms (SNPs) and SNPs that mapped within selective sweeps. A total of 3245 samples representing African rice (2358), Asian rice (772) and O. barthii (115) were genotyped with 26,073 physically mapped SNPs. Using all SNPs, the level of marker polymorphism, average genetic distance and nucleotide diversity in African rice accounted for 59.1%, 63.2% and 37.1% of that of O. barthii, respectively. SNP polymorphism and overall nucleotide diversity of the African rice accounted for 20.1-32.1% and 16.3-37.3% of that of the Asian rice, respectively. We identified 780 SNPs that fell within 37 candidate selective sweeps in African rice, which were distributed across all 12 rice chromosomes. Nucleotide diversity of the African rice estimated from the 780 SNPs was 8.3 × 10⁻⁴, which is not only 20-fold smaller than the value estimated from all genomewide SNPs (π = 1.6 × 10⁻²), but also accounted for just 4.1%, 0.9% and 2.1% of that of O. barthii, lowland Asian rice and upland Asian rice, respectively. The genotype data generated for a large collection of rice accessions conserved at the AfricaRice genebank will be highly useful for the global rice community and promote germplasm use.
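As a quick arithmetic check on the fold difference quoted in this abstract, the two nucleotide diversity values can be compared directly. The variable names below are illustrative; the numbers are the ones reported above.

```python
# Nucleotide diversity values as reported in the abstract.
pi_all_snps = 1.6e-2    # estimated from all genomewide SNPs
pi_sweep_snps = 8.3e-4  # estimated from the 780 SNPs within candidate sweeps

# Fold reduction in diversity within candidate sweep regions.
fold_reduction = pi_all_snps / pi_sweep_snps
print(f"fold reduction: {fold_reduction:.1f}")
```

The ratio comes out slightly above 19, consistent with the rounded "20-fold" figure in the text.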
Affiliation(s)
- Marie Noelle Ndjiondjop
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
| | - Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
| | - Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
| | - Alphonse Goungoulou
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
| | - Sèdjro Bienvenu Kpeki
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
| | - Dule Zhao
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
| | - Kassa Semagn
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
| |
|
48
|
Human Immunology through the Lens of Evolutionary Genetics. Cell 2019; 177:184-199. [DOI: 10.1016/j.cell.2019.02.033] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 01/29/2019] [Revised: 02/19/2019] [Accepted: 02/20/2019] [Indexed: 01/04/2023]
|
49
|
Abstract
Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
Affiliation(s)
- Mehreen R Mughal
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Departments of Biology and Statistics, Pennsylvania State University, University Park, PA
- Institute for CyberScience, Pennsylvania State University, University Park, PA
| |
|
50
|
Montalva N, Adhikari K, Liebert A, Mendoza-Revilla J, Flores SV, Mace R, Swallow DM. Adaptation to milking agropastoralism in Chilean goat herders and nutritional benefit of lactase persistence. Ann Hum Genet 2019; 83:11-22. [PMID: 30264486 PMCID: PMC6393766 DOI: 10.1111/ahg.12277] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/16/2018] [Revised: 07/05/2018] [Accepted: 07/06/2018] [Indexed: 12/31/2022]
Abstract
The genetic trait of lactase persistence (LP) evolved as an adaptation to milking pastoralism in the Old World and is a well-known example of positive natural selection in humans. However, the specific mechanisms conferring this selective advantage are unknown. To understand the relationship between milk drinking, LP, growth, reproduction, and survival, communities of the Coquimbo Region in Chile, with recent adoption of milking agropastoralism, were used as a model population. DNA samples and data on stature, reproduction, and diet were collected from 451 participants. Lactose tolerance tests were done on 41 of them. The European -13,910*T (rs4988235) was the only LP causative variant found, showing strong association (99.6%) with LP phenotype. Models of associations of inferred LP status and milk consumption, with fertility, mortality, height, and weight were adjusted with measures of ancestry and relatedness to control for population structure. Although we found no statistically significant effect of LP on fertility, a significant effect (P = 0.002) was observed of LP on body mass index (BMI) in males and of BMI on fertility (P = 0.003). These results fail to support a causal relationship between LP and fertility yet suggest the idea of a nutritional advantage of LP. Furthermore, the proportion of European ancestry around the genetic region of -13,910*T is significantly higher (P = 0.008) than the proportion of European ancestry genome-wide, providing evidence of recent positive selection since European-Amerindian admixture. This signature was absent in nonpastoralist Latin American populations, supporting the hypothesis of specific adaptation to milking agropastoralism in the Coquimbo communities.
Affiliation(s)
- Nicolás Montalva
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, United Kingdom
- Department of Anthropology, Human Evolutionary Ecology Group, University College London, 14 Taviton St, London, WC1H 0BW, United Kingdom
- Departamento de Antropología, Facultad de Ciencias Sociales y Jurídicas, Universidad de Tarapacá, 384 Calle Cardenal Caro, Arica, Chile
| | - Kaustubh Adhikari
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, United Kingdom
- Department of Cell & Developmental Biology, University College London, Anatomy Building, Gower Street, London, WC1E 6BT, United Kingdom
| | - Anke Liebert
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, United Kingdom
| | - Javier Mendoza-Revilla
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, United Kingdom
- Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, 430 Honorario Delgado, Lima 31, Perú
| | - Sergio V Flores
- Departamento de Antropología, Facultad de Ciencias Sociales, Universidad de Chile, 1045 Av. Capitan Ignacio Carrera Pinto, Nunoa, 7800284, Chile
| | - Ruth Mace
- Department of Anthropology, Human Evolutionary Ecology Group, University College London, 14 Taviton St, London, WC1H 0BW, United Kingdom
| | - Dallas M Swallow
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, United Kingdom
| |
|