1. Schoville SD, Burke RL, Dong DY, Ginsberg HS, Maestas L, Paskewitz SM, Tsao JI. Genome resequencing reveals population divergence and local adaptation of blacklegged ticks in the United States. Mol Ecol 2024; 33:e17460. [PMID: 38963031 DOI: 10.1111/mec.17460] [Received: 03/04/2023] [Revised: 03/12/2024] [Accepted: 04/15/2024] [Indexed: 07/05/2024]
Abstract
Tick vectors and tick-borne diseases increasingly affect human populations globally. An important challenge is to understand tick movement patterns, as this information can be used to improve management and predictive modelling of tick population dynamics. Evolutionary analysis of genetic divergence, gene flow and local adaptation provides insight into movement patterns at large spatiotemporal scales. We generate low-coverage, whole-genome resequencing data for 92 blacklegged ticks, Ixodes scapularis, representing range-wide variation across the United States. Through analysis of these population genomic data, we find that tick populations are geographically structured, with gradual isolation by distance separating three population clusters: one in the northern United States, one in the southeastern United States, and a unique cluster represented by a sample from Tennessee. Populations in the northern United States underwent population contractions during the last glacial period and diverged from southern populations at least 50 thousand years ago. Genome scans of selection provide strong evidence of local adaptation at genes responding to host defences, blood-feeding and environmental variation. In addition, we explore the potential of low-coverage genome sequencing of whole-tick samples for documenting the diversity of microbial pathogens, and we recover important tick-borne pathogens such as Borrelia burgdorferi. The combination of isolation by distance and local adaptation in blacklegged ticks demonstrates that gene flow, including recent expansion, is limited to geographical scales of a few hundred kilometres.
Affiliation(s)
- Sean D Schoville
- Department of Entomology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Russell L Burke
- Department of Biology, Hofstra University, Hempstead, New York, USA
- Dahn-Young Dong
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Howard S Ginsberg
- United States Geological Survey, Eastern Ecological Science Center, Woodward Hall - PSE, Field Station at the University of Rhode Island, Kingston, Rhode Island, USA
- Lauren Maestas
- Cattle Fever Tick Research Laboratory, USDA, Agricultural Research Service, Edinburg, Texas, USA
- Susan M Paskewitz
- Department of Entomology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Jean I Tsao
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Department of Large Animal Clinical Sciences, Michigan State University, East Lansing, Michigan, USA
2. Weng YM, Kavanaugh DH, Schoville SD. Evidence for Admixture and Rapid Evolution During Glacial Climate Change in an Alpine Specialist. Mol Biol Evol 2024; 41:msae130. [PMID: 38935588 PMCID: PMC11247348 DOI: 10.1093/molbev/msae130] [Received: 12/05/2023] [Revised: 05/30/2024] [Accepted: 06/14/2024] [Indexed: 06/29/2024]
Abstract
The pace of current climate change is expected to be problematic for alpine flora and fauna, as their adaptive capacity may be limited by small population size. Yet, despite substantial genetic drift following post-glacial recolonization of alpine habitats, alpine species are notably successful in highly heterogeneous environments. Population genomic analyses demonstrating how alpine species have adapted to novel environments with limited genetic diversity remain rare, yet they are important for understanding the potential of species to respond to contemporary climate change. In this study, we explored the evolutionary history of alpine ground beetles in the Nebria ingens complex, including the demographic and adaptive changes that followed the last glacial retreat. We first tested alternative models of evolutionary divergence in the species complex. Using millions of genome-wide SNP markers from hundreds of beetles, we found evidence that the N. ingens complex was formed by past admixture of lineages responding to glacial cycles. Recolonization of alpine sites involved a distributional range shift to higher elevation, which was accompanied by a reduction in suitable habitat and the emergence of complex spatial genetic structure. We tested several possible genetic pathways involved in adaptation to heterogeneous local environments using genome scan and genotype-environment association approaches. Among the identified genes, functions associated with abiotic stress responses were enriched, with strong evidence for adaptation in hypoxia-related pathways. The results demonstrate that, despite rapid demographic change, alpine beetles in the N. ingens complex underwent rapid physiological evolution.
Affiliation(s)
- Yi-Ming Weng
- Department of Entomology, University of Wisconsin-Madison, Madison, WI, USA
- Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- David H Kavanaugh
- California Academy of Sciences, Department of Entomology, San Francisco, CA, USA
- Sean D Schoville
- Department of Entomology, University of Wisconsin-Madison, Madison, WI, USA
3. Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question fixed very near the sampling time. Because the power to detect a selective sweep has previously been shown to depend strongly on the time since fixation as well as on the strength of selection, strong, recent sweeps naturally leave the strongest signatures. However, beneficial mutations in reality enter populations at some rate, which partially determines the mean waiting time between sweep events and hence their age distribution. An important question therefore remains about the power to detect recurrent selective sweeps when they are modeled with a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background, as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. The results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates exceed true-positive rates across much of the evaluated parameter space, and selective sweeps are often undetectable unless selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
4. Tanaka T, Hayakawa T, Teshima KM. Power of neutrality tests for detecting natural selection. G3 (Bethesda) 2023; 13:jkad161. [PMID: 37481468 PMCID: PMC10542275 DOI: 10.1093/g3journal/jkad161] [Received: 06/09/2023] [Revised: 06/09/2023] [Accepted: 07/19/2023] [Indexed: 07/24/2023]
Abstract
Detection of natural selection is one of the main interests in population genetics, and many tests have therefore been developed for detecting natural selection using genomic data. Although the utility of these tests is recognized to depend on several evolutionary factors, such as the timing of selection, the strength of selection, the frequency of selected alleles, demographic events, and the initial frequency of the selected allele when selection started acting (softness of selection), the relationships between these factors and the power of the tests are not yet entirely clear. In this study, we investigated the power of four tests (Tajima's D, Fay and Wu's H, relative extended haplotype homozygosity [rEHH], and integrated haplotype score [iHS]) under ranges of evolutionary parameters and demographic models to quantitatively expand the understanding of approaches for detecting selection. The results show that each test detects selection within a limited parameter range, and there are still wide ranges of parameters for which none of these tests works effectively. In addition, the parameter space in which each test shows the highest power overlaps with the empirical results of previous research. These results indicate that our present view of adaptation captures only part of actual adaptation.
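For readers less familiar with the first of these tests, the following is a minimal Python sketch of Tajima's D, which contrasts the mean number of pairwise differences (π) with the number of segregating sites S scaled by a1 = Σ 1/i. It is an illustration only, not the authors' code: the helper name `tajimas_d` is hypothetical, and the sketch assumes phased haplotypes encoded as equal-length strings of '0'/'1', biallelic sites, no missing data, and at least four sequences (the variance constants vanish for n < 4).

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D from phased 0/1 haplotype strings (hypothetical helper).

    Assumes equal-length strings, biallelic sites, no missing data,
    and at least four sequences.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    n_sites = len(haplotypes[0])
    # Count copies of allele "1" at each site; keep only segregating sites.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]
    S = len(seg)
    if S == 0:
        return float("nan")  # D is undefined without segregating sites
    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)
    # Normalizing constants from Tajima's definition.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# An excess of singletons (rare variants) gives a negative D;
# an excess of intermediate-frequency variants gives a positive D.
print(tajimas_d(["100", "010", "001", "000"]))  # negative
print(tajimas_d(["110", "110", "001", "001"]))  # positive
```

A negative D alone does not identify a sweep: population expansion produces the same excess of rare variants, which is one reason the power of such tests depends so strongly on the demographic model.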
Affiliation(s)
- Tomotaka Tanaka
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Toshiyuki Hayakawa
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Faculty of Arts and Science, Kyushu University, Fukuoka 819-0395, Japan
- Kosuke M Teshima
- Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
5. He H, Yang H, Foo R, Chan W, Zhu F, Liu Y, Zhou X, Ma L, Wang LF, Zhai W. Population genomic analysis reveals distinct demographics and recent adaptation in the black flying fox (Pteropus alecto). J Genet Genomics 2023; 50:554-562. [PMID: 37182682 DOI: 10.1016/j.jgg.2023.05.002] [Received: 02/21/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/16/2023]
Abstract
As the only mammalian group capable of powered flight, bats have many unique biological traits. Previous comparative genomic studies in bats have focused on long-term evolution, whereas the micro-evolutionary processes driving recent evolution remain largely under-explored. Using resequencing data from 50 black flying foxes (Pteropus alecto), one of the model species for bats, we find that the black flying fox has much higher genetic diversity and lower levels of linkage disequilibrium than most mammalian species. Demographic inference reveals strong population fluctuations (>100-fold) coinciding with multiple historical events, including the last glacial period and the Toba super-eruption, suggesting that the black flying fox is a very resilient species with strong recovery abilities. While long-term adaptation in the black flying fox is enriched in metabolic genes, recent adaptation has a distinct landscape in which recently selected genes are not strongly enriched in any functional category. The demographic history and mode of adaptation suggest that the black flying fox may be a well-adapted species with strong evolutionary resilience. Taken together, this study reveals a vibrant landscape of recent evolution in the black flying fox and sheds light on several evolutionary processes that are unique to bats compared with other mammalian groups.
Affiliation(s)
- Haopeng He
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hechuan Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Randy Foo
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
- Wharton Chan
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
- Feng Zhu
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
- Yunsong Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xuming Zhou
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Liang Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Lin-Fa Wang
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
- Weiwei Zhai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
6. Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question fixed very near the sampling time. Because the power to detect a selective sweep has previously been shown to depend strongly on the time since fixation as well as on the strength of selection, strong, recent sweeps naturally leave the strongest signatures. However, beneficial mutations in reality enter populations at some rate, which partially determines the mean waiting time between sweep events and hence their age distribution. An important question therefore remains about the power to detect recurrent selective sweeps when they are modelled with a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background, as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. The results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates exceed true-positive rates across much of the evaluated parameter space, and selective sweeps are often undetectable unless selection is exceptionally strong.
Teaser Text
Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model, incorporating non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates, is necessary to reduce the often extreme false-positive rates of genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while such evolutionary baselines are essential to reduce false-positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
7. Johri P, Gout JF, Doak TG, Lynch M. A Population-Genetic Lens into the Process of Gene Loss Following Whole-Genome Duplication. Mol Biol Evol 2022; 39:msac118. [PMID: 35639978 PMCID: PMC9206413 DOI: 10.1093/molbev/msac118] [Indexed: 01/20/2023]
Abstract
Whole-genome duplications (WGDs) have occurred in many eukaryotic lineages. However, the evolutionary forces and molecular mechanisms responsible for the long-term retention of gene duplicates created by WGDs are not well understood. We employ a population-genomic approach to understand the selective forces acting on paralogs and to investigate ongoing duplicate-gene loss in multiple species of Paramecium that share an ancient WGD. We show that mutations that abolish protein function are more likely to be segregating in retained WGD paralogs than in single-copy genes, most likely because of ongoing nonfunctionalization after the WGD. This relaxation of purifying selection occurs in only one WGD paralog, is accompanied by the gradual fixation of nonsynonymous mutations and reduced expression, and unfolds over a long period of evolutionary time, "marking" one locus for future loss. Concordantly, new nonsynonymous mutations and frameshift-causing indels are significantly more deleterious in the more highly expressed copy than in its lower-expression paralog. Our results provide a novel mechanistic model of duplicate-gene loss following WGDs, wherein selection acts on the summed functional activity of both duplicates, allowing the two to wander in expression and functional space until one locus eventually degenerates enough in functional efficiency or expression that its contribution to total activity is too small to be maintained by purifying selection. Retention of duplicates by such mechanisms predicts long times to duplicate-gene loss, which should not be mistakenly attributed to retention due to a gain or change in function.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Jean-Francois Gout
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Thomas G Doak
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- National Center for Genome Analysis Support, Indiana University, Bloomington, IN 47405, USA
- Michael Lynch
- Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
8. Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD. Recommendations for improving statistical inference in population genomics. PLoS Biol 2022; 20:e3001669. [PMID: 35639797 PMCID: PMC9154105 DOI: 10.1371/journal.pbio.3001669] [Indexed: 01/29/2023]
Abstract
The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. In contrast to the situation during most of the 20th century, when the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. In doing so, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model-fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Charles F. Aquadro
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Peter D. Keightley
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Michael Lynch
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Susanne P. Pfeifer
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
9. Moinet A, Schlichta F, Peischl S, Excoffier L. Strong neutral sweeps occurring during a population contraction. Genetics 2022; 220:6529544. [PMID: 35171980 PMCID: PMC8982045 DOI: 10.1093/genetics/iyac021] [Received: 08/24/2021] [Accepted: 01/22/2022] [Indexed: 11/14/2022]
Abstract
A strong reduction in diversity around a specific locus is often interpreted as the recent rapid fixation of a positively selected allele, a phenomenon called a selective sweep. Rapid fixation of neutral variants can, however, lead to a similar reduction in local diversity, especially when the population experiences changes in size, e.g. bottlenecks or range expansions. The fact that demographic processes can produce patterns of nucleotide diversity very similar to those of selective sweeps is at the core of an ongoing discussion about the roles of demography and natural selection in shaping patterns of neutral variation. Here, we quantitatively investigate the shape of such neutral valleys of diversity under a simple model of a single population size change, and we compare it to the signal of a selective sweep. We analytically describe the expected shape of such "neutral sweeps" and show that, for the same fixation time, selective sweep valleys of diversity are wider than neutral valleys. On the other hand, it is always possible to parametrize our model to find a neutral valley that has the same width as a given selected valley. Our findings provide further insight into how simple demographic models can create valleys of genetic diversity similar to those attributed to positive selection.
Affiliation(s)
- Antoine Moinet
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Flávia Schlichta
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Stephan Peischl
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland (corresponding author)
- Laurent Excoffier
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
10. Klassmann A, Gautier M. Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data. PLoS One 2022; 17:e0262024. [PMID: 35041674 PMCID: PMC8765611 DOI: 10.1371/journal.pone.0262024] [Received: 05/28/2021] [Accepted: 12/15/2021] [Indexed: 12/19/2022]
Abstract
Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One such approach relies on the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them also require polarized variants (i.e., known ancestral and derived alleles). Here, we unify and extend previously proposed modifications that relax these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestral allele information, in contrast, is of lesser importance for both types of statistics. Our publicly available R package rehh incorporates the modified statistics presented here.
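To make the EHH concept concrete, here is a minimal Python sketch of EHH for one core allele: among haplotypes carrying that allele, the probability that two randomly chosen haplotypes are identical over every site from the core to a given marker. It is not the rehh implementation (rehh is an R package and its API is not reproduced here); the function name `ehh` is hypothetical, and the sketch assumes phased haplotypes encoded as equal-length '0'/'1' strings, which is exactly the phasing requirement the abstract discusses.

```python
from collections import Counter


def ehh(haplotypes, core, allele, end):
    """EHH for carriers of `allele` at position `core` (hypothetical helper).

    Returns the probability that two randomly chosen carrier haplotypes
    are identical over every site between `core` and `end` inclusive.
    Assumes phased, equal-length 0/1 strings.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # need at least one pair of carriers
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their extended haplotype over the interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(c * (c - 1) // 2 for c in groups.values())
    return identical_pairs / (n * (n - 1) // 2)


haps = ["0000", "0001", "0011", "1111"]
for end in range(4):
    # EHH of allele "0" at site 0 decays as the interval extends rightward.
    print(end, ehh(haps, core=0, allele="0", end=end))
```

Because a pair of haplotypes can only stay identical or diverge as the interval grows, EHH decays from 1 toward 0 with distance from the core. Statistics such as iHS integrate this decay curve for each allele and compare them; the sketch stops at the EHH values themselves.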
Affiliation(s)
- Mathieu Gautier
- CBGP, Univ Montpellier, CIRAD, INRAE, IRD, Institut Agro, Montpellier, France
11. Semagn K, Iqbal M, Alachiotis N, N'Diaye A, Pozniak C, Spaner D. Genetic diversity and selective sweeps in historical and modern Canadian spring wheat cultivars using the 90K SNP array. Sci Rep 2021; 11:23773. [PMID: 34893626 PMCID: PMC8664822 DOI: 10.1038/s41598-021-02666-5] [Received: 06/22/2021] [Accepted: 11/22/2021] [Indexed: 12/14/2022]
Abstract
Previous molecular characterization studies of Canadian wheat cultivars shed some light on the impact of plant breeding on genetic diversity, but the numbers of varieties and markers used were small. Here, we used 28,798 SNP markers from the wheat 90K single nucleotide polymorphism array to (a) assess the extent of genetic diversity, relationships, population structure, and divergence among 174 historical and modern Canadian spring wheat varieties registered from 1905 to 2018 and 22 unregistered lines (hereinafter referred to as cultivars), and (b) identify genomic regions that had undergone selection. About 91% of cultivar pairs differed by 20-40% of the scored alleles, but only 7% of pairs had kinship coefficients < 0.250, suggesting a high proportion of redundancy in allelic composition. Although the 196 cultivars represented eight wheat classes, phylogenetic, principal component, and model-based population structure analyses revealed three groups, with no clear structure among most wheat classes, breeding programs, and breeding periods. FST statistics computed among different categorical variables showed little genetic differentiation (< 0.05) among breeding periods and breeding programs, but varying levels of differentiation among wheat classes and predicted groups. Diversity indices were highest among cultivars registered from 1970 to 1980 and lowest among those registered from 2011 to 2018. Using two outlier detection methods, we identified between 524 and 2314 SNPs and 41 selective sweeps, some of which are close to genes with known phenotypic effects, including plant height, photoperiodism, vernalization, gluten strength, and disease resistance.
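The abstract reports FST values without naming the estimator. As one hedged illustration of what such a statistic measures, the sketch below computes Hudson's FST as a ratio of averages across loci, in the form described by Bhatia et al. (2013, Genome Research); this is an assumption for illustration, not necessarily the estimator the authors used. The name `hudson_fst` is hypothetical; inputs are per-locus allele frequencies in two groups plus the number of sampled chromosomes per group (for inbred cultivars, one might treat each line as a single haploid sample, which is also an assumption).

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST as a ratio of averages over loci (hypothetical helper).

    p1, p2: allele frequencies per locus in groups 1 and 2.
    n1, n2: number of sampled chromosomes (or haploid-like lines) per group.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-group sampling.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each group differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den


print(hudson_fst([1.0, 1.0], [0.0, 0.0], 10, 10))  # 1.0: fixed differences
print(hudson_fst([0.6, 0.3], [0.4, 0.35], 100, 100))
```

Summing numerators and denominators separately, rather than averaging per-locus ratios, keeps noisy low-diversity loci from dominating the genome-wide value. With identical frequencies the sampling correction makes the estimate slightly negative, which is conventionally read as no differentiation.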
Affiliation(s)
- Kassa Semagn
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Muhammad Iqbal
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Nikolaos Alachiotis
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 3230, Enschede, OV, The Netherlands
- Amidou N'Diaye
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Curtis Pozniak
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Dean Spaner
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
12. Johri P, Charlesworth B, Howell EK, Lynch M, Jensen JD. Revisiting the notion of deleterious sweeps. Genetics 2021; 219:iyab094. [PMID: 34125884 PMCID: PMC9101445 DOI: 10.1093/genetics/iyab094] [Received: 04/29/2021] [Accepted: 06/08/2021] [Indexed: 11/14/2022]
Abstract
It has previously been shown that, conditional on its fixation, the time to fixation of a semi-dominant deleterious autosomal mutation in a randomly mating population is the same as that of an advantageous mutation. This result implies that deleterious mutations could generate selective sweep-like effects. Although their fixation probabilities differ greatly, the much larger input of deleterious relative to beneficial mutations suggests that this phenomenon could be important. Here, we examine how the fixation of mildly deleterious mutations affects levels and patterns of polymorphism at linked sites, both in the presence and absence of interference among deleterious mutations, and how this class of sites may contribute to divergence between populations and species. We find that, while deleterious fixations are unlikely to represent a significant proportion of outliers in polymorphism-based genomic scans within populations, minor shifts in the frequencies of deleterious mutations can influence the proportions of private variants and the value of FST after a recent population split. As sites subject to deleterious mutations are necessarily found in functional genomic regions, interpretations in terms of recurrent positive selection may require reconsideration.
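The abstract sets the equal conditional fixation times of beneficial and deleterious semi-dominant mutations against their very different fixation probabilities. As a hedged illustration of the second point only, the sketch below evaluates Kimura's classical diffusion approximation u(p) = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)). It assumes N_e = N and additive fitness effects (heterozygote 1 + s, homozygote 1 + 2s). The function name is hypothetical, and this is not the simulation approach of the cited study.

```python
import math
from typing import Optional


def fixation_probability(s: float, n: int, p: Optional[float] = None) -> float:
    """Kimura's diffusion approximation for a semi-dominant allele.

    s: selection coefficient (negative for a deleterious mutation)
    n: diploid population size, assumed here to equal the effective size
    p: initial frequency; defaults to one new mutant copy, 1 / (2n)
    For deleterious alleles, math.expm1 raises OverflowError once
    4 * n * |s| exceeds about 709.
    """
    if p is None:
        p = 1.0 / (2 * n)
    if s == 0:
        return p  # neutral allele: fixation probability equals its frequency
    # expm1 keeps precision when 4 * n * s * p is tiny.
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)


n, s = 1000, 0.001
print(fixation_probability(s, n))   # beneficial new mutation
print(fixation_probability(0, n))   # neutral benchmark, 1 / (2n)
print(fixation_probability(-s, n))  # deleterious mutation of equal magnitude
```

From the formula, u(-s) / u(s) = e^(-4Ns(1 - p)). For the parameters above, a deleterious mutation therefore fixes only about 2% as often as a beneficial one of the same magnitude. This is why the abstract's argument relies on the much larger input of deleterious mutations.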
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
- Emma K Howell
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Michael Lynch
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
13
Charlesworth B, Jensen JD. Effects of Selection at Linked Sites on Patterns of Genetic Variability. Annu Rev Ecol Evol Syst 2021; 52:177-197. [PMID: 37089401 PMCID: PMC10120885 DOI: 10.1146/annurev-ecolsys-010621-044528] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Indexed: 01/25/2023]
Abstract
Patterns of variation and evolution at a given site in a genome can be strongly influenced by the effects of selection at genetically linked sites. In particular, the recombination rates of genomic regions correlate with their amount of within-population genetic variability, the degree to which the frequency distributions of DNA sequence variants differ from their neutral expectations, and the levels of adaptation of their functional components. We review the major population genetic processes that are thought to lead to these patterns, focusing on their effects on patterns of variability: selective sweeps, background selection, associative overdominance, and Hill–Robertson interference among deleterious mutations. We emphasize the difficulties in distinguishing among the footprints of these processes and disentangling them from the effects of purely demographic factors such as population size changes. We also discuss how interactions between selective and demographic processes can significantly affect patterns of variability within genomes.
Affiliation(s)
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona 85281, USA
14
Bisschop G, Lohse K, Setter D. Sweeps in time: leveraging the joint distribution of branch lengths. Genetics 2021; 219:iyab119. [PMID: 34849880 PMCID: PMC8633083 DOI: 10.1093/genetics/iyab119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2021] [Accepted: 07/10/2021] [Indexed: 11/14/2022]
Abstract
Current methods of identifying positively selected regions in the genome are limited in two key ways: the underlying models cannot account for the timing of adaptive events, and the comparison between models of selective sweeps and sequence data is generally made via simple summaries of genetic diversity. Here, we develop a tractable method of describing the effect of positive selection on the genealogical histories in the surrounding genome, explicitly modeling both the timing and context of an adaptive event. In addition, our framework allows us to go beyond analyzing polymorphism data via the site frequency spectrum or summaries thereof and instead leverage information contained in patterns of linked variants. Tests on both simulations and a human data example, as well as a comparison to SweepFinder2, show that even with very small sample sizes, our analytic framework has higher power to identify old selective sweeps and to correctly infer both the time and strength of selection. Finally, we derive the marginal distribution of genealogical branch lengths at a locus affected by selection acting at a linked site. This provides a much-needed link between our analytic understanding of the effects of sweeps on sequence variation and recent advances in simulation and heuristic inference procedures that allow researchers to examine the sequence of genealogical histories along the genome.
Affiliation(s)
- Gertjan Bisschop
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
- Derek Setter
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
15
Abstract
Drosophila melanogaster, a small dipteran of African origin, represents one of the best-studied model organisms. Early work in this system uniquely shed light on the basic principles of genetics and resulted in a versatile collection of genetic tools that allow researchers to uncover mechanistic links between genotype and phenotype. Moreover, given its worldwide distribution in diverse habitats and its moderate genome size, Drosophila has proven very powerful for population genetic inference and was one of the first eukaryotes whose genome was fully sequenced. In this book chapter, we provide a brief historical overview of research in Drosophila and then focus on recent advances during the genomic era. After describing different types and sources of genomic data, we discuss mechanisms of neutral evolution, including the demographic history of Drosophila and the effects of recombination and biased gene conversion. Then, we review recent advances in detecting genome-wide signals of selection, such as soft and hard selective sweeps. We further provide a brief introduction to background selection, selection on noncoding DNA, and codon usage, and focus on the role of structural variants, such as transposable elements and chromosomal inversions, in the adaptive process. Finally, we discuss how genomic data help to dissect the neutral and adaptive evolutionary mechanisms that shape genetic and phenotypic variation in natural populations along environmental gradients. In summary, this book chapter serves as a starting point for Drosophila population genomics and provides an introduction to the system and an overview of data sources, important population genetic concepts, and recent advances in the field.
16
Morales-Arce AY, Sabin SJ, Stone AC, Jensen JD. The population genomics of within-host Mycobacterium tuberculosis. Heredity (Edinb) 2020; 126:1-9. [PMID: 33060846 DOI: 10.1038/s41437-020-00377-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/29/2020] [Revised: 10/02/2020] [Accepted: 10/03/2020] [Indexed: 11/09/2022]
Abstract
Recent progress in genomic sequencing from patient samples has allowed for the first detailed insight into the within-host genetic diversity of Mycobacterium tuberculosis (M.TB), revealing remarkably low levels of variation. While this has often been attributed to low mutation rates, other factors have been described, including resistance evolution (i.e., selective sweeps), widespread purifying and background selection, and, more recently, progeny skew. Here we review recent findings pertaining to the processes governing the evolutionary dynamics of M.TB, discuss their implications for improving our understanding of this important human pathogen, and make recommendations for future work. Significantly, this emerging evolutionary framework involving the joint estimation of demographic, selective, and reproductive processes is forming a new paradigm for the study of within-host pathogen evolution that will be widely applicable across organisms.
Affiliation(s)
- Ana Y Morales-Arce
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Susanna J Sabin
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Anne C Stone
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
17
Horscroft C, Ennis S, Pengelly RJ, Sluckin TJ, Collins A. Sequencing era methods for identifying signatures of selection in the genome. Brief Bioinform 2020; 20:1997-2008. [PMID: 30053138 DOI: 10.1093/bib/bby064] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/26/2018] [Revised: 05/16/2018] [Indexed: 12/12/2022]
Abstract
Insights into genetic loci which are under selection and their functional roles contribute to increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. We review here recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Consideration of demography, recombination and other confounding factors is important, and use of a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, a substantial improvement in methods for application to whole-genome sequencing is evident, although further work is required to develop robust and computationally efficient approaches which may increase reproducibility across studies.
Affiliation(s)
- Clare Horscroft
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Sarah Ennis
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Reuben J Pengelly
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Timothy J Sluckin
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
- Mathematical Sciences, University of Southampton, Highfield, Southampton, UK
- Andrew Collins
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
18
Harris RB, Jensen JD. Considering Genomic Scans for Selection as Coalescent Model Choice. Genome Biol Evol 2020; 12:871-877. [PMID: 32396636 PMCID: PMC7313662 DOI: 10.1093/gbe/evaa093] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 05/06/2020] [Indexed: 12/17/2022]
Abstract
First inspired by the seminal work of Lewontin and Krakauer (1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74(1):175-195.) and Maynard Smith and Haigh (1974. The hitch-hiking effect of a favourable gene. Genet Res. 23(1):23-35.), genomic scans for positive selection remain a widely utilized tool in modern population genomic analysis. Yet, the relative frequency and genomic impact of selective sweeps have remained a contentious point in the field for decades, largely owing to an inability to accurately identify their presence and quantify their effects, with current methodologies generally being characterized by low true-positive rates and/or high false-positive rates under many realistic demographic models. Most of these approaches are based on Wright-Fisher assumptions and the Kingman coalescent and generally rely on detecting outlier regions that do not conform to these neutral expectations. However, previous theoretical results have demonstrated that selective sweeps are well characterized by an alternative class of model known as the multiple-merger coalescent. Taken together, this suggests the possibility of not simply identifying regions that reject the Kingman coalescent, but rather explicitly testing the relative fit of a genomic window to the multiple-merger coalescent. We describe the advantages of such an approach, which stem from the branching structure differentiating selective and neutral models, and demonstrate improved power under certain demographic scenarios relative to a commonly used approach. However, regions of the demographic parameter space continue to exist in which neither this approach nor existing methodologies have sufficient power to detect selective sweeps.
19
Genome-Wide Association Study Reveals Novel Candidate Genes Associated with Productivity and Disease Resistance to Moniliophthora spp. in Cacao (Theobroma cacao L.). G3 (Bethesda) 2020; 10:1713-1725. [PMID: 32169867 PMCID: PMC7202020 DOI: 10.1534/g3.120.401153] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/22/2023]
Abstract
Cacao (Theobroma cacao L.), the source of chocolate, is one of the most important commodity products worldwide and helps improve the economic livelihood of farmers. Diseases such as frosty pod rot, caused by Moniliophthora roreri, and witches’ broom, caused by Moniliophthora perniciosa, limit cacao productivity; this problem can be addressed by using resistant varieties. In the current study, we sequenced 229 cacao accessions using genotyping-by-sequencing to examine genetic diversity and population structure, employing 9,003 and 8,131 single nucleotide polymorphisms recovered by mapping against two cacao genomes (Criollo B97-61/B2 v2 and Matina 1-6 v1.1). In the phenotypic evaluation, we found three promising accessions for productivity and 10 with good tolerance to frosty pod rot and witches’ broom. A genome-wide association study performed on 102 accessions identified two genes associated with productivity and seven associated with disease resistance. These results enrich knowledge of the genetic regions associated with important cacao traits and can have significant implications for conservation and breeding strategies such as marker-assisted selection.
20
Johri P, Charlesworth B, Jensen JD. Toward an Evolutionarily Appropriate Null Model: Jointly Inferring Demography and Purifying Selection. Genetics 2020; 215:173-192. [PMID: 32152045 PMCID: PMC7198275 DOI: 10.1534/genetics.119.303002] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Received: 12/18/2019] [Accepted: 03/05/2020] [Indexed: 01/27/2023]
Abstract
The question of the relative evolutionary roles of adaptive and nonadaptive processes has been a central debate in population genetics for nearly a century. While advances have been made in the theoretical development of the underlying models, and statistical methods for estimating their parameters from large-scale genomic data, a framework for an appropriate null model remains elusive. A model incorporating evolutionary processes known to be in constant operation, genetic drift (as modulated by the demographic history of the population) and purifying selection, is lacking. Without such a null model, the role of adaptive processes in shaping within- and between-population variation may not be accurately assessed. Here, we investigate how population size changes and the strength of purifying selection affect patterns of variation at "neutral" sites near functional genomic components. We propose a novel statistical framework for jointly inferring the contribution of the relevant selective and demographic parameters. By means of extensive performance analyses, we quantify the utility of the approach, identify the most important statistics for parameter estimation, and compare the results with existing methods. Finally, we reanalyze genome-wide population-level data from a Zambian population of Drosophila melanogaster, and find that it has experienced a much slower rate of population growth than was inferred when the effects of purifying selection were neglected. Our approach represents an appropriate null model, against which the effects of positive selection can be assessed.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, Arizona 85287
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona 85287
21
Apata M, Pfeifer SP. Recent population genomic insights into the genetic basis of arsenic tolerance in humans: the difficulties of identifying positively selected loci in strongly bottlenecked populations. Heredity (Edinb) 2020; 124:253-262. [PMID: 31776483 PMCID: PMC6972707 DOI: 10.1038/s41437-019-0285-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/06/2019] [Revised: 10/22/2019] [Accepted: 11/13/2019] [Indexed: 02/06/2023]
Abstract
Recent advances in genomics have enabled researchers to shed light on the evolutionary processes driving human adaptation, by revealing the genetic architectures underlying traits ranging from lactase persistence, to skin pigmentation, to hypoxic response, to arsenic tolerance. Complicating the identification of targets of positive selection in modern human populations is their complex demographic history, characterized by population bottlenecks and expansions, population structure, migration, and admixture. In particular, founder effects and recent strong population size reductions, such as those experienced by the indigenous peoples of the Americas, have severe impacts on genetic variation that can lead to the accumulation of large allele frequency differences between populations due to genetic drift rather than natural selection. While distinguishing the effects of demographic history from selection remains challenging, neglecting neutral processes can lead to the incorrect identification of candidate loci. We here review the recent population genomic insights into the genetic basis of arsenic tolerance in Andean populations, and utilize this example to highlight both the difficulties pertaining to the identification of local adaptations in strongly bottlenecked populations, as well as the importance of controlling for demographic history in selection scans.
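Drift alone can create large allele frequency differences between populations. A textbook expectation makes this concrete for two populations of constant diploid size N that split t generations ago, with no mutation or migration: F_ST ≈ 1 - (1 - 1/(2N))^t. The sketch below is a hedged illustration under those simplifying Wright-Fisher assumptions. It is not the demographic model used in the reviewed Andean studies, and the function name is hypothetical.

```python
def expected_fst(n: int, t: int) -> float:
    """Expected F_ST between populations of diploid size n isolated for
    t generations under pure Wright-Fisher drift (no mutation, no migration)."""
    return 1.0 - (1.0 - 1.0 / (2 * n)) ** t


# Same divergence time, different population sizes: a strongly
# bottlenecked population (small n) accumulates far more neutral
# differentiation, which a selection scan could mistake for local adaptation.
for n in (10_000, 1_000, 50):
    print(n, round(expected_fst(n, t=100), 4))
```

After 100 generations, this gives roughly 0.005, 0.049, and 0.63 for the three sizes. This is why the abstract stresses controlling for demographic history before attributing extreme differentiation to selection.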
Affiliation(s)
- Mario Apata
- Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, 85821, USA
- Susanne P Pfeifer
- Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, 85821, USA
22
Koropoulis A, Alachiotis N, Pavlidis P. Detecting Positive Selection in Populations Using Genetic Data. Methods Mol Biol 2020; 2090:87-123. [PMID: 31975165 DOI: 10.1007/978-1-0716-0199-0_5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
High-throughput genomic sequencing allows researchers to disentangle the evolutionary forces acting in populations. Among these forces, positive selection has received much attention because it is related to the adaptation of populations to their environments, both biotic and abiotic. Positive selection, also known as Darwinian selection, occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Such a process leaves traces in genomes that can be detected at a later time point. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular linkage disequilibrium (LD) patterns in the region. A variety of approaches can be used for detecting selective sweeps, ranging from simple implementations that compute summary statistics to more advanced statistical approaches, e.g., Bayesian, maximum-likelihood-based, and machine learning methods. In this chapter, we discuss selective sweep detection methodologies on the basis of their capacity to analyze whole genomes or just subgenomic regions, and on the specific polymorphism patterns they exploit as selective sweep signatures. We also summarize the results of comparisons among five open-source software releases (SweeD, SweepFinder, SweepFinder2, OmegaPlus, and RAiSD) regarding sensitivity, specificity, and execution times. Furthermore, we test and discuss machine learning methods and present a thorough performance analysis. Under equilibrium neutral models or mild bottlenecks, most methods are able to detect selective sweeps accurately. Methods and tools that rely on linkage disequilibrium (LD) rather than single SNPs exhibit higher true positive rates than site frequency spectrum (SFS)-based methods under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to build the null distribution of the statistic. Both LD- and SFS-based approaches suffer from decreased accuracy in localizing the true target of selection in bottleneck scenarios. Furthermore, we present an extensive analysis of the effects of gene flow on selective sweep detection, a problem that has been understudied in the selective sweep literature.
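A recent sweep leaves an excess of rare variants, which is one form of the "specific shift of the site frequency spectrum" described above. The sketch below computes Tajima's D, a classic SFS summary, from an unfolded spectrum. It is a hedged illustration of the kind of signal SFS-based methods exploit and is not how SweeD, SweepFinder, or RAiSD compute their statistics. The function name and input layout are hypothetical.

```python
import math
from typing import Sequence


def tajimas_d(sfs: Sequence[float], n: int) -> float:
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived allele
    is carried by i of the n sampled sequences (i = 1 .. n - 1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    seg = sum(sfs)  # number of segregating sites, S
    if seg == 0:
        raise ValueError("no segregating sites")
    pairs = n * (n - 1) / 2
    # Mean pairwise differences (theta_pi) computed from the SFS.
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg / a1  # Watterson's estimator
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_pi - theta_w) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


n = 10
neutral = [10 / i for i in range(1, n)]  # expected neutral SFS for theta = 10
sweep_like = [10] + [0] * (n - 2)        # only singletons: excess rare variants
print(tajimas_d(neutral, n))     # close to 0
print(tajimas_d(sweep_like, n))  # negative
```

A strongly negative D is consistent with a recent sweep, but population expansion produces the same skew. This is the demographic confounding that the chapter's comparisons address.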
Affiliation(s)
- Angelos Koropoulis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Computer Science Department, University of Crete, Crete, Heraklion, Greece
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
23
Wegary D, Teklewold A, Prasanna BM, Ertiro BT, Alachiotis N, Negera D, Awas G, Abakemal D, Ogugo V, Gowda M, Semagn K. Molecular diversity and selective sweeps in maize inbred lines adapted to African highlands. Sci Rep 2019; 9:13490. [PMID: 31530852 PMCID: PMC6748982 DOI: 10.1038/s41598-019-49861-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/01/2019] [Accepted: 08/28/2019] [Indexed: 11/08/2022]
Abstract
Little is known about maize germplasm adapted to African highland agro-ecologies. In this study, we analyzed high-density genotyping-by-sequencing (GBS) data of 298 African highland-adapted maize inbred lines to (i) assess the extent of genetic purity, genetic relatedness, and population structure, and (ii) identify genomic regions that have undergone selection (selective sweeps) in response to adaptation to highland environments. Nearly 91% of the pairs of inbred lines differed by 30-36% of the scored alleles, but only 32% of the pairs had relative kinship coefficients < 0.050, which suggests substantial redundancy in allelic composition that may be due to repeated use of a limited number of genetic backgrounds (source germplasm) during line development. Results from different genetic relatedness and population structure analyses revealed three groups, which generally agree with pedigree information and breeding history, but less so with heterotic groups and endosperm modification. We identified 944 single nucleotide polymorphism (SNP) markers that fell within 22 selective sweeps harboring 265 protein-coding candidate genes, some of which have known functions. The candidate genes with known functions and the differences in nucleotide diversity among groups predicted by multivariate methods are discussed.
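Statements such as "91% of the pairs of inbred lines differed by 30-36% of the scored alleles" rest on a pairwise distance between genotype vectors. The abstract does not define that metric, so the sketch below assumes one simple version. Genotypes are coded as allele dosages (0, 1, 2) at biallelic SNPs, and the distance between two lines is the fraction of scored alleles that differ, counted only over loci scored in both. The coding and the function names are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Optional, Sequence

# Hypothetical coding: allele dosage 0/1/2 per biallelic SNP, None if missing.
Genotype = Sequence[Optional[int]]


def allele_difference(a: Genotype, b: Genotype) -> float:
    """Fraction of scored alleles that differ between two lines."""
    diff = 0
    scored = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip loci missing in either line
        diff += abs(x - y)  # 0, 1, or 2 differing alleles at this locus
        scored += 2         # two alleles scored per diploid locus
    if scored == 0:
        raise ValueError("no loci scored in both lines")
    return diff / scored


def pairwise_differences(lines: Sequence[Genotype]) -> List[float]:
    """Allele-difference values for every unordered pair of lines."""
    return [allele_difference(a, b) for a, b in combinations(lines, 2)]


lines = [
    [0, 2, 2, 1, None],
    [0, 2, 0, 1, 2],
    [2, 0, 0, 1, 2],
]
print(pairwise_differences(lines))  # → [0.25, 0.75, 0.4]
```

Tallying the share of pairs whose value falls in a band such as 0.30-0.36 then gives summaries of the kind quoted above. Kinship coefficients, by contrast, come from a separate, frequency-weighted estimator.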
Affiliation(s)
- Dagne Wegary
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Adefris Teklewold
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Boddupalli M Prasanna
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Berhanu T Ertiro
- Bako National Maize Research Center, Ethiopian Institute of Agricultural Research (EIAR), Addis Ababa, Ethiopia
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Demewez Negera
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Geremew Awas
- International Maize and Wheat Improvement Center (CIMMYT) - Ethiopia Office, ILRI Campus, CMC Road, Gurd Sholla, P.O. Box 5689, Addis Ababa, Ethiopia
- Demissew Abakemal
- Ambo Agricultural Research Center, P.O. Box 37, West Shoa, Ambo, Ethiopia
- Veronica Ogugo
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Kassa Semagn
- International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Gigiri, P.O. Box 1041-00621, Nairobi, Kenya
- Africa Rice Center (AfricaRice), M'bé Research Station, 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
24
The population genetics of crypsis in vertebrates: recent insights from mice, hares, and lizards. Heredity (Edinb) 2019; 124:1-14. [PMID: 31399719 PMCID: PMC6906368 DOI: 10.1038/s41437-019-0257-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/12/2019] [Revised: 07/16/2019] [Accepted: 07/25/2019] [Indexed: 12/22/2022]
Abstract
By combining well-established population genetic theory with high-throughput sequencing data from natural populations, major strides have recently been made in understanding how, why, and when vertebrate populations evolve crypsis. Here, we focus on background matching, a particular facet of crypsis that involves the ability of an organism to conceal itself through matching its color to the surrounding environment. While interesting in and of itself, the study of this phenotype has also provided fruitful population genetic insights into the interplay of strong positive selection with other evolutionary processes. Specifically, and predicated upon the findings of previous candidate gene association studies, a primary focus of this recent literature involves the realization that the inference of selection from DNA sequence data first requires a robust model of population demography in order to identify genomic regions which do not conform to neutral expectations. Moreover, these demographic estimates provide crucial information about the origin and timing of the onset of selective pressures associated with, for example, the colonization of a novel environment. Furthermore, such inference has revealed crypsis to be a particularly useful phenotype for investigating the interplay of migration and selection—with examples of gene flow constraining rates of adaptation, or alternatively providing the genetic variants that may ultimately sweep through the population. Here, we evaluate the underlying evidence, review the strengths and weaknesses of the many population genetic methodologies used in these studies, and discuss how these insights have aided our general understanding of the evolutionary process.
25
Ndjiondjop MN, Alachiotis N, Pavlidis P, Goungoulou A, Kpeki SB, Zhao D, Semagn K. Comparisons of molecular diversity indices, selective sweeps and population structure of African rice with its wild progenitor and Asian rice. Theor Appl Genet 2019; 132:1145-1158. [PMID: 30578434 PMCID: PMC6449321 DOI: 10.1007/s00122-018-3268-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/11/2018] [Accepted: 12/11/2018] [Indexed: 05/20/2023]
Abstract
The extent of molecular diversity across three rice species was compared using a large germplasm collection genotyped with genome-wide SNPs and with SNPs that fell within selective sweep regions. Previous studies conducted on a limited number of accessions have reported very low genetic variation in African rice (Oryza glaberrima Steud.) compared with its wild progenitor (O. barthii A. Chev.) and Asian rice (O. sativa L.). Here, we characterized a large collection of African rice and compared its molecular diversity indices and population structure with those of the two other species using genome-wide single nucleotide polymorphisms (SNPs) and SNPs that mapped within selective sweeps. A total of 3245 samples representing African rice (2358), Asian rice (772) and O. barthii (115) were genotyped with 26,073 physically mapped SNPs. Using all SNPs, the level of marker polymorphism, average genetic distance and nucleotide diversity in African rice accounted for 59.1%, 63.2% and 37.1% of those of O. barthii, respectively. SNP polymorphism and overall nucleotide diversity of African rice accounted for 20.1-32.1% and 16.3-37.3% of those of Asian rice, respectively. We identified 780 SNPs that fell within 37 candidate selective sweeps in African rice, distributed across all 12 rice chromosomes. Nucleotide diversity of African rice estimated from the 780 SNPs was 8.3 × 10⁻⁴, which is not only 20-fold smaller than the value estimated from all genome-wide SNPs (π = 1.6 × 10⁻²), but also just 4.1%, 0.9% and 2.1% of that of O. barthii, lowland Asian rice and upland Asian rice, respectively. The genotype data generated for this large collection of rice accessions conserved at the AfricaRice genebank will be highly useful for the global rice community and should promote germplasm use.
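The "20-fold" comparison for African rice follows directly from the two nucleotide diversity estimates reported in the abstract:

```latex
\frac{\pi_{\text{all SNPs}}}{\pi_{\text{sweep SNPs}}}
  = \frac{1.6 \times 10^{-2}}{8.3 \times 10^{-4}}
  \approx 19.3 \approx 20
```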
Affiliation(s)
- Marie Noelle Ndjiondjop
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Nikolaou Plastira 100, 70013, Heraklion, Crete, Greece
- Alphonse Goungoulou
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Sèdjro Bienvenu Kpeki
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Dule Zhao
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire
- Kassa Semagn
- M'bé Research Station, Africa Rice Center (AfricaRice), 01 B.P. 2551, Bouaké 01, Côte d'Ivoire.
26
Pfeifer SP, Laurent S, Sousa VC, Linnen CR, Foll M, Excoffier L, Hoekstra HE, Jensen JD. The Evolutionary History of Nebraska Deer Mice: Local Adaptation in the Face of Strong Gene Flow. Mol Biol Evol 2018; 35:792-806. [PMID: 29346646 PMCID: PMC5905656 DOI: 10.1093/molbev/msy004] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 12/20/2022]
Abstract
The interplay of gene flow, genetic drift, and local selective pressure is a dynamic process that has been well studied from a theoretical perspective over the last century. Wright and Haldane laid the foundation for expectations under an island-continent model, demonstrating that an island-specific beneficial allele may be maintained locally if the selection coefficient is larger than the rate of migration of the ancestral allele from the continent. Subsequent extensions of this model have provided considerably more insight. Yet, connecting theoretical results with empirical data has proven challenging, owing to a lack of information on the relationship between genotype, phenotype, and fitness. Here, we examine the demographic and selective history of deer mice in and around the Nebraska Sand Hills, a system in which variation at the Agouti locus affects cryptic coloration that in turn affects the survival of mice in their local habitat. We first genotyped 250 individuals from 11 sites along a transect spanning the Sand Hills at 660,000 single nucleotide polymorphisms across the genome. Using these genomic data, we found that deer mice first colonized the Sand Hills following the last glacial period. Subsequent high rates of gene flow have served to homogenize the majority of the genome between populations on and off the Sand Hills, with the exception of the Agouti pigmentation locus. Furthermore, mutations at this locus are strongly associated with the pigment traits that are strongly correlated with local soil coloration and thus responsible for cryptic coloration.
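Wright and Haldane's island-continent condition can be checked numerically. A minimal sketch, assuming a haploid, deterministic model in which selection coefficient s favours the island allele and a fraction m of the island population is replaced each generation by continental migrants carrying only the ancestral allele (all names are illustrative, not from the study). In this model the island allele persists when s(1 − m) > m, i.e. s > m/(1 − m), which reduces to the familiar s > m when migration is weak.

```python
def next_generation(p: float, s: float, m: float) -> float:
    """Haploid selection on the island, then replacement of a fraction m of
    the island population by continental migrants that carry only the
    ancestral allele."""
    p_after_selection = p * (1.0 + s) / (1.0 + s * p)
    return (1.0 - m) * p_after_selection


def simulate(p0: float, s: float, m: float, generations: int = 10_000) -> float:
    """Iterate the recursion and return the final frequency of the island allele."""
    p = p0
    for _ in range(generations):
        p = next_generation(p, s, m)
    return p


def predicted_equilibrium(s: float, m: float) -> float:
    """Stable equilibrium of the recursion: 1 - m(1 + s)/s when s(1 - m) > m, else 0."""
    return max(0.0, 1.0 - m * (1.0 + s) / s)


# Selection stronger than migration, then migration stronger than selection.
for s, m in [(0.05, 0.01), (0.01, 0.05)]:
    print(s, m, round(simulate(0.5, s, m), 4), round(predicted_equilibrium(s, m), 4))
```

With s = 0.05 and m = 0.01 the island allele settles at about 0.79; reversing the two rates drives it to loss.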
Affiliation(s)
- Susanne P Pfeifer
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ
- Stefan Laurent
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Vitor C Sousa
- Institute of Ecology & Evolution, University of Berne, Berne, Switzerland; Centre for Ecology, Evolution and Environmental Changes, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Matthieu Foll
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Laurent Excoffier
- Institute of Ecology & Evolution, University of Berne, Berne, Switzerland
- Hopi E Hoekstra
- Department of Organismic & Evolutionary Biology and Molecular & Cellular Biology, Museum of Comparative Zoology, Howard Hughes Medical Institute, Harvard University, Cambridge, MA
- Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ
27
Harris RB, Sackman A, Jensen JD. On the unfounded enthusiasm for soft selective sweeps II: Examining recent evidence from humans, flies, and viruses. PLoS Genet 2018; 14:e1007859. [PMID: 30592709 PMCID: PMC6336318 DOI: 10.1371/journal.pgen.1007859] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 06/30/2018] [Revised: 01/17/2019] [Accepted: 11/28/2018] [Indexed: 12/13/2022]
Abstract
Since the initial description of the genomic patterns expected under models of positive selection acting on standing genetic variation and on multiple beneficial mutations (so-called soft selective sweeps), researchers have sought to identify these patterns in natural population data. Indeed, over the past two years, large-scale data analyses have argued that soft sweeps are pervasive across organisms of very different effective population size and mutation rate: humans, Drosophila, and HIV. Yet, others have evaluated the relevance of these models to natural populations, as well as the identifiability of the models relative to other known population-level processes, arguing that soft sweeps are likely to be rare. Here, we look to reconcile these opposing results by carefully evaluating three recent studies and their underlying methodologies. Using population genetic theory, as well as extensive simulation, we find that all three examples are prone to extremely high false-positive rates, incorrectly identifying soft sweeps under both hard sweep and neutral models. Furthermore, we demonstrate that well-fit demographic histories combined with rare hard sweeps serve as the more parsimonious explanation. These findings represent a necessary response to the growing tendency of invoking parameter-heavy, assumption-laden models of pervasive positive selection, and neglecting best practices regarding the construction of proper demographic null models.
A long-standing debate in evolutionary biology revolves around the role of selective vs. stochastic processes in driving molecular evolution and shaping genetic variation. With the advent of genomics, genome-wide polymorphism data have been utilized to characterize these processes, with a major interest in describing the fraction of genomic variation shaped by positive selection. These genomic scans initially focused on a hard sweep model, in which selection acts upon rare, newly arising beneficial mutations.
Recent years have seen the description of sweeps occurring from both standing and rapidly recurring beneficial mutations, collectively known as soft sweeps. However, common to both hard and soft sweeps is the difficulty in distinguishing these effects from neutral demographic patterns, and disentangling these processes has remained an important field of study within population genetics. Despite this, there is a recent and troubling tendency to neglect these demographic considerations, and to naively fit sweep models to genomic data. Recent realizations of such efforts have resulted in the claim that soft sweeps play a dominant role in shaping genomic variation and in driving adaptation across diverse branches of the tree of life. Here, we reanalyze these findings and demonstrate that a more careful consideration of neutral processes results in highly differing conclusions.
Affiliation(s)
- Rebecca B. Harris
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
- Andrew Sackman
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
- Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
28
Mattle-Greminger MP, Bilgin Sonay T, Nater A, Pybus M, Desai T, de Valles G, Casals F, Scally A, Bertranpetit J, Marques-Bonet T, van Schaik CP, Anisimova M, Krützen M. Genomes reveal marked differences in the adaptive evolution between orangutan species. Genome Biol 2018; 19:193. [PMID: 30428903 PMCID: PMC6237011 DOI: 10.1186/s13059-018-1562-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/12/2018] [Accepted: 10/09/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Integrating demography and adaptive evolution is pivotal to understanding the evolutionary history and conservation of great apes. However, little is known about the adaptive evolution of our closest relatives, in particular whether and to what extent adaptations to environmental differences have occurred. Here, we used whole-genome sequencing data from critically endangered orangutans from North Sumatra (Pongo abelii) and Borneo (P. pygmaeus) to investigate adaptive responses of each species to environmental differences during the Pleistocene. RESULTS: Taking into account the markedly disparate demographic histories of each species after their split ~1 Ma ago, we show that persistent environmental differences on each island had a strong impact on the adaptive evolution of the genus Pongo. Across a range of tests for positive selection, we find a consistent pattern of between-island and species differences. In the more productive Sumatran environment, the most notable signals of positive selection involve genes linked to brain and neuronal development, learning, and glucose metabolism. On Borneo, however, positive selection comprised genes involved in lipid metabolism, as well as cardiac and muscle activities. CONCLUSIONS: We find strikingly different sets of genes appearing to have evolved under strong positive selection in each species. In Sumatran orangutans, selection patterns were congruent with well-documented cognitive and behavioral differences between the species, such as a larger and more complex cultural repertoire and higher degrees of sociality. However, in Bornean orangutans, selective responses to fluctuating environmental conditions appear to have produced physiological adaptations to generally lower and temporally more unpredictable food supplies.
Affiliation(s)
- Maja P. Mattle-Greminger
- Evolutionary Genetics Group, Department of Anthropology, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Tugce Bilgin Sonay
- Evolutionary Genetics Group, Department of Anthropology, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, 1015 Lausanne, Switzerland
- Alexander Nater
- Evolutionary Genetics Group, Department of Anthropology, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Universitätsstrasse 10, 78457 Konstanz, Germany
- Marc Pybus
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain
- Tariq Desai
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH UK
- Guillem de Valles
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain
- Ferran Casals
- Servei de Genòmica, Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain
- Aylwyn Scally
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH UK
- Jaume Bertranpetit
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys 23, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, Cerdanyola del Vallès, Barcelona, Spain
- Carel P. van Schaik
- Evolutionary Genetics Group, Department of Anthropology, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Maria Anisimova
- Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, 1015 Lausanne, Switzerland
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences ZHAW, Einsiedlerstrasse 31a, 8820 Wädenswil, Switzerland
- Michael Krützen
- Evolutionary Genetics Group, Department of Anthropology, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
29
Complex Haplotypes of GSTM1 Gene Deletions Harbor Signatures of a Selective Sweep in East Asian Populations. G3 (Bethesda) 2018; 8:2953-2966. [PMID: 30061374 PMCID: PMC6118300 DOI: 10.1534/g3.118.200462] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
The deletion of the metabolizing glutathione S-transferase Mu 1 (GSTM1) gene has been associated with multiple cancers, metabolic and autoimmune disorders, as well as drug response. It is unusually common, with allele frequency reaching up to 75% in some human populations. Such high allele frequency of a derived allele with apparent impact on an otherwise conserved gene is a rare phenomenon. To investigate the evolutionary history of this locus, we analyzed 310 genomes using population genetics tools. Our analysis revealed a surprising lack of linkage disequilibrium between the deletion and the flanking single nucleotide variants in this locus. Tests that measure extended homozygosity and rapid change in allele frequency revealed signatures of an incomplete sweep in the locus. Using empirical approaches, we identified the Tanuki haplogroup, which carries the GSTM1 deletion and is found in approximately 70% of East Asian chromosomes. This haplogroup has rapidly increased in frequency in East Asian populations, contributing to high population differentiation among continental human groups. We showed that extended homozygosity and population differentiation for this haplogroup are incompatible with simulated neutral expectations in East Asian populations. In parallel, we found that the Tanuki haplogroup is significantly associated with the expression levels of other GSTM genes. Collectively, our results suggest that standing variation at this locus has undergone an incomplete sweep in East Asia with regulatory impact on multiple GSTM genes. Our study provides the necessary framework for further studies to elucidate the evolutionary reasons that maintain disease-susceptibility variants in the GSTM1 locus.
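Tests of extended homozygosity build on extended haplotype homozygosity (EHH): among chromosomes carrying a focal allele, the probability that two drawn at random remain identical over the whole interval from the focal site to a given distance. Under a recent sweep, carriers of the favoured allele share long identical haplotypes, so EHH decays slowly with distance. The sketch below follows the standard EHH definition under assumed inputs (phased haplotypes as strings of '0'/'1'; the function name and toy data are hypothetical) and is not the GSTM1 study's code.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, position: int, allele: str) -> float:
    """Extended haplotype homozygosity between a core SNP and another SNP.

    Only chromosomes carrying `allele` at `core` are considered; EHH is the
    probability that two of them, drawn at random, are identical at every
    SNP between `core` and `position` (inclusive).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    if len(carriers) < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core, position))
    segment_counts = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in segment_counts.values())
    return identical_pairs / comb(len(carriers), 2)


# Toy data: five chromosomes, core SNP at index 2; the last one lacks allele "1".
haps = ["00100", "00100", "01100", "00101", "10011"]
for pos in range(5):
    print(pos, ehh(haps, core=2, position=pos, allele="1"))
```

EHH equals 1.0 at the core and falls to 0.5 once one carrier's haplotype diverges; integrating such decay curves is the basis of haplotype-based scores such as iHS.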
30
Hartmann FE, McDonald BA, Croll D. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen. Mol Ecol 2018; 27:2725-2741. [PMID: 29729657 PMCID: PMC6032900 DOI: 10.1111/mec.14711] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/24/2017] [Revised: 04/05/2018] [Accepted: 04/17/2018] [Indexed: 12/30/2022]
Abstract
The genetic and environmental homogeneity of agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using an XtX‐based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We also performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host‐driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation.
Affiliation(s)
- Fanny E Hartmann
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Ecologie Systématique Evolution, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Bruce A McDonald
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
31
Ostrander EA, Wayne RK, Freedman AH, Davis BW. Demographic history, selection and functional diversity of the canine genome. Nat Rev Genet 2017; 18:705-720. [DOI: 10.1038/nrg.2017.67] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Indexed: 12/13/2022]
32
Range Expansion Compromises Adaptive Evolution in an Outcrossing Plant. Curr Biol 2017; 27:2544-2551.e4. [DOI: 10.1016/j.cub.2017.07.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 01/26/2017] [Revised: 05/22/2017] [Accepted: 07/04/2017] [Indexed: 01/04/2023]
33
Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps. Genetics 2016; 203:1807-1825. [PMID: 27516617 DOI: 10.1534/genetics.115.185900] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/10/2015] [Accepted: 04/05/2016] [Indexed: 12/12/2022]
Abstract
During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. Here, we investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text]. We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics ([Formula: see text], Kelly's [Formula: see text] and [Formula: see text]) are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available.
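Kelly's statistic, the classic LD-based summary evaluated in this study, is usually written Z_nS: the mean squared allelic correlation r² over all pairs of the S segregating sites in a window, Z_nS = 2/(S(S − 1)) Σ_{i<j} r²_ij. A minimal sketch under assumed inputs (phased haplotypes, biallelic sites coded 0/1; function names hypothetical). It covers only Kelly's statistic; the novel statistics and the genetic-map correction described in the abstract are not reproduced here.

```python
from itertools import combinations


def r_squared(a: list[int], b: list[int]) -> float:
    """Squared allelic correlation r^2 = D^2 / (pA(1-pA) pB(1-pB)) between two
    biallelic sites coded 0/1 across the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be segregating")
    return d * d / denom


def kelly_zns(sites: list[list[int]]) -> float:
    """Kelly's Z_nS: mean r^2 over all S(S-1)/2 pairs of segregating sites."""
    pairs = list(combinations(sites, 2))
    if not pairs:
        raise ValueError("need at least two segregating sites")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


# Each list holds one site's alleles across the same four haplotypes.
site1 = [0, 0, 1, 1]
site2 = [0, 0, 1, 1]  # perfectly associated with site1 (r^2 = 1)
site3 = [0, 1, 0, 1]  # uncorrelated with site1 and site2 (r^2 = 0)
print(kelly_zns([site1, site2, site3]))
```

In a scan, Z_nS would be computed in sliding windows and compared against a neutral (ideally recombination-aware) null distribution rather than read in isolation.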
34
Carvajal-Rodríguez A. HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations. PLoS One 2017; 12:e0175944. [PMID: 28423003 PMCID: PMC5397020 DOI: 10.1371/journal.pone.0175944] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/25/2016] [Accepted: 04/03/2017] [Indexed: 01/10/2023]
Abstract
The detection of genomic regions involved in local adaptation is an important topic in current population genetics. Several detection strategies are available, depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study, we introduce two complementary methods for the detection of divergent selection between populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Divergent selection is inferred only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers, i.e., when no haplotype information is available. In this case, power to detect selection is attained through a new outlier test based on detecting a bimodal distribution. The test computes FST outliers and assumes that those of interest form a distinct mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60% to 95% in several of the scenarios, whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.
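Both HacDivSel methods start from per-SNP differentiation (FST) between the two populations. For orientation, a minimal sketch of a textbook Nei-style estimator, F_ST = (H_T − H_S)/H_T, together with a naive empirical-tail outlier scan. This is an assumed estimator, not HacDivSel's implementation, and the naive tail scan is exactly the kind of approach the abstract describes as prone to false positives, which the haplotype and bimodality criteria are meant to curb. Function names and allele frequencies are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP in two equally weighted
    populations with allele frequencies p1 and p2."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled sample
    if h_t == 0:
        return 0.0  # same allele fixed in both populations: nothing to differentiate
    return (h_t - h_s) / h_t


def top_outliers(values: list[float], fraction: float = 0.01) -> list[int]:
    """Indices of SNPs in the upper `fraction` tail of the empirical F_ST distribution."""
    k = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


freqs = [(0.50, 0.55), (0.10, 0.90), (0.30, 0.35), (0.20, 0.70)]
fst = [fst_two_pops(p1, p2) for p1, p2 in freqs]
print([round(x, 3) for x in fst])
print(top_outliers(fst, fraction=0.25))  # [1]
```

A tail cut like this always returns some "outliers", even under neutrality, which is why the methods above require additional, independent evidence before calling divergent selection.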
35
Pavlidis P, Alachiotis N. A survey of methods and tools to detect recent and strong positive selection. J Biol Res (Thessalon) 2017; 24:7. [PMID: 28405579 PMCID: PMC5385031 DOI: 10.1186/s40709-017-0064-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 08/11/2016] [Accepted: 03/29/2017] [Indexed: 01/25/2023]
Abstract
Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum (SFS), and particular linkage disequilibrium (LD) patterns in the region. A variety of methods and tools can be used for detecting sweeps, ranging from simple implementations that compute summary statistics such as Tajima's D, to more advanced statistical approaches that use combinations of statistics, maximum likelihood, and machine learning. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as their capacity to analyze whole genomes or just subgenomic regions. Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct demographic model (or a close approximation of it) is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, owing to the nature of the required arithmetic.
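Tajima's D, named in the survey as the simplest SFS-based summary statistic, compares two estimates of θ: π, the mean number of pairwise differences, and Watterson's S/a₁. A minimal sketch of Tajima's (1989) standard formula, assuming n ≥ 4 aligned sequences and S > 0 segregating sites (function names and toy data are hypothetical). A negative D flags an excess of rare variants, as expected after a sweep, but population expansion produces the same signal, which is why a correctly specified demographic null matters.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(haplotypes: list[str]) -> float:
    """pi: average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(haplotypes, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def tajimas_d(pi: float, s: int, n: int) -> float:
    """Tajima's D from pi, the number of segregating sites s and sample size n."""
    if n < 4 or s == 0:
        raise ValueError("requires n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two theta estimators, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Eight sequences whose four segregating sites are all singletons.
seqs = ["0000", "0000", "0000", "0000", "1000", "0100", "0010", "0001"]
pi = mean_pairwise_differences(seqs)
print(round(tajimas_d(pi, s=4, n=len(seqs)), 3))  # negative: excess of rare variants
```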
Affiliation(s)
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 70013 Crete, Greece
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, 70013 Crete, Greece
36
Freedman AH, Lohmueller KE, Wayne RK. Evolutionary History, Selective Sweeps, and Deleterious Variation in the Dog. Annu Rev Ecol Evol Syst 2016. [DOI: 10.1146/annurev-ecolsys-121415-032155] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/09/2022]
Abstract
The dog is our oldest domesticate and has experienced a wide variety of demographic histories, including a bottleneck associated with domestication and individual bottlenecks associated with the formation of modern breeds. Admixture with gray wolves, and among dog breeds and populations, has also occurred throughout its history. Likewise, the intensity and focus of selection have varied, from an initial focus on traits enhancing cohabitation with humans, to more directed selection on specific phenotypic characteristics and behaviors. In this review, we summarize and synthesize genetic findings from genome-wide and complete genome studies that document the genomic consequences of demography and selection, including the effects on adaptive and deleterious variation. Consistent with the evolutionary history of the dog, signals of natural and artificial selection are evident in the dog genome. However, conclusions from studies of positive selection are fraught with the problem of false positives given that demographic history is often not taken into account.
Affiliation(s)
- Adam H. Freedman
- Informatics Group, Faculty of Arts and Sciences, Harvard University, Cambridge, Massachusetts 02138
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095
37
Freedman AH, Schweizer RM, Ortega-Del Vecchyo D, Han E, Davis BW, Gronau I, Silva PM, Galaverni M, Fan Z, Marx P, Lorente-Galdos B, Ramirez O, Hormozdiari F, Alkan C, Vilà C, Squire K, Geffen E, Kusak J, Boyko AR, Parker HG, Lee C, Tadigotla V, Siepel A, Bustamante CD, Harkins TT, Nelson SF, Marques-Bonet T, Ostrander EA, Wayne RK, Novembre J. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs. PLoS Genet 2016; 12:e1005851. [PMID: 26943675 PMCID: PMC4778760 DOI: 10.1371/journal.pgen.1005851] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 08/26/2015] [Accepted: 01/18/2016] [Indexed: 12/31/2022]
Abstract
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The third top hit, CCRN4L, plays a major role in lipid metabolism, a role supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
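The demography-aware approach above scores observed genomic windows against windows simulated under the inferred neutral demographic model. A generic sketch of that calibration, not the authors' implementation: it assumes larger statistic values are more sweep-like and conservatively treats every observed window as potentially neutral, so the expected number of false positives at a cutoff t is the neutral exceedance rate times the number of observed windows. All names and numbers are hypothetical.

```python
def empirical_fdr(observed: list[float], null: list[float], threshold: float) -> float:
    """Estimated false discovery rate when calling every observed window with a
    statistic >= threshold, using windows simulated under a neutral
    demographic model as the null distribution."""
    n_called = sum(1 for x in observed if x >= threshold)
    if n_called == 0:
        return 0.0
    null_rate = sum(1 for x in null if x >= threshold) / len(null)
    expected_false = null_rate * len(observed)  # assumes all windows could be neutral
    return min(1.0, expected_false / n_called)


def smallest_threshold(observed: list[float], null: list[float], target: float = 0.05):
    """Smallest observed value that, used as a cutoff, gives estimated FDR <= target."""
    for t in sorted(set(observed)):
        if empirical_fdr(observed, null, t) <= target:
            return t
    return None


observed = [1, 2, 3, 10, 11, 12]           # statistic in six genomic windows
neutral = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # same statistic in ten simulated neutral windows
print(empirical_fdr(observed, neutral, 10))
print(smallest_threshold(observed, neutral, target=0.1))  # 11
```

An empirical-outlier scan would simply call the top few windows; tying the cutoff to simulated neutral windows is what makes the call rate depend on demography.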
Affiliation(s)
- Adam H. Freedman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Rena M. Schweizer
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Diego Ortega-Del Vecchyo
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Eunjung Han
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Brian W. Davis
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Zhenxin Fan
- Key Laboratory of Bioresources and Ecoenvironment, Sichuan University, Chengdu, China
- Peter Marx
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
- Belen Lorente-Galdos
- ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Oscar Ramirez
- ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Farhad Hormozdiari
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Carles Vilà
- Estación Biológica de Doñana EBD-CSIC, Sevilla, Spain
- Kevin Squire
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Eli Geffen
- Department of Zoology, Tel Aviv University, Tel Aviv, Israel
- Josip Kusak
- Department of Biology, University of Zagreb, Zagreb, Croatia
- Adam R. Boyko
- Department of Biomedical Sciences, Cornell University, Ithaca, New York, United States of America
- Heidi G. Parker
- ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Clarence Lee
- Life Technologies, Foster City, California, United States of America
- Vasisht Tadigotla
- Life Technologies, Foster City, California, United States of America
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Stanley F. Nelson
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Tomas Marques-Bonet
- ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Centro Nacional de Análisis Genómico (CNAG/PCB), Baldiri Reixach 4–8, Barcelona, Spain
- Elaine A. Ostrander
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- John Novembre
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
38
Alachiotis N, Pavlidis P. Scalable linkage-disequilibrium-based selective sweep detection: a performance guide. Gigascience 2016; 5:7. [PMID: 26862394 PMCID: PMC4746822 DOI: 10.1186/s13742-016-0114-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/30/2015] [Accepted: 01/20/2016] [Indexed: 01/09/2023]
Abstract
Background: Linkage disequilibrium is defined as the non-random association of alleles at different loci; it occurs when the genotypes at two loci depend on each other. The model of genetic hitchhiking predicts that strong positive selection alters the patterns of linkage disequilibrium around the site of a beneficial allele, producing specific motifs of correlation among the neutral polymorphisms that surround the fixed beneficial allele: linkage disequilibrium is elevated among sites on the same side of the beneficial mutation and diminished between sites on different sides. This specific pattern occurs more frequently when positive selection has acted on the population than under various neutral models, so detecting such patterns could accurately reveal targets of positive selection along a recombining chromosome or genome. Calculating linkage disequilibrium across whole genomes is computationally expensive because allele correlations must be evaluated for millions of pairs of sites. To analyze large datasets efficiently, algorithmic implementations used in modern population genetics need to exploit the multiple cores of current workstations in a scalable way. However, population genomic datasets come in various types and shapes and typically show heterogeneous SNP density, which makes implementing generally scalable parallel algorithms a challenging task.
Findings: Here we present four parallelization strategies targeting shared-memory systems for the computationally intensive problem of detecting, from linkage disequilibrium patterns, genomic regions that have contributed to the past adaptation of a species, also referred to as regions that have undergone a selective sweep. We provide a thorough performance evaluation of the proposed parallel algorithms for computing linkage disequilibrium and outline the benefits of each approach. Furthermore, we compare the accuracy of our open-source sweep-detection software OmegaPlus, which implements all four parallelization strategies presented here, with a variety of neutrality tests.
Conclusions: The computational demands of selective sweep detection algorithms depend greatly on the SNP density heterogeneity and the data representation. Choosing the right parallel algorithm for the analysis can lead to significant reductions in processing time and major energy savings. However, determining which parallel algorithm will execute most efficiently on a specific processor architecture and number of available cores for a particular dataset is not straightforward.
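The LD pattern described in this abstract (high LD within each flank of a sweep, low LD across it) is what the ω statistic of Kim and Nielsen (2004), which OmegaPlus evaluates, is designed to capture. Below is a minimal, unoptimized sketch, assuming phased haplotypes coded 0/1 at each SNP and a single fixed split point; it uses none of the parallelization strategies discussed in the paper, and the function names are illustrative rather than OmegaPlus internals.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allelic correlation r^2 between two biallelic sites (0/1 per haplotype)."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def omega(sites, split):
    """omega for SNPs [0, split) as the left flank and [split, W) as the right flank.

    Ratio of mean r^2 within flanks to mean r^2 between flanks; large values
    match the hitchhiking pattern. Assumes 1 <= split < len(sites).
    """
    left, right = sites[:split], sites[split:]
    l, w = len(left), len(sites)
    within = sum(r_squared(a, b) for a, b in combinations(left, 2))
    within += sum(r_squared(a, b) for a, b in combinations(right, 2))
    between = sum(r_squared(a, b) for a in left for b in right)
    n_within = l * (l - 1) // 2 + (w - l) * (w - l - 1) // 2
    n_between = l * (w - l)
    if n_within == 0 or between == 0:
        # Degenerate cases: no within-flank pairs, or no LD across flanks
        return float("inf") if within > 0 else 0.0
    return (within / n_within) / (between / n_between)
```

A sweep scan would evaluate ω at every candidate split inside a sliding window and report the maximum; the quadratic number of r² evaluations per window is the cost that the parallel strategies in the paper address.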
Affiliation(s)
- Nikolaos Alachiotis
- Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213 PA USA
- Pavlos Pavlidis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Crete, 70013 Greece
39
Schweizer RM, Robinson J, Harrigan R, Silva P, Galverni M, Musiani M, Green RE, Novembre J, Wayne RK. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves. Mol Ecol 2015; 25:357-79. [DOI: 10.1111/mec.13467] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/15/2015] [Revised: 11/04/2015] [Accepted: 11/06/2015] [Indexed: 12/29/2022]
Affiliation(s)
- Rena M. Schweizer
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
- Jacqueline Robinson
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
- Ryan Harrigan
- Center for Tropical Research Institute of the Environment and Sustainability University of California 619 Charles E. Young Drive East Los Angeles CA 90095 USA
- Pedro Silva
- CIBIO/InBio – Centro de Investigação em Biodiversidade e Recursos Genéticos Universidade do Porto Campus Agrário de Vairão 4485‐661 Vairão Portugal
- Departamento de Biologia Faculdade de Ciências Universidade do Porto Rua do Campo Alegre s/n. 4169‐007 Porto Portugal
- Marco Galverni
- Laboratory of Genetics ISPRA (Istituto Superiore per la Protezione e Ricerca Ambientale) Via Cà Fornacetta 9 40064 Ozzano dell'Emilia BO Italy
- Marco Musiani
- Faculties of Environmental Design and Veterinary Medicine (Joint Appointment) EVDS University of Calgary 2500 University Dr NW Calgary Alberta Canada T2N 1N4
- Richard E. Green
- Department of Biomolecular Engineering University of California Santa Cruz CA 95060 USA
- John Novembre
- Department of Human Genetics University of Chicago 920 E. 58th Street Chicago IL 60637 USA
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
40
Laurent S, Pfeifer SP, Settles ML, Hunter SS, Hardwick KM, Ormond L, Sousa VC, Jensen JD, Rosenblum EB. The population genomics of rapid adaptation: disentangling signatures of selection and demography in white sands lizards. Mol Ecol 2015; 25:306-23. [PMID: 26363411 DOI: 10.1111/mec.13385] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 05/09/2015] [Revised: 08/31/2015] [Accepted: 09/04/2015] [Indexed: 02/03/2023]
Abstract
Understanding the process of adaptation during rapid environmental change remains a central focus of evolutionary biology. The recently formed White Sands system of southern New Mexico offers an outstanding example of rapid adaptation: a variety of species have rapidly evolved blanched forms on the dunes that contrast with their close relatives in the surrounding dark-soil habitat. In this study, we focus on two of the White Sands lizard species, Sceloporus cowlesi and Aspidoscelis inornata, for which previous research has linked mutations in the melanocortin-1 receptor gene (Mc1r) to blanched coloration. We sampled populations both on and off the dunes and used a custom sequence capture assay based on probed fosmid libraries to obtain >50 kb of sequence around Mc1r and at hundreds of other random genomic locations. We then used model-based statistical inference methods to describe the demographic and adaptive history characterizing the colonization of White Sands. We identified a number of similarities between the two focal species, including strong evidence of selection in the blanched populations in the Mc1r region. We also found important differences between the species, suggesting different colonization times, different genetic architectures underlying the blanched phenotype and different ages of the beneficial alleles. Finally, the beneficial allele is dominant in S. cowlesi and recessive in A. inornata, allowing a rare empirical test of theoretically expected patterns of selective sweeps under these differing models.
Affiliation(s)
- Stefan Laurent
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL SV IBI-SV UPJENSEN, Station 15, CH-1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Susanne P Pfeifer
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL SV IBI-SV UPJENSEN, Station 15, CH-1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Matthew L Settles
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, 83844, USA
- Samuel S Hunter
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, 83844, USA
- Kayla M Hardwick
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, 83844, USA
- Louise Ormond
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL SV IBI-SV UPJENSEN, Station 15, CH-1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Vitor C Sousa
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Institute of Ecology and Evolution, University of Berne, Baltzerstrasse 6, CH-3012, Berne, Switzerland
- Jeffrey D Jensen
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), EPFL SV IBI-SV UPJENSEN, Station 15, CH-1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Erica Bree Rosenblum
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, 83844, USA
- Department of Environmental Sciences, Policy & Management, Berkeley, CA, 94720, USA
41
Huber CD, DeGiorgio M, Hellmann I, Nielsen R. Detecting recent selective sweeps while controlling for mutation rate and background selection. Mol Ecol 2015; 25:142-56. [PMID: 26290347 PMCID: PMC5082542 DOI: 10.1111/mec.13351] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 03/31/2015] [Revised: 07/31/2015] [Accepted: 08/17/2015] [Indexed: 12/19/2022]
Abstract
A composite likelihood ratio test implemented in the program SweepFinder is a commonly used method for scanning a genome for recent selective sweeps. SweepFinder uses information on the spatial pattern (along the chromosome) of the site frequency spectrum around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson–Kreitman–Aguadé test, we suggest adding fixed differences relative to an out‐group to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection, modelled as a local reduction in the effective population size. Using simulations, we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone.
Affiliation(s)
- Christian D Huber
- Max F. Perutz Laboratory, University of Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, University of Veterinary Medicine, Vienna, Austria
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 621 Charles E. Young Drive South, Los Angeles, CA, 90095-1606, USA
- Michael DeGiorgio
- Departments of Biology and Statistics, Pennsylvania State University, University Park, PA, USA
- Institute for CyberScience, Pennsylvania State University, University Park, PA, USA
- Ines Hellmann
- Department Biologie II, Ludwig-Maximilians-Universität München, Großhaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California, Berkeley, CA, USA
42
Renzette N, Kowalik TF, Jensen JD. On the relative roles of background selection and genetic hitchhiking in shaping human cytomegalovirus genetic diversity. Mol Ecol 2015. [PMID: 26211679 DOI: 10.1111/mec.13331] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 12/23/2022]
Abstract
A central focus of population genetics has been examining the contribution of selective and neutral processes in shaping patterns of intraspecies diversity. In terms of selection specifically, surveys of higher organisms have shown considerable variation in the relative contributions of background selection and genetic hitchhiking in shaping the distribution of polymorphisms, although these analyses have rarely been extended to bacteria and viruses. Here, we study the evolution of a ubiquitous viral pathogen, human cytomegalovirus (HCMV), by analysing the relationship among intraspecies diversity, interspecies divergence and rates of recombination. We show that there is a strong correlation between diversity and divergence, consistent with expectations of neutral evolution. However, after correcting for divergence, there remains a significant correlation between intraspecies diversity and recombination rates, and additional analyses suggest that this correlation is largely due to the effects of background selection. In addition, a small number of loci, centred on long noncoding RNAs, also show evidence of selective sweeps. These data suggest that HCMV evolution is dominated by neutral mechanisms as well as background selection, expanding our understanding of linked selection to a novel class of organisms.
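One simple way to ask whether diversity still tracks recombination after correcting for divergence is to regress per-window diversity on divergence and then correlate the residuals with recombination rate. The sketch below assumes per-window estimates of all three quantities are already available; it illustrates that idea only and is not necessarily the analysis the authors performed (partial correlations or multiple regression would be alternatives). All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of an ordinary least-squares fit y ~ intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [c - (intercept + slope * a) for a, c in zip(x, y)]


def diversity_recombination_after_divergence(diversity, divergence, recombination):
    """Correlate per-window diversity with recombination once divergence is regressed out.

    Removing the divergence component absorbs mutation-rate variation, so any
    remaining positive correlation points to linked selection.
    """
    return pearson(residuals(diversity, divergence), recombination)
```

A positive value after the divergence component is removed is the kind of signal that the abstract attributes largely to background selection.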
Affiliation(s)
- Nicholas Renzette
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA
- Timothy F Kowalik
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA
- Immunology and Microbiology Program, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA, 01655, USA
- Jeffrey D Jensen
- Swiss Institute of Bioinformatics (SIB), Lausanne, CH-1015, Switzerland
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland
43
Mathew LA, Jensen JD. Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography. Front Genet 2015; 6:268. [PMID: 26347771 PMCID: PMC4538300 DOI: 10.3389/fgene.2015.00268] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/24/2015] [Accepted: 08/03/2015] [Indexed: 12/23/2022]
Abstract
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it is well established that selection and demography may produce similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum for estimating selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while estimating accurate demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.
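For readers unfamiliar with the joint site frequency spectrum, the sketch below builds the raw two-population joint SFS from per-SNP derived-allele counts: entry (i, j) counts the SNPs carrying i derived alleles in the first sample and j in the second. It assumes polarized SNPs genotyped in every sampled chromosome; the Jaatha program may bin or summarize this matrix differently, and this is not its implementation. The function name is hypothetical.

```python
def joint_sfs(counts_pop1, counts_pop2, n1, n2):
    """Build an (n1 + 1) x (n2 + 1) joint SFS from per-SNP derived-allele counts.

    counts_pop1[k], counts_pop2[k] -- derived-allele count at SNP k in samples
    of n1 and n2 chromosomes, respectively.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts_pop1, counts_pop2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError("allele count outside sample size")
        sfs[i][j] += 1  # one more SNP in cell (i, j)
    return sfs


# Four SNPs, two chromosomes sampled per population
spectrum = joint_sfs([1, 0, 2, 1], [0, 2, 2, 0], 2, 2)
```

Private polymorphisms fall along the first row and column, while shared high-frequency variants fill the interior; selection and demography shift mass among these cells in different ways, which is what joint estimation exploits.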
Affiliation(s)
- Lisha A Mathew
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Jeffrey D Jensen
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
44
Worldwide Population Structure, Long-Term Demography, and Local Adaptation of Helicobacter pylori. Genetics 2015; 200:947-63. [PMID: 25995212 DOI: 10.1534/genetics.115.176404] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 03/16/2015] [Accepted: 05/15/2015] [Indexed: 12/15/2022]
Abstract
Helicobacter pylori is an important human pathogen associated with serious gastric diseases. Owing to its medical importance and close relationship with its human host, understanding genomic patterns of global and local adaptation in H. pylori may be of particular significance for both clinical and evolutionary studies. Here we present the first whole-genome analysis of this kind, based on 60 globally distributed strains, from which we inferred worldwide population structure and demographic history and shed light on interesting global and local events of positive selection, with particular emphasis on the evolution of San-associated lineages. Our results indicate a more ancient origin for the association of humans and H. pylori than previously thought. We identify candidate selected regions that open several important perspectives for future clinical research, including both previously characterized genes (e.g., transcription elongation factor NusA and tumor necrosis factor alpha-inducing protein Tipα) and genes of hitherto unknown function.
45
Poh YP, Domingues VS, Hoekstra HE, Jensen JD. On the prospect of identifying adaptive loci in recently bottlenecked populations. PLoS One 2014; 9:e110579. [PMID: 25383711 PMCID: PMC4226487 DOI: 10.1371/journal.pone.0110579] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 05/13/2014] [Accepted: 09/16/2014] [Indexed: 12/14/2022]
Abstract
Identifying adaptively important loci in recently bottlenecked populations – be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed – remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.
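As an illustration of how an SFS-based neutrality test is computed, the sketch below implements Tajima's D from the sample size n, the number of segregating sites S and the average number of pairwise differences π. Tajima's D is chosen here only as a familiar example; the abstract does not say which SFS statistics the study evaluated, and the helper names are illustrative. Because a bottleneck alone can push D away from zero, statistics of this kind are hard to interpret in the populations the paper discusses.

```python
from math import sqrt


def pi_from_counts(counts, n):
    """Average pairwise differences summed over SNPs, from derived-allele counts in n chromosomes."""
    return sum(2.0 * i * (n - i) / (n * (n - 1)) for i in counts if 0 < i < n)


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for a sample of n >= 4 chromosomes with S segregating sites."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if seg_sites == 0:
        return 0.0  # no variation, statistic conventionally set to 0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between two estimators of theta, scaled by its standard deviation
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
```

An excess of rare variants (for example, only singletons) gives a negative D, as expected after a sweep or population expansion; a recent bottleneck tends to remove rare variants and can make D positive.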
Affiliation(s)
- Yu-Ping Poh
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, United States of America
- Howard Hughes Medical Institute, Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, United States of America
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vera S. Domingues
- Howard Hughes Medical Institute, Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, United States of America
- Hopi E. Hoekstra
- Howard Hughes Medical Institute, Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, United States of America
- Jeffrey D. Jensen
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, United States of America
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
46
On the unfounded enthusiasm for soft selective sweeps. Nat Commun 2014; 5:5281. [DOI: 10.1038/ncomms6281] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Received: 03/24/2014] [Accepted: 09/17/2014] [Indexed: 11/09/2022]
47
Fagny M, Patin E, Enard D, Barreiro LB, Quintana-Murci L, Laval G. Exploring the occurrence of classic selective sweeps in humans using whole-genome sequencing data sets. Mol Biol Evol 2014; 31:1850-68. [PMID: 24694833 DOI: 10.1093/molbev/msu118] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 12/21/2022]
Abstract
Genome-wide scans for selection have identified multiple regions of the human genome as being targeted by positive selection. However, only a small proportion has been replicated across studies, and the prevalence of positive selection as a mechanism of adaptive change in humans remains controversial. Here we explore the power of two haplotype-based statistics, the integrated haplotype score (iHS) and the Derived Intraallelic Nucleotide Diversity (DIND) test, in the context of next-generation sequencing data, and evaluate their robustness to demography and other selection modes. We show that both statistics are powerful for detecting recent positive selection, regardless of population history, and robust to variation in coverage, with DIND being insensitive even to very low coverage. We apply these statistics to whole-genome sequence data sets from the 1000 Genomes Project and Complete Genomics. We found that putative targets of selection were highly significantly enriched in genic and nonsynonymous single nucleotide polymorphisms, and that DIND was more powerful than iHS in the context of small sample sizes, low-quality genotype calling, or poor coverage. As we excluded genomic confounders and alternative selection models, such as background selection, the observed enrichment attests to the action of recent, strong positive selection. Further support for the adaptive significance of these genomic regions came from their enrichment in functional variants detected by genome-wide association studies, informing the relationship between past selection and current benign and disease-related phenotypic variation. Our results indicate that hard sweeps targeting low-frequency standing variation have played a moderate, albeit significant, role in recent human evolution.
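The iHS statistic named in this abstract is built from extended haplotype homozygosity (EHH). The sketch below computes EHH, the integrated EHH (iHH) by the trapezoid rule, and the unstandardized score ln(iHH_ancestral / iHH_derived), following the definition of Voight et al. (2006). It assumes phased 0/1 haplotypes with 0 = ancestral, at least two haplotypes carrying each allele at the core SNP, and positions in a single map unit; standardization within allele-frequency bins, and the DIND test, are omitted. Function names are illustrative.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, end):
    """EHH between SNP `core` and SNP `end` for haplotypes sharing one allele at `core`."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    # Group haplotypes that are identical over the whole interval
    groups = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def ihh(haplotypes, core, positions, cutoff=0.05):
    """Integrate EHH outward in both directions until it drops below `cutoff`."""
    total = 0.0
    for step in (-1, 1):
        prev = 1.0  # EHH equals 1 at the core SNP itself
        i = core + step
        while 0 <= i < len(positions):
            cur = ehh(haplotypes, core, i)
            total += (prev + cur) / 2 * abs(positions[i] - positions[i - step])
            if cur < cutoff:
                break
            prev = cur
            i += step
    return total


def unstandardized_ihs(haplotypes, core, positions):
    """ln(iHH_A / iHH_D); strongly negative values flag long derived haplotypes."""
    anc = [h for h in haplotypes if h[core] == 0]
    der = [h for h in haplotypes if h[core] == 1]
    return log(ihh(anc, core, positions) / ihh(der, core, positions))
```

In a genome scan, the raw scores would then be standardized against other SNPs of similar derived-allele frequency before outliers are called.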
Affiliation(s)
- Maud Fagny
- Institut Pasteur, Human Evolutionary Genetics, Department of Genomes and Genetics, Paris, France
- Centre National de la Recherche Scientifique, URA3012, Paris, France
- Université Pierre et Marie Curie, Cellule Pasteur UPMC, Paris, France
- Etienne Patin
- Institut Pasteur, Human Evolutionary Genetics, Department of Genomes and Genetics, Paris, France
- Centre National de la Recherche Scientifique, URA3012, Paris, France
- Luis B Barreiro
- Department of Pediatrics, Sainte-Justine Hospital Research Center, University of Montreal, Montreal, Quebec, Canada
- Lluis Quintana-Murci
- Institut Pasteur, Human Evolutionary Genetics, Department of Genomes and Genetics, Paris, France
- Centre National de la Recherche Scientifique, URA3012, Paris, France
- Guillaume Laval
- Institut Pasteur, Human Evolutionary Genetics, Department of Genomes and Genetics, Paris, France
- Centre National de la Recherche Scientifique, URA3012, Paris, France