Reference Citation Analysis

For: Pavlidis P, Jensen JD, Stephan W. Searching for footprints of positive selection in whole-genome SNP data from nonequilibrium populations. Genetics 2010;185:907-22. [PMID: 20407129 DOI: 10.1534/genetics.110.116459] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/24/2022] Open

Cited by Other Article(s)

Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda, Md.) 2024;14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]

Abstract

The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).

Dabi A, Schrider DR. Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations. bioRxiv: the preprint server for biology 2024:2024.04.07.588318. [PMID: 38645049 PMCID: PMC11030438 DOI: 10.1101/2024.04.07.588318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]

Abstract

Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, fixation probabilities, allele frequencies, and linkage disequilibrium) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward, thus it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q. In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling effect's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
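
The rescaling step described in this abstract can be sketched in a few lines. This is only an illustration of the parameter arithmetic, not the authors' code: the `SimParams` container, the `rescale` and `compound` helpers, and the example values are all assumptions introduced here.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class SimParams:
    """Parameters of a hypothetical forward-in-time simulation (illustrative only)."""
    pop_size: int              # N, number of individuals
    mutation_rate: float       # mu, per site per generation
    recombination_rate: float  # r, per site per generation
    selection_coeff: float     # s, selection coefficient


def rescale(params: SimParams, q: float) -> SimParams:
    """Divide N by q and multiply mu, r, and s by q, as the abstract describes."""
    if q < 1:
        raise ValueError("rescaling factor Q must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


def compound(params: SimParams) -> tuple:
    """Return the products N*mu, N*r, and N*s, which rescaling leaves unchanged."""
    n = params.pop_size
    return (
        n * params.mutation_rate,
        n * params.recombination_rate,
        n * params.selection_coeff,
    )


# Assumed example values, chosen only so that N is divisible by Q.
original = SimParams(pop_size=10_000, mutation_rate=1e-8,
                     recombination_rate=1e-8, selection_coeff=0.01)
scaled = rescale(original, q=10)

# The population shrinks tenfold while the compound parameters stay the same.
assert scaled.pop_size == 1_000
assert all(math.isclose(a, b) for a, b in zip(compound(original), compound(scaled)))
```

Matching compound parameters do not guarantee matching simulation outcomes. The abstract reports biases in fixation times, fixation probabilities, allele frequencies, and linkage disequilibrium even at small values of Q.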

Rajawat D, Ghildiyal K, Sonejita Nayak S, Sharma A, Parida S, Kumar S, Ghosh AK, Singh U, Sivalingam J, Bhushan B, Dutt T, Panigrahi M. Genome-wide mining of diversity and evolutionary signatures revealed selective hotspots in Indian Sahiwal cattle. Gene 2024;901:148178. [PMID: 38242377 DOI: 10.1016/j.gene.2024.148178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/10/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024]

Abstract

The Sahiwal cattle breed is the best indigenous dairy cattle breed, and it plays a pivotal role in the Indian dairy industry. This is due to its exceptional milk-producing potential, adaptability to local tropical conditions, and its resilience to ticks and diseases. The study aimed to identify selective sweeps and estimate intrapopulation genetic diversity parameters in Sahiwal cattle using ddRAD sequencing-based genotyping data from 82 individuals. After applying filtering criteria, 78,193 high-quality SNPs remained for further analysis. The population exhibited an average minor allele frequency of 0.221 ± 0.119. Genetic diversity metrics, including observed (0.597 ± 0.196) and expected heterozygosity (0.433 ± 0.096), nucleotide diversity (0.327 ± 0.114), the proportion of polymorphic SNPs (0.726), and allelic richness (1.323 ± 0.134), indicated ample genomic diversity within the breed. Furthermore, an effective population size of 74 was observed in the most recent generation. The overall mean linkage disequilibrium (r²) for pairwise SNPs was 0.269 ± 0.057. Moreover, a greater proportion of short Runs of Homozygosity (ROH) segments were observed, suggesting that there may be low levels of recent inbreeding in this population. The genomic inbreeding coefficients, computed using different inbreeding estimates (FHOM, FUNI, FROH, and FGROM), ranged from -0.0289 to 0.0725. Subsequently, we found 146 regions undergoing selective sweeps using five distinct statistical tests: Tajima's D, CLR, |iHS|, |iHH12|, and ROH. These regions, located in non-overlapping 500 kb windows, were mapped and revealed various protein-coding genes associated with enhanced immune systems and disease resistance (IFNL3, IRF8, BLK), as well as production traits (NRXN1, PLCE1, GHR). Notably, we identified interleukin 2 (IL2) on Chr17: 35217075-35223276 as a gene linked to tick resistance and uncovered a cluster of genes (HSPA8, UBASH3B, ADAMTS18, CRTAM) associated with heat stress. These findings indicate the evolutionary impact of natural and artificial selection on the environmental adaptation of the Sahiwal cattle population.

Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024;11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]

Affiliation(s)

Hui Song: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
Jinyu Chu: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
Wangjiao Li: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
Xinyun Li: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
Lingzhao Fang: Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
Jianlin Han: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China; CAAS‐ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China; Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
Shuhong Zhao: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China; Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
Yunlong Ma: Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China; Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China

Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024;20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open

Abstract

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.

Ray DD, Flagel L, Schrider DR. IntroUNET: identifying introgressed alleles via semantic segmentation. bioRxiv: the preprint server for biology 2024:2023.02.07.527435. [PMID: 36865105 PMCID: PMC9979274 DOI: 10.1101/2023.02.07.527435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]

Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024;22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024] Open

Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023;54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]

Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023;77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]

Mascarenhas R, Meirelles PM, Batalha-Filho H. Urbanization drives adaptive evolution in a Neotropical bird. Curr Zool 2023;69:607-619. [PMID: 37637315 PMCID: PMC10449428 DOI: 10.1093/cz/zoac066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 08/16/2022] [Indexed: 08/29/2023] Open

Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv: the preprint server for biology 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]

Abstract

The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.

Teaser Text

Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.

Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023;19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open

Abstract

Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.

Korfmann K, Gaggiotti OE, Fumagalli M. Deep Learning in Population Genetics. Genome Biol Evol 2023;15:6997869. [PMID: 36683406 PMCID: PMC9897193 DOI: 10.1093/gbe/evad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 12/19/2022] [Accepted: 01/16/2023] [Indexed: 01/24/2023] Open

Schlichta F, Moinet A, Peischl S, Excoffier L. The Impact of Genetic Surfing on Neutral Genomic Diversity. Mol Biol Evol 2022;39:msac249. [PMID: 36403964 PMCID: PMC9703594 DOI: 10.1093/molbev/msac249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open

Burny C, Nolte V, Dolezal M, Schlötterer C. Genome-wide selection signatures reveal widespread synergistic effects of two different stressors in Drosophila melanogaster. Proc Biol Sci 2022;289:20221857. [PMID: 36259211 PMCID: PMC9579754 DOI: 10.1098/rspb.2022.1857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open

Gómez-Espejo AL, Sansaloni CP, Burgueño J, Toledo FH, Benavides-Mendoza A, Reyes-Valdés MH. Worldwide Selection Footprints for Drought and Heat in Bread Wheat (Triticum aestivum L.). Plants 2022;11:plants11172289. [PMID: 36079671 PMCID: PMC9460392 DOI: 10.3390/plants11172289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 08/18/2022] [Accepted: 08/29/2022] [Indexed: 11/16/2022]

Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022;29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/29/2022] Open

Pettie N, Llopart A, Comeron JM. Meiotic, genomic and evolutionary properties of crossover distribution in Drosophila yakuba. PLoS Genet 2022;18:e1010087. [PMID: 35320272 PMCID: PMC8979470 DOI: 10.1371/journal.pgen.1010087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2021] [Revised: 04/04/2022] [Accepted: 02/09/2022] [Indexed: 12/14/2022] Open

Abstract

The number and location of crossovers across genomes are highly regulated during meiosis, yet the key components controlling them are fast evolving, hindering our understanding of the mechanistic causes and evolutionary consequences of changes in crossover rates. Drosophila melanogaster has been a model species to study meiosis for more than a century, with an available high-resolution crossover map that is, nonetheless, missing for closely related species, thus preventing evolutionary context. Here, we applied a novel and highly efficient approach to generate whole-genome high-resolution crossover maps in D. yakuba to tackle multiple questions that benefit from being addressed collectively within an appropriate phylogenetic framework, in our case the D. melanogaster species subgroup. The genotyping of more than 1,600 individual meiotic events allowed us to identify several key distinct properties relative to D. melanogaster. We show that D. yakuba, in addition to higher crossover rates than D. melanogaster, has a stronger centromere effect and crossover assurance than any Drosophila species analyzed to date. We also report the presence of an active crossover-associated meiotic drive mechanism for the X chromosome that results in the preferential inclusion in oocytes of chromatids with crossovers. Our evolutionary and genomic analyses suggest that the genome-wide landscape of crossover rates in D. yakuba has been fairly stable and captures a significant signal of the ancestral crossover landscape for the whole D. melanogaster subgroup, even informative for the D. melanogaster lineage. Contemporary crossover rates in D. melanogaster, on the other hand, do not recapitulate ancestral crossover landscapes. As a result, the temporal stability of crossover landscapes observed in D. yakuba makes this species an ideal system for applying population genetic models of selection and linkage, given that these models assume temporal constancy in linkage effects. Our studies emphasize the importance of generating multiple high-resolution crossover rate maps within a coherent phylogenetic context to broaden our understanding of crossover control during meiosis and to improve studies on the evolutionary consequences of variable crossover rates across genomes and time.

Fine human genetic map based on UK10K data set. Hum Genet 2022;141:273-281. [DOI: 10.1007/s00439-021-02415-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2021] [Accepted: 12/03/2021] [Indexed: 11/04/2022]

Vecchyo DOD, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022;220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022] Open

Stephan W. The classical hitchhiking model with continuous mutational pressure and purifying selection. Ecol Evol 2021;11:15896-15904. [PMID: 34824798 PMCID: PMC8601925 DOI: 10.1002/ece3.8259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2021] [Revised: 08/24/2021] [Accepted: 10/08/2021] [Indexed: 11/14/2022] Open

Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021;21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]

Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021;38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/19/2023] Open

Abstract

Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.

Population Genomics on the Fly: Recent Advances in Drosophila. Methods Mol Biol 2021;2090:357-396. [PMID: 31975175 DOI: 10.1007/978-1-0716-0199-0_15] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Demographic analyses of a new sample of haploid genomes from a Swedish population of Drosophila melanogaster. Sci Rep 2020;10:22415. [PMID: 33376238 PMCID: PMC7772335 DOI: 10.1038/s41598-020-79720-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Accepted: 12/11/2020] [Indexed: 01/27/2023] Open

Choi JY, Purugganan M, Stacy EA. Divergent Selection and Primary Gene Flow Shape Incipient Speciation of a Riparian Tree on Hawaii Island. Mol Biol Evol 2020;37:695-710. [PMID: 31693149 PMCID: PMC7038655 DOI: 10.1093/molbev/msz259] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Marchi N, Excoffier L. Gene flow as a simple cause for an excess of high-frequency-derived alleles. Evol Appl 2020;13:2254-2263. [PMID: 33005222 PMCID: PMC7513730 DOI: 10.1111/eva.12998] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2020] [Revised: 04/30/2020] [Accepted: 05/04/2020] [Indexed: 01/19/2023] Open

Animal domestication in the era of ancient genomics. Nat Rev Genet 2020;21:449-460. [PMID: 32265525 DOI: 10.1038/s41576-020-0225-0] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/04/2020] [Indexed: 01/09/2023]

Nakagome S, Hudson RR, Di Rienzo A. Inferring the model and onset of natural selection under varying population size from the site frequency spectrum and haplotype structure. Proc Biol Sci 2020;286:20182541. [PMID: 30963935 DOI: 10.1098/rspb.2018.2541] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023] Open

Moest M, Van Belleghem SM, James JE, Salazar C, Martin SH, Barker SL, Moreira GRP, Mérot C, Joron M, Nadeau NJ, Steiner FM, Jiggins CD. Selective sweeps on novel and introgressed variation shape mimicry loci in a butterfly adaptive radiation. PLoS Biol 2020;18:e3000597. [PMID: 32027643 PMCID: PMC7029882 DOI: 10.1371/journal.pbio.3000597] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Revised: 02/19/2020] [Accepted: 01/15/2020] [Indexed: 11/21/2022] Open

Luo S, Zhang H, Duan Y, Yao X, Clark AG, Lu J. The evolutionary arms race between transposable elements and piRNAs in Drosophila melanogaster. BMC Evol Biol 2020;20:14. [PMID: 31992188 PMCID: PMC6988346 DOI: 10.1186/s12862-020-1580-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2019] [Accepted: 01/13/2020] [Indexed: 01/05/2023] Open

Koropoulis A, Alachiotis N, Pavlidis P. Detecting Positive Selection in Populations Using Genetic Data. Methods Mol Biol 2020;2090:87-123. [PMID: 31975165 DOI: 10.1007/978-1-0716-0199-0_5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Abstract

High-throughput genomic sequencing allows to disentangle the evolutionary forces acting in populations. Among evolutionary forces, positive selection has received a lot of attention because it is related to the adaptation of populations in their environments, both biotic and abiotic. Positive selection, also known as Darwinian selection, occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Such a process leaves traces in genomes that can be detected in a future time point. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular linkage disequilibrium (LD) patterns in the region. A variety of approaches can be used for detecting selective sweeps, ranging from simple implementations that compute summary statistics to more advanced statistical approaches, e.g., Bayesian approaches, maximum-likelihood-based methods, and machine learning methods. In this chapter, we discuss selective sweep detection methodologies on the basis of their capacity to analyze whole genomes or just subgenomic regions, and on the specific polymorphism patterns they exploit as selective sweep signatures. We also summarize the results of comparisons among five open-source software releases (SweeD, SweepFinder, SweepFinder2, OmegaPlus, and RAiSD) regarding sensitivity, specificity, and execution times. Furthermore, we test and discuss machine learning methods and present a thorough performance analysis. In equilibrium neutral models or mild bottlenecks, most methods are able to detect selective sweeps accurately. Methods and tools that rely on linkage disequilibrium (LD) rather than single SNPs exhibit higher true positive rates than the site frequency spectrum (SFS)-based methods under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to build the distribution of the statistic under the null hypothesis. Both LD and SFS-based approaches suffer from decreased accuracy on localizing the true target of selection in bottleneck scenarios. Furthermore, we present an extensive analysis of the effects of gene flow on selective sweep detection, a problem that has been understudied in selective sweep literature.

Introgression drives repeated evolution of winter coat color polymorphism in hares. Proc Natl Acad Sci U S A 2019;116:24150-24156. [PMID: 31712446 DOI: 10.1073/pnas.1910471116] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Kapopoulou A, Pfeifer SP, Jensen JD, Laurent S. The Demographic History of African Drosophila melanogaster. Genome Biol Evol 2019;10:2338-2342. [PMID: 30169784 PMCID: PMC6363051 DOI: 10.1093/gbe/evy185] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/27/2018] [Indexed: 11/14/2022] Open

Selective Sweeps. Genetics 2019;211:5-13. [PMID: 30626638 DOI: 10.1534/genetics.118.301319] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Accepted: 07/10/2018] [Indexed: 11/18/2022] Open

Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019;36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

Chan J, Perrone V, Spence JP, Jenkins PA, Mathieson S, Song YS. A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2018;31:8594-8605. [PMID: 33244210 PMCID: PMC7687905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018;210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022] Open

Abstract

Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
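
The homozygosity statistics named in this abstract can be illustrated with a minimal sketch. It assumes the standard definitions from Garud et al. (2015): with haplotype frequencies sorted as p1 >= p2 >= ..., H1 is the sum of squared frequencies, H12 pools the two most frequent haplotypes before squaring, and H2 is H1 minus the squared frequency of the most frequent haplotype. The function name `homozygosity_stats` and the toy data are hypothetical and not taken from the cited paper. Passing unphased multilocus genotypes instead of haplotypes gives the G12 and G2/G1 analogs described above; G123 is not covered here.

```python
from collections import Counter


def homozygosity_stats(items):
    """Return (H1, H12, H2/H1) for a list of hashable haplotypes.

    Passing multilocus genotypes (e.g. tuples of 0/1/2 allele counts,
    one tuple per individual) instead of haplotypes yields the
    G1, G12 and G2/G1 analogs. Illustrative helper, not the authors' code.
    """
    counts = Counter(items)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one haplotype or genotype")
    # Frequencies sorted from most to least common.
    freqs = sorted((c / n for c in counts.values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)                        # expected homozygosity
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])  # top two pooled
    h2 = h1 - p1 * p1                                     # excludes the top class
    return h1, h12, h2 / h1


# Toy window: three haplotypes at frequencies 0.5, 0.3 and 0.2.
haplotypes = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
h1, h12, h2_h1 = homozygosity_stats(haplotypes)
print(round(h1, 4), round(h12, 4), round(h2_h1, 4))
```

In a genome scan this computation would be repeated in sliding windows, with the H2/H1 (or G2/G1) ratio consulted only where H12 (or G12) is high, as the abstract notes.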

Jones MR, Mills LS, Alves PC, Callahan CM, Alves JM, Lafferty DJR, Jiggins FM, Jensen JD, Melo-Ferreira J, Good JM. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science 2018;360:1355-1358. [DOI: 10.1126/science.aar5273] [Citation(s) in RCA: 182] [Impact Index Per Article: 30.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Accepted: 05/01/2018] [Indexed: 12/14/2022]

Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (BETHESDA, MD.) 2018;8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]

Schrider DR, Ayroles J, Matute DR, Kern AD. Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia. PLoS Genet 2018;14:e1007341. [PMID: 29684059 PMCID: PMC5933812 DOI: 10.1371/journal.pgen.1007341] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2017] [Revised: 05/03/2018] [Accepted: 03/28/2018] [Indexed: 12/30/2022] Open

Abstract

Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.

Understanding the extent to which species or diverged populations hybridize in nature is crucially important if we are to understand the speciation process. Accordingly numerous research groups have developed methodology for finding the genetic evidence of such introgression. In this report we develop a supervised machine learning approach for uncovering loci which have introgressed across species boundaries. We show that our method, FILET, has greater accuracy and power than competing methods in discovering introgression, and in addition can detect the directionality associated with the gene flow between species. Using whole genome sequences from Drosophila simulans and Drosophila sechellia we show that FILET discovers quite extensive introgression between these species that has occurred mostly from D. simulans to D. sechellia. Our work highlights the complex process of speciation even within a well-studied system and points to the growing importance of supervised machine learning in population genetics.

Weigand H, Leese F. Detecting signatures of positive selection in non-model species using genomic data. Zool J Linn Soc 2018. [DOI: 10.1093/zoolinnean/zly007] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New Paradigm. Trends Genet 2018;34:301-312. [PMID: 29331490 PMCID: PMC5905713 DOI: 10.1016/j.tig.2017.12.005] [Citation(s) in RCA: 201] [Impact Index Per Article: 33.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2017] [Revised: 11/29/2017] [Accepted: 12/08/2017] [Indexed: 01/21/2023]

Lee KS, Chatterjee P, Choi EY, Sung MK, Oh J, Won H, Park SM, Kim YJ, Yi SV, Choi JK. Selection on the regulation of sympathetic nervous activity in humans and chimpanzees. PLoS Genet 2018;14:e1007311. [PMID: 29672586 PMCID: PMC5908061 DOI: 10.1371/journal.pgen.1007311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2017] [Accepted: 03/17/2018] [Indexed: 12/31/2022] Open

Sugden LA, Atkinson EG, Fischer AP, Rong S, Henn BM, Ramachandran S. Localization of adaptive variants in human genomes using averaged one-dependence estimation. Nat Commun 2018;9:703. [PMID: 29459739 PMCID: PMC5818606 DOI: 10.1038/s41467-018-03100-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2017] [Accepted: 01/19/2018] [Indexed: 12/19/2022] Open

Detecting Recent Positive Selection with a Single Locus Test Bipartitioning the Coalescent Tree. Genetics 2017;208:791-805. [PMID: 29217523 DOI: 10.1534/genetics.117.300401] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2017] [Accepted: 12/01/2017] [Indexed: 01/09/2023] Open

Harris SE, Munshi-South J. Signatures of positive selection and local adaptation to urbanization in white-footed mice (Peromyscus leucopus). Mol Ecol 2017;26:6336-6350. [PMID: 28980357 DOI: 10.1111/mec.14369] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2016] [Accepted: 09/25/2017] [Indexed: 02/06/2023]

Range Expansion Compromises Adaptive Evolution in an Outcrossing Plant. Curr Biol 2017;27:2544-2551.e4. [DOI: 10.1016/j.cub.2017.07.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2017] [Revised: 05/22/2017] [Accepted: 07/04/2017] [Indexed: 01/04/2023]

Schrider DR, Kern AD. Soft Sweeps Are the Dominant Mode of Adaptation in the Human Genome. Mol Biol Evol 2017;34:1863-1877. [PMID: 28482049 PMCID: PMC5850737 DOI: 10.1093/molbev/msx154] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps. Genetics 2017;203:1807-25. [PMID: 27516617 DOI: 10.1534/genetics.115.185900] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 04/05/2016] [Indexed: 12/12/2022] Open

Abstract

During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text]. We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics ([Formula: see text], Kelly's [Formula: see text], and [Formula: see text]) are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available.
