Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (Bethesda) 2018;8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]

For:	Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (Bethesda) 2018;8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]

Number

Cited by Other Article(s)

Daron J, Bouafou L, Tennessen JA, Rahola N, Makanga B, Akone-Ella O, Ngangue MF, Longo Pendy NM, Paupy C, Neafsey DE, Fontaine MC, Ayala D. Genomic Signatures of Microgeographic Adaptation in Anopheles coluzzii Along an Anthropogenic Gradient in Gabon. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.16.594472. [PMID: 38798379 PMCID: PMC11118577 DOI: 10.1101/2024.05.16.594472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]

Abstract

Species distributed across heterogeneous environments often evolve locally adapted populations, but understanding how these persist in the presence of homogenizing gene flow remains puzzling. In Gabon, Anopheles coluzzii, a major African malaria mosquito is found along an ecological gradient, including a sylvatic population, away of any human presence. This study identifies into the genomic signatures of local adaptation in populations from distinct environments including the urban area of Libreville, and two proximate sites 10km apart in the La Lopé National Park (LLP), a village and its sylvatic neighborhood. Whole genome re-sequencing of 96 mosquitoes unveiled ∼ 5.7millions high-quality single nucleotide polymorphisms. Coalescent-based demographic analyses suggest an ∼ 8,000-year-old divergence between Libreville and La Lopé populations, followed by a secondary contact ( ∼ 4,000 ybp) resulting in asymmetric effective gene flow. The urban population displayed reduced effective size, evidence of inbreeding, and strong selection pressures for adaptation to urban settings, as suggested by the hard selective sweeps associated with genes involved in detoxification and insecticide resistance. In contrast, the two geographically proximate LLP populations showed larger effective sizes, and distinctive genomic differences in selective signals, notably soft-selective sweeps on the standing genetic variation. Although neutral loci and chromosomal inversions failed to discriminate between LLP populations, our findings support that microgeographic adaptation can swiftly emerge through selection on standing genetic variation despite high gene flow. This study contributes to the growing understanding of evolution of populations in heterogeneous environments amid ongoing gene flow and how major malaria mosquitoes adapt to human.

Significance

Anopheles coluzzii , a major African malaria vector, thrives from humid rainforests to dry savannahs and coastal areas. This ecological success is linked to its close association with domestic settings, with human playing significant roles in driving the recent urban evolution of this mosquito. Our research explores the assumption that these mosquitoes are strictly dependent on human habitats, by conducting whole-genome sequencing on An. coluzzii specimens from urban, rural, and sylvatic sites in Gabon. We found that urban mosquitoes show de novo genetic signatures of human-driven vector control, while rural and sylvatic mosquitoes exhibit distinctive genetic evidence of local adaptations derived from standing genetic variation. Understanding adaptation mechanisms of this mosquito is therefore crucial to predict evolution of vector control strategies.

Collapse

Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024;11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]

Affiliation(s)

Hui Song Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China
Jinyu Chu Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China
Wangjiao Li Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China
Xinyun Li Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China Hubei Hongshan LaboratoryWuhan430070China
Lingzhao Fang Center for Quantitative Genetics and GenomicsAarhus UniversityAarhus8000Denmark
Jianlin Han Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China CAAS‐ILRI Joint Laboratory on Livestock and Forage Genetic ResourcesInstitute of Animal ScienceChinese Academy of Agricultural Sciences (CAAS)Beijing100193China Livestock Genetics ProgramInternational Livestock Research Institute (ILRI)Nairobi00100Kenya
Shuhong Zhao Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China Hubei Hongshan LaboratoryWuhan430070China Lingnan Modern Agricultural Science and Technology Guangdong LaboratoryGuangzhou510642China
Yunlong Ma Key Laboratory of Agricultural Animal GeneticsBreeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of AgricultureHuazhong Agricultural UniversityWuhan430070China Hubei Hongshan LaboratoryWuhan430070China Lingnan Modern Agricultural Science and Technology Guangdong LaboratoryGuangzhou510642China

Collapse

Smith CCR, Patterson G, Ralph PL, Kern AD. Estimation of spatial demographic maps from polymorphism data using a neural network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585300. [PMID: 38559192 PMCID: PMC10980082 DOI: 10.1101/2024.03.15.585300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]

Abstract

A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and source sink dynamics of dispersal should shape genetic variation over space, however there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity by descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic based methods like ours complement other, direct methods for estimating past and present demography, and we believe will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.

Collapse

Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024;20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open

Abstract

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient-ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.

Collapse

Thom G, Moreira LR, Batista R, Gehara M, Aleixo A, Smith BT. Genomic Architecture Predicts Tree Topology, Population Structuring, and Demographic History in Amazonian Birds. Genome Biol Evol 2024;16:evae002. [PMID: 38236173 PMCID: PMC10823491 DOI: 10.1093/gbe/evae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 10/26/2023] [Accepted: 12/12/2023] [Indexed: 01/19/2024] Open

Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024;25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/11/2023] [Indexed: 09/06/2023]

Cecil RM, Sugden LA. On convolutional neural networks for selection inference: Revealing the effect of preprocessing on model learning and the capacity to discover novel patterns. PLoS Comput Biol 2023;19:e1010979. [PMID: 38011281 PMCID: PMC10703409 DOI: 10.1371/journal.pcbi.1010979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Revised: 12/07/2023] [Accepted: 10/26/2023] [Indexed: 11/29/2023] Open

Mo Z, Siepel A. Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data. PLoS Genet 2023;19:e1011032. [PMID: 37934781 PMCID: PMC10655966 DOI: 10.1371/journal.pgen.1011032] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2023] [Revised: 11/17/2023] [Accepted: 10/23/2023] [Indexed: 11/09/2023] Open

Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023;40:msad216. [PMID: 37772983 PMCID: PMC10581699 DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023] Open

Mo Z, Siepel A. Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.01.529396. [PMID: 36909514 PMCID: PMC10002701 DOI: 10.1101/2023.03.01.529396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]

Teterina AA, Willis JH, Lukac M, Jovelin R, Cutter AD, Phillips PC. Genomic diversity landscapes in outcrossing and selfing Caenorhabditis nematodes. PLoS Genet 2023;19:e1010879. [PMID: 37585484 PMCID: PMC10461856 DOI: 10.1371/journal.pgen.1010879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2022] [Revised: 08/28/2023] [Accepted: 07/21/2023] [Indexed: 08/18/2023] Open

Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023;224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023] Open

Abstract

Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.

Collapse

Zhao H, Souilljee M, Pavlidis P, Alachiotis N. Genome-wide scans for selective sweeps using convolutional neural networks. Bioinformatics 2023;39:i194-i203. [PMID: 37387128 DOI: 10.1093/bioinformatics/btad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open

Xiao H, Liu Z, Wang N, Long Q, Cao S, Huang G, Liu W, Peng Y, Riaz S, Walker AM, Gaut BS, Zhou Y. Adaptive and maladaptive introgression in grapevine domestication. Proc Natl Acad Sci U S A 2023;120:e2222041120. [PMID: 37276420 PMCID: PMC10268302 DOI: 10.1073/pnas.2222041120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2023] [Accepted: 04/24/2023] [Indexed: 06/07/2023] Open

Affiliation(s)

Hua Xiao State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions, Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi830091, China
Zhongjie Liu State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Nan Wang State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Qiming Long State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Shuo Cao State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Guizhou Huang State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Wenwen Liu State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Yanling Peng State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China
Summaira Riaz Department of Viticulture and Enology, University of California, Davis, CA95616
Andrew M. Walker Department of Viticulture and Enology, University of California, Davis, CA95616
Brandon S. Gaut Department of Ecology and Evolutionary Biology, University of California, Irvine, CA92697
Yongfeng Zhou State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen518000, China State Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou571101, China

Collapse

Smith CCR, Tittes S, Ralph PL, Kern AD. Dispersal inference from population genetic variation using a convolutional neural network. Genetics 2023;224:iyad068. [PMID: 37052957 PMCID: PMC10213498 DOI: 10.1093/genetics/iyad068] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 02/08/2023] [Accepted: 04/07/2023] [Indexed: 04/14/2023] Open

Abstract

The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training-including population density, demographic history, habitat size, and sampling area-and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus genetic tools like ours complement direct methods for improving our understanding of dispersal.

Collapse

Booker WW, Ray DD, Schrider DR. This population does not exist: learning the distribution of evolutionary histories with generative adversarial networks. Genetics 2023;224:iyad063. [PMID: 37067864 PMCID: PMC10213497 DOI: 10.1093/genetics/iyad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 02/23/2023] [Accepted: 04/05/2023] [Indexed: 04/18/2023] Open

Abstract

Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep-learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated, instead learning their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, called Generative Adversarial Networks (GANs), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e. data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories-the first GAN capable of performing this task. After testing multiple different neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that is capable of generating artificial examples of population genetic alignments that successfully mimic key aspects of the training data, including the site-frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning. Overall, our findings highlight the ability of GANs to learn and mimic population genetic data and suggest future areas where this work can be applied in population genetics research that we discuss herein.

Collapse

Moran RL, Richards EJ, Ornelas-García CP, Gross JB, Donny A, Wiese J, Keene AC, Kowalko JE, Rohner N, McGaugh SE. Selection-driven trait loss in independently evolved cavefish populations. Nat Commun 2023;14:2557. [PMID: 37137902 PMCID: PMC10156726 DOI: 10.1038/s41467-023-37909-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Accepted: 04/03/2023] [Indexed: 05/05/2023] Open

Pandey D, Harris M, Garud NR, Narasimhan VM. Understanding natural selection in Holocene Europe using multi-locus genotype identity scans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.24.538113. [PMID: 37163039 PMCID: PMC10168228 DOI: 10.1101/2023.04.24.538113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]

Love RR, Sikder JR, Vivero RJ, Matute DR, Schrider DR. Strong Positive Selection in Aedes aegypti and the Rapid Evolution of Insecticide Resistance. Mol Biol Evol 2023;40:msad072. [PMID: 36971242 PMCID: PMC10118305 DOI: 10.1093/molbev/msad072] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 02/13/2023] [Accepted: 03/23/2023] [Indexed: 03/29/2023] Open

Hamid I, Korunes KL, Schrider DR, Goldberg A. Localizing Post-Admixture Adaptive Variants with Object Detection on Ancestry-Painted Chromosomes. Mol Biol Evol 2023;40:msad074. [PMID: 36947126 PMCID: PMC10116606 DOI: 10.1093/molbev/msad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 03/23/2023] Open

Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]

Bouwman AC, Hulsegge I, Hawken RJ, Henshall JM, Veerkamp RF, Schokker D, Kamphuis C. Classifying aneuploidy in genotype intensity data using deep learning. J Anim Breed Genet 2023;140:304-315. [PMID: 36806175 DOI: 10.1111/jbg.12760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Accepted: 01/10/2023] [Indexed: 02/23/2023]

Abstract

Aneuploidy is the loss or gain of one or more chromosomes. Although it is a rare phenomenon in liveborn individuals, it is observed in livestock breeding populations. These breeding populations are often routinely genotyped and the genotype intensity data from single nucleotide polymorphism (SNP) arrays can be exploited to identify aneuploidy cases. This identification is a time-consuming and costly task, because it is often performed by visual inspection of the data per chromosome, usually done in plots of the intensity data by an expert. Therefore, we wanted to explore the feasibility of automated image classification to replace (part of) the visual detection procedure for any diploid species. The aim of this study was to develop a deep learning Convolutional Neural Network (CNN) classification model based on chromosome level plots of SNP array intensity data that can classify the images into disomic, monosomic and trisomic cases. A multispecies dataset enriched for aneuploidy cases was collected containing genotype intensity data of 3321 disomic, 1759 monosomic and 164 trisomic chromosomes. The final CNN model had an accuracy of 99.9%, overall precision was 1, recall was 0.98 and the F1 score was 0.99 for classifying images from intensity data. The high precision assures that cases detected are most likely true cases, however, some trisomy cases may be missed (the recall of the class trisomic was 0.94). This supervised CNN model performed much better than an unsupervised k-means clustering, which reached an accuracy of 0.73 and had especially difficult to classify trisomic cases correctly. The developed CNN classification model provides high accuracy to classify aneuploidy cases based on images of plotted X and Y genotype intensity values. The classification model can be used as a tool for routine screening in large diploid populations that are genotyped to get a better understanding of the incidence and inheritance, and in addition, avoid anomalies in breeding candidates.

Collapse

Korfmann K, Gaggiotti OE, Fumagalli M. Deep Learning in Population Genetics. Genome Biol Evol 2023;15:6997869. [PMID: 36683406 PMCID: PMC9897193 DOI: 10.1093/gbe/evad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Revised: 12/19/2022] [Accepted: 01/16/2023] [Indexed: 01/24/2023] Open

Harris M, Garud NR. Enrichment of Hard Sweeps on the X Chromosome in Drosophila melanogaster. Mol Biol Evol 2022;40:6955808. [PMID: 36546413 PMCID: PMC9825254 DOI: 10.1093/molbev/msac268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 11/11/2022] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open

Min J, Gupta M, Desai MM, Weissman DB. Spatial structure alters the site frequency spectrum produced by hitchhiking. Genetics 2022;222:iyac139. [PMID: 36094352 PMCID: PMC9630975 DOI: 10.1093/genetics/iyac139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 08/24/2022] [Indexed: 11/13/2022] Open

Tang J, Huang M, He S, Zeng J, Zhu H. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep 2022;40:111351. [PMID: 36103812 DOI: 10.1016/j.celrep.2022.111351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 06/13/2022] [Accepted: 08/23/2022] [Indexed: 11/03/2022] Open

Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13901] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet 2022;18:e1010134. [PMID: 35404934 PMCID: PMC9022890 DOI: 10.1371/journal.pgen.1010134] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Revised: 04/21/2022] [Accepted: 03/04/2022] [Indexed: 01/13/2023] Open

Abstract

The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the “width” of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.

Identifying regions of the genome that contain adaptive variation is of fundamental interest in evolutionary biology, providing insight into an organism’s history and biology. When positive selection is recent or ongoing, we expect to find genomic patterns such as high frequency haplotypes and low genetic diversity in the vicinity of the adaptive locus. Here we develop a statistic to identify these regions based on distortions of the haplotype frequency spectrum from a background distribution. We evaluate the performance of this statistic under numerous realistic settings of interest to empiricists and demonstrate its superior performance relative to other haplotype-based selection statistics. We also apply this statistic to real population-genetic data. As a positive control, we explore two well-studied loci, LCT and MHC, in a European and an African human population that show strong evidence for selection. We also apply this statistic to the genomes of an urban brown rat population, where we uncover evidence for adaptation in olfactory perception genes. We release user-friendly software implementing this statistic.

Collapse

Moran RL, Jaggard JB, Roback EY, Kenzior A, Rohner N, Kowalko JE, Ornelas-García CP, McGaugh SE, Keene AC. Hybridization underlies localized trait evolution in cavefish. iScience 2022;25:103778. [PMID: 35146393 PMCID: PMC8819016 DOI: 10.1016/j.isci.2022.103778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2021] [Revised: 09/13/2021] [Accepted: 01/12/2022] [Indexed: 11/04/2022] Open

Mueller JC, Botero-Delgadillo E, Espíndola-Hernández P, Gilsenan C, Ewels P, Gruselius J, Kempenaers B. Local selection signals in the genome of Blue tits emphasize regulatory and neuronal evolution. Mol Ecol 2022;31:1504-1514. [PMID: 34995389 DOI: 10.1111/mec.16345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Revised: 11/18/2021] [Accepted: 12/15/2021] [Indexed: 11/30/2022]

Hejase HA, Mo Z, Campagna L, Siepel A. A Deep-Learning Approach for Inference of Selective Sweeps from the Ancestral Recombination Graph. Mol Biol Evol 2022;39:msab332. [PMID: 34888675 PMCID: PMC8789311 DOI: 10.1093/molbev/msab332] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Montinaro F, Pankratov V, Yelmen B, Pagani L, Mondal M. Revisiting the out of Africa event with a deep-learning approach. Am J Hum Genet 2021;108:2037-2051. [PMID: 34626535 PMCID: PMC8595897 DOI: 10.1016/j.ajhg.2021.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022] Open

Manthey JD, Klicka J, Spellman GM. The Genomic Signature of Allopatric Speciation in a Songbird Is Shaped by Genome Architecture (Aves: Certhia americana). Genome Biol Evol 2021;13:evab120. [PMID: 34042960 PMCID: PMC8364988 DOI: 10.1093/gbe/evab120] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/24/2021] [Indexed: 12/31/2022] Open

O'Gorman M, Thakur S, Imrie G, Moran RL, Choy S, Sifuentes-Romero I, Bilandžija H, Renner KJ, Duboué E, Rohner N, McGaugh SE, Keene AC, Kowalko JE. Pleiotropic function of the oca2 gene underlies the evolution of sleep loss and albinism in cavefish. Curr Biol 2021;31:3694-3701.e4. [PMID: 34293332 DOI: 10.1016/j.cub.2021.06.077] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2020] [Revised: 03/22/2021] [Accepted: 06/25/2021] [Indexed: 12/29/2022]

DeGiorgio M, Assis R. Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data. Mol Biol Evol 2021;38:1209-1224. [PMID: 33045078 PMCID: PMC7947822 DOI: 10.1093/molbev/msaa267] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Roberts Kingman GA, Vyas DN, Jones FC, Brady SD, Chen HI, Reid K, Milhaven M, Bertino TS, Aguirre WE, Heins DC, von Hippel FA, Park PJ, Kirch M, Absher DM, Myers RM, Di Palma F, Bell MA, Kingsley DM, Veeramah KR. Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution. SCIENCE ADVANCES 2021;7:7/25/eabg5285. [PMID: 34144992 PMCID: PMC8213234 DOI: 10.1126/sciadv.abg5285] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 05/05/2021] [Indexed: 05/30/2023]

Affiliation(s)

Garrett A Roberts Kingman Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
Deven N Vyas Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
Felicity C Jones Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
Shannon D Brady Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
Heidi I Chen Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
Kerry Reid Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
Mark Milhaven Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
Thomas S Bertino Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
Windsor E Aguirre Department of Biological Sciences, DePaul University, Chicago, IL 60614-3207, USA
David C Heins Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA
Frank A von Hippel Department of Community, Environment and Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85724, USA
Peter J Park Department of Biology, Farmingdale State College, Farmingdale, NY 11735-1021, USA
Melanie Kirch Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
Devin M Absher HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
Richard M Myers HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
Federica Di Palma Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
Michael A Bell University of California Museum of Paleontology, University of California, Berkeley, Berkeley, CA 94720, USA.
David M Kingsley Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA. Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
Krishna R Veeramah Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA.

Collapse

Bourgeois YXC, Warren BH. An overview of current population genomics methods for the analysis of whole-genome resequencing data in eukaryotes. Mol Ecol 2021;30:6036-6071. [PMID: 34009688 DOI: 10.1111/mec.15989] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2020] [Revised: 04/26/2021] [Accepted: 05/11/2021] [Indexed: 01/01/2023]

Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2021;37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Elhaik E, Graur D. On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn't. Genes (Basel) 2021;12:genes12040527. [PMID: 33916341 PMCID: PMC8066263 DOI: 10.3390/genes12040527] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 03/22/2021] [Accepted: 03/29/2021] [Indexed: 12/12/2022] Open

Abstract

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.

Collapse

Qi X, An H, Hall TE, Di C, Blischak PD, McKibben MTW, Hao Y, Conant GC, Pires JC, Barker MS. Genes derived from ancient polyploidy have higher genetic diversity and are associated with domestication in Brassica rapa. THE NEW PHYTOLOGIST 2021;230:372-386. [PMID: 33452818 DOI: 10.1111/nph.17194] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Accepted: 11/30/2020] [Indexed: 06/12/2023]

Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021;21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]

Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021;38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Abstract

Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.

Collapse

Hübner S, Kantar MB. Tapping Diversity From the Wild: From Sampling to Implementation. FRONTIERS IN PLANT SCIENCE 2021;12:626565. [PMID: 33584776 PMCID: PMC7873362 DOI: 10.3389/fpls.2021.626565] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Accepted: 01/07/2021] [Indexed: 05/05/2023]

Schneider K, White TJ, Mitchell S, Adams CE, Reeve R, Elmer KR. The pitfalls and virtues of population genetic summary statistics: Detecting selective sweeps in recent divergences. J Evol Biol 2020;34:893-909. [DOI: 10.1111/jeb.13738] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Revised: 10/22/2020] [Accepted: 10/24/2020] [Indexed: 12/12/2022]

Kautt AF, Kratochwil CF, Nater A, Machado-Schiaffino G, Olave M, Henning F, Torres-Dowdall J, Härer A, Hulsey CD, Franchini P, Pippel M, Myers EW, Meyer A. Contrasting signatures of genomic divergence during sympatric speciation. Nature 2020;588:106-111. [PMID: 33116308 PMCID: PMC7759464 DOI: 10.1038/s41586-020-2845-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Accepted: 07/23/2020] [Indexed: 01/25/2023]

Harpak A, Garud N, Rosenberg NA, Petrov DA, Combs M, Pennings PS, Munshi-South J. Genetic Adaptation in New York City Rats. Genome Biol Evol 2020;13:5991490. [PMID: 33211096 PMCID: PMC7851592 DOI: 10.1093/gbe/evaa247] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/17/2020] [Indexed: 02/06/2023] Open

Genomic islands of differentiation in a rapid avian radiation have been driven by recent selective sweeps. Proc Natl Acad Sci U S A 2020;117:30554-30565. [PMID: 33199636 DOI: 10.1073/pnas.2015987117] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Bourgeois Y, Ruggiero RP, Hariyani I, Boissinot S. Disentangling the determinants of transposable elements dynamics in vertebrate genomes using empirical evidences and simulations. PLoS Genet 2020;16:e1009082. [PMID: 33017388 PMCID: PMC7561263 DOI: 10.1371/journal.pgen.1009082] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 12/14/2022] Open

Schrider DR. Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps. Genetics 2020;216:499-519. [PMID: 32847814 PMCID: PMC7536861 DOI: 10.1534/genetics.120.303469] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 08/04/2020] [Indexed: 12/28/2022] Open

Abstract

It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.

Collapse

Mughal MR, Koch H, Huang J, Chiaromonte F, DeGiorgio M. Learning the properties of adaptive regions with functional data analysis. PLoS Genet 2020;16:e1008896. [PMID: 32853200 PMCID: PMC7480868 DOI: 10.1371/journal.pgen.1008896] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 09/09/2020] [Accepted: 05/29/2020] [Indexed: 12/12/2022] Open