1
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda, Md.) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
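The time windows quoted in this abstract can be made concrete with a small sketch. It is illustrative only: the function name `method_with_power`, its arguments, and the treatment of the 1N and 25N figures as hard cut-offs are assumptions made here, not part of the study, which reports power as changing gradually with the age of the balanced allele.

```python
def method_with_power(age_generations: float, pop_size: float) -> str:
    """Return the class of method the abstract suggests has power.

    age_generations: time since the balanced allele was introduced.
    pop_size: population size N used to express that age in units of N.
    The 1N and 25N thresholds come from the abstract; using them as
    sharp boundaries is a simplifying assumption of this sketch.
    """
    if pop_size <= 0:
        raise ValueError("pop_size must be positive")
    age_in_n = age_generations / pop_size  # age in units of N generations
    if age_in_n < 1:
        return "LD-based"
    if age_in_n > 25:
        return "SFS-based"
    return "neither"
```

For example, with N = 10,000 an allele introduced 100,000 generations ago (10N) falls in the interval where, according to the abstract, current methods have little power.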
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
2
Dabi A, Schrider DR. Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations. bioRxiv 2024:2024.04.07.588318. [PMID: 38645049 PMCID: PMC11030438 DOI: 10.1101/2024.04.07.588318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, fixation probabilities, allele frequencies, and linkage disequilibrium) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward; thus it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q.
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling effect's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
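The rescaling rule described in this abstract (divide the population size by Q, multiply the mutation rate, selection coefficient, and recombination rate by Q) can be sketched as follows. The `SimParams` container, its field names, and the `rescale` helper are hypothetical names introduced for illustration; they are not taken from the preprint or from any simulator's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for per-generation simulation parameters."""
    pop_size: int
    mutation_rate: float       # per site
    recombination_rate: float  # per site
    selection_coeff: float


def rescale(params: SimParams, q: float) -> SimParams:
    """Scale N down by q and scale mu, r, and s up by q.

    This keeps the products N*mu, N*r, and N*s (approximately) constant,
    which is the usual motivation for rescaling. It does not guarantee
    unbiased outcomes, which is the point the abstract makes.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


# Example values are arbitrary and chosen only to show the arithmetic.
base = SimParams(pop_size=10_000, mutation_rate=1e-8,
                 recombination_rate=1e-8, selection_coeff=0.001)
scaled = rescale(base, q=10)
```

Comparing `base.pop_size * base.mutation_rate` with `scaled.pop_size * scaled.mutation_rate` shows that the population-scaled rate is preserved even though each individual parameter changes by a factor of Q.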
Affiliation(s)
- Amjad Dabi
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
3
Rajawat D, Ghildiyal K, Sonejita Nayak S, Sharma A, Parida S, Kumar S, Ghosh AK, Singh U, Sivalingam J, Bhushan B, Dutt T, Panigrahi M. Genome-wide mining of diversity and evolutionary signatures revealed selective hotspots in Indian Sahiwal cattle. Gene 2024; 901:148178. [PMID: 38242377 DOI: 10.1016/j.gene.2024.148178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/10/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024]
Abstract
The Sahiwal cattle breed is the best indigenous dairy cattle breed, and it plays a pivotal role in the Indian dairy industry. This is due to its exceptional milk-producing potential, adaptability to local tropical conditions, and its resilience to ticks and diseases. The study aimed to identify selective sweeps and estimate intrapopulation genetic diversity parameters in Sahiwal cattle using ddRAD sequencing-based genotyping data from 82 individuals. After applying filtering criteria, 78,193 high-quality SNPs remained for further analysis. The population exhibited an average minor allele frequency of 0.221 ± 0.119. Genetic diversity metrics, including observed (0.597 ± 0.196) and expected heterozygosity (0.433 ± 0.096), nucleotide diversity (0.327 ± 0.114), the proportion of polymorphic SNPs (0.726), and allelic richness (1.323 ± 0.134), indicated ample genomic diversity within the breed. Furthermore, an effective population size of 74 was observed in the most recent generation. The overall mean linkage disequilibrium (r²) for pairwise SNPs was 0.269 ± 0.057. Moreover, a greater proportion of short Runs of Homozygosity (ROH) segments were observed, suggesting that there may be low levels of recent inbreeding in this population. The genomic inbreeding coefficients, computed using different inbreeding estimates (F_HOM, F_UNI, F_ROH, and F_GROM), ranged from -0.0289 to 0.0725. Subsequently, we found 146 regions undergoing selective sweeps using five distinct statistical tests: Tajima's D, CLR, |iHS|, |iHH12|, and ROH. These regions, located in non-overlapping 500 kb windows, were mapped and revealed various protein-coding genes associated with enhanced immune systems and disease resistance (IFNL3, IRF8, BLK), as well as production traits (NRXN1, PLCE1, GHR). Notably, we identified interleukin 2 (IL2) on Chr17: 35217075-35223276 as a gene linked to tick resistance and uncovered a cluster of genes (HSPA8, UBASH3B, ADAMTS18, CRTAM) associated with heat stress.
These findings indicate the evolutionary impact of natural and artificial selection on the environmental adaptation of the Sahiwal cattle population.
Affiliation(s)
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Kanika Ghildiyal
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Sonali Sonejita Nayak
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Anurodh Sharma
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Subhashree Parida
  - Pharmacology & Toxicology Division, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Shive Kumar
  - Department of Animal Genetics and Breeding, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India
- A K Ghosh
  - Department of Animal Genetics and Breeding, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India
- Umesh Singh
  - ICAR Central Institute for Research on Cattle, Meerut, UP, India
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
4
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinyu Chu
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Wangjiao Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
- Jianlin Han
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
  - Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
- Yunlong Ma
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
5
Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024; 20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024]
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, New York, United States of America
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
6
Ray DD, Flagel L, Schrider DR. IntroUNET: identifying introgressed alleles via semantic segmentation. bioRxiv 2024:2023.02.07.527435. [PMID: 36865105 PMCID: PMC9979274 DOI: 10.1101/2023.02.07.527435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, NY 11101, USA
  - Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
7
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
8
Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023; 54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Half a century ago, a seminal article on the hitchhiking effect by Smith and Haigh inaugurated the concept of the selection signature. Selective sweeps are characterised by the rapid spread of an advantageous genetic variant through a population and hence play an important role in shaping evolution and research on genetic diversity. The process by which a beneficial allele arises and becomes fixed in a population, leading to an increase in the frequency of other linked alleles, is known as genetic hitchhiking or genetic draft. Kimura's neutral theory and hitchhiking theory are complementary, with Kimura's neutral evolution as the 'null model' and positive selection as the 'signal'. Both are widely accepted in evolution, especially with genomics enabling precise measurements. Significant advances in genomic technologies, such as next-generation sequencing, high-density SNP arrays and powerful bioinformatics tools, have made it possible to systematically investigate selection signatures in a variety of species. Although the history of selection signatures is relatively recent, progress has been made in the last two decades, owing to the increasing availability of large-scale genomic data and the development of computational methods. In this review, we embark on a journey through the history of research on selective sweeps, ranging from early theoretical work to recent empirical studies that utilise genomic data.
Affiliation(s)
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Kanika Ghildiyal
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Anurodh Sharma
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Karan Jain
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Bishnu Prasad Mishra
  - Division of Animal Biotechnology, ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India
9
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
10
Mascarenhas R, Meirelles PM, Batalha-Filho H. Urbanization drives adaptive evolution in a Neotropical bird. Curr Zool 2023; 69:607-619. [PMID: 37637315 PMCID: PMC10449428 DOI: 10.1093/cz/zoac066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 08/16/2022] [Indexed: 08/29/2023]
Abstract
Urbanization has dramatic impacts on natural habitats and such changes may potentially drive local adaptation of urban populations. Behavioral change has been specifically shown to facilitate the fast adaptation of birds to changing environments, but few studies have investigated the genetic mechanisms of this process. Such investigations could provide insights into questions about both evolutionary theory and management of urban populations. In this study, we investigated whether local adaptation has occurred in urban populations of a Neotropical bird species, Coereba flaveola, specifically addressing whether observed behavioral adaptations are correlated to genetic signatures of natural selection. To answer this question, we sampled 24 individuals in urban and rural environments, and searched for selected loci through a genome-scan approach based on RADseq genomic data, generated and assembled using a reference genome for the species. We recovered 46 loci as putative selection outliers, and 30 of them were identified as associated with biological processes possibly related to urban adaptation, such as the regulation of energetic metabolism, regulation of genetic expression, and changes in the immunological system. Moreover, genes involved in the development of the nervous system showed signatures of selection, suggesting a link between behavioral and genetic adaptations. Our findings, in conjunction with similar results in previous studies, support the idea that cities provide a similar selective pressure on urban populations and that behavioral plasticity may be enhanced through genetic changes in urban populations.
Affiliation(s)
- Rilquer Mascarenhas
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Instituto de Biologia, Universidade Federal da Bahia, 40170-115 Salvador, Bahia, Brazil
- Pedro Milet Meirelles
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Instituto de Biologia, Universidade Federal da Bahia, 40170-115 Salvador, Bahia, Brazil
- Henrique Batalha-Filho
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Instituto de Biologia, Universidade Federal da Bahia, 40170-115 Salvador, Bahia, Brazil
11
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text
Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
|
12
|
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
| | - Lauren A. Sugden
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
| | - Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
- Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
13
|
Korfmann K, Gaggiotti OE, Fumagalli M. Deep Learning in Population Genetics. Genome Biol Evol 2023; 15:6997869. [PMID: 36683406 PMCID: PMC9897193 DOI: 10.1093/gbe/evad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 12/19/2022] [Accepted: 01/16/2023] [Indexed: 01/24/2023]
Abstract
Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data, will provide further opportunities for technological advancements in the field.
Affiliation(s)
- Kevin Korfmann
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Germany
| | - Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife KY16 9TF, UK
|
14
|
Schlichta F, Moinet A, Peischl S, Excoffier L. The Impact of Genetic Surfing on Neutral Genomic Diversity. Mol Biol Evol 2022; 39:msac249. [PMID: 36403964 PMCID: PMC9703594 DOI: 10.1093/molbev/msac249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Range expansions have been common in the history of most species. Serial founder effects and subsequent population growth at expansion fronts typically lead to a loss of genomic diversity along the expansion axis. A frequent consequence is the phenomenon of "gene surfing," where variants located near the expanding front can reach high frequencies or even fix in newly colonized territories. Although gene surfing events have been characterized thoroughly for a specific locus, their effects on linked genomic regions and the overall patterns of genomic diversity have been little investigated. In this study, we simulated the evolution of whole genomes during several types of 1D and 2D range expansions differing by the extent of migration, founder events, and recombination rates. We focused on the characterization of local dips of diversity, or "troughs," taken as a proxy for surfing events. We find that, for a given recombination rate, once we consider the amount of diversity lost since the beginning of the expansion, it is possible to predict the initial evolution of trough density and their average width irrespective of the expansion condition. Furthermore, when recombination rates vary across the genome, we find that troughs are over-represented in regions of low recombination. Therefore, range expansions can leave local and global genomic signatures often interpreted as evidence of past selective events. Given the generality of our results, they could be used as a null model for species having gone through recent expansions, and thus be helpful to correctly interpret many evolutionary biology studies.
Affiliation(s)
- Flávia Schlichta
- Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Antoine Moinet
- Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
| | - Stephan Peischl
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
| | - Laurent Excoffier
- Computational and Molecular Population Genetics lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
15
|
Burny C, Nolte V, Dolezal M, Schlötterer C. Genome-wide selection signatures reveal widespread synergistic effects of two different stressors in Drosophila melanogaster. Proc Biol Sci 2022; 289:20221857. [PMID: 36259211 PMCID: PMC9579754 DOI: 10.1098/rspb.2022.1857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Experimental evolution combined with whole-genome sequencing (evolve and resequence (E&R)) is a powerful approach to study the adaptive architecture of selected traits. Nevertheless, so far the focus has been on the selective response triggered by a single stressor. Building on the highly parallel selection response of founder populations with reduced variation, we evaluated how the presence of a second stressor affects the genomic selection response. After 20 generations of adaptation to laboratory conditions at either 18°C or 29°C, strong genome-wide selection signatures were observed. Only 38% of the selection signatures can be attributed to laboratory adaptation (no difference between temperature regimes). The remaining selection responses are either caused by temperature-specific effects, or reflect the joint effects of temperature and laboratory adaptation (same direction, but the magnitude differs between temperatures). The allele frequency changes resulting from the combined effects of temperature and laboratory adaptation were more extreme in the hot environment for 83% of the affected genomic regions-indicating widespread synergistic effects of the two stressors. We conclude that E&R with reduced genetic variation is a powerful approach to study genome-wide fitness consequences driven by the combined effects of multiple environmental factors.
Affiliation(s)
- Claire Burny
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Vienna 1210, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna 1210, Austria
| | - Viola Nolte
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Vienna 1210, Austria
| | - Marlies Dolezal
- Plattform Bioinformatik und Biostatistik, Vetmeduni Vienna, Vienna 1210, Austria
| | - Christian Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Vienna 1210, Austria
|
16
|
Gómez-Espejo AL, Sansaloni CP, Burgueño J, Toledo FH, Benavides-Mendoza A, Reyes-Valdés MH. Worldwide Selection Footprints for Drought and Heat in Bread Wheat (Triticum aestivum L.). Plants 2022; 11:plants11172289. [PMID: 36079671 PMCID: PMC9460392 DOI: 10.3390/plants11172289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 08/18/2022] [Accepted: 08/29/2022] [Indexed: 11/16/2022]
Abstract
Genome–environment Associations (GEA) or Environmental Genome-Wide Association scans (EnvGWAS) have been poorly applied for studying the genomics of adaptive traits in bread wheat landraces (Triticum aestivum L.). We analyzed 990 landraces and seven climatic variables (mean temperature, maximum temperature, precipitation, precipitation seasonality, heat index of mean temperature, heat index of maximum temperature, and drought index) in GEA using the FarmCPU approach with GAPIT. Historical temperature and precipitation values were obtained as monthly averages from 1970 to 2000. Based on 26,064 high-quality SNP loci, landraces were classified into ten subpopulations exhibiting high genetic differentiation. The GEA identified 59 SNPs and nearly 89 protein-encoding genes involved in the response processes to abiotic stress. Genes related to biosynthesis and signaling are mainly mediated by auxins, abscisic acid (ABA), ethylene (ET), salicylic acid (SA), and jasmonates (JA), which are known to operate together in modulation responses to heat stress and drought in plants. In addition, we identified some proteins associated with the response and tolerance to stress by high temperatures, water deficit, and cell wall functions. The results provide candidate regions for selection aimed to improve drought and heat tolerance in bread wheat and provide insights into the genetic mechanisms involved in adaptation to extreme environments.
Affiliation(s)
- Ana L. Gómez-Espejo
- Programa de Doctorado en Recursos Fitogenéticos para Zonas Áridas, Universidad Autónoma Agraria Antonio Narro (UAAAN), Saltillo 25315, Mexico
| | | | - Juan Burgueño
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco 56237, Mexico
| | - Fernando H. Toledo
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco 56237, Mexico
| | - Adalberto Benavides-Mendoza
- Programa de Doctorado en Recursos Fitogenéticos para Zonas Áridas, Universidad Autónoma Agraria Antonio Narro (UAAAN), Saltillo 25315, Mexico
| | - M. Humberto Reyes-Valdés
- Programa de Doctorado en Recursos Fitogenéticos para Zonas Áridas, Universidad Autónoma Agraria Antonio Narro (UAAAN), Saltillo 25315, Mexico
|
17
|
Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/29/2022]
Abstract
Natural selection has been given a lot of attention because it relates to the adaptation of populations to their environments, both biotic and abiotic. An allele is selected when it is favored by natural selection. Consequently, the favored allele increases in frequency in the population and neighboring linked variation diminishes, causing so-called selective sweeps. A high-throughput genomic sequence allows one to disentangle the evolutionary forces at play in populations. With the development of high-throughput genome sequencing technologies, it has become easier to detect these selective sweeps/selection signatures. Various methods can be used to detect selective sweeps, from simple implementations using summary statistics to complex statistical approaches. One of the important problems of these statistical models is the potential to provide inaccurate results when their assumptions are violated. The use of machine learning (ML) in population genetics has been introduced as an alternative method of detecting selection by treating the problem of detecting selection signatures as a classification problem. Since the availability of population genomics data is increasing, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can be used to aid in detecting and studying natural selection patterns using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India
|
18
|
Pettie N, Llopart A, Comeron JM. Meiotic, genomic and evolutionary properties of crossover distribution in Drosophila yakuba. PLoS Genet 2022; 18:e1010087. [PMID: 35320272 PMCID: PMC8979470 DOI: 10.1371/journal.pgen.1010087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2021] [Revised: 04/04/2022] [Accepted: 02/09/2022] [Indexed: 12/14/2022]
Abstract
The number and location of crossovers across genomes are highly regulated during meiosis, yet the key components controlling them are fast evolving, hindering our understanding of the mechanistic causes and evolutionary consequences of changes in crossover rates. Drosophila melanogaster has been a model species to study meiosis for more than a century, with an available high-resolution crossover map that is, nonetheless, missing for closely related species, thus preventing evolutionary context. Here, we applied a novel and highly efficient approach to generate whole-genome high-resolution crossover maps in D. yakuba to tackle multiple questions that benefit from being addressed collectively within an appropriate phylogenetic framework, in our case the D. melanogaster species subgroup. The genotyping of more than 1,600 individual meiotic events allowed us to identify several key distinct properties relative to D. melanogaster. We show that D. yakuba, in addition to higher crossover rates than D. melanogaster, has a stronger centromere effect and crossover assurance than any Drosophila species analyzed to date. We also report the presence of an active crossover-associated meiotic drive mechanism for the X chromosome that results in the preferential inclusion in oocytes of chromatids with crossovers. Our evolutionary and genomic analyses suggest that the genome-wide landscape of crossover rates in D. yakuba has been fairly stable and captures a significant signal of the ancestral crossover landscape for the whole D. melanogaster subgroup, even informative for the D. melanogaster lineage. Contemporary crossover rates in D. melanogaster, on the other hand, do not recapitulate ancestral crossovers landscapes. As a result, the temporal stability of crossover landscapes observed in D. yakuba makes this species an ideal system for applying population genetic models of selection and linkage, given that these models assume temporal constancy in linkage effects. 
Our studies emphasize the importance of generating multiple high-resolution crossover rate maps within a coherent phylogenetic context to broaden our understanding of crossover control during meiosis and to improve studies on the evolutionary consequences of variable crossover rates across genomes and time.
Affiliation(s)
- Nikale Pettie
- Interdisciplinary Program in Genetics, University of Iowa, Iowa City, Iowa, United States of America
| | - Ana Llopart
- Interdisciplinary Program in Genetics, University of Iowa, Iowa City, Iowa, United States of America
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
| | - Josep M. Comeron
- Interdisciplinary Program in Genetics, University of Iowa, Iowa City, Iowa, United States of America
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
|
19
|
Fine human genetic map based on UK10K data set. Hum Genet 2022; 141:273-281. [DOI: 10.1007/s00439-021-02415-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2021] [Accepted: 12/03/2021] [Indexed: 11/04/2022]
|
20
|
Ortega-Del Vecchyo D, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022; 220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022]
Abstract
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some non-equilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive or negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We show an application of our method to the UK10K phased haplotype dataset of individuals.
Affiliation(s)
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, 76230, México
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
| | - Kirk E Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
| | - John Novembre
- Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, United States of America
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, 60637, United States of America
|
21
|
Stephan W. The classical hitchhiking model with continuous mutational pressure and purifying selection. Ecol Evol 2021; 11:15896-15904. [PMID: 34824798 PMCID: PMC8601925 DOI: 10.1002/ece3.8259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2021] [Revised: 08/24/2021] [Accepted: 10/08/2021] [Indexed: 11/14/2022]
Abstract
Detecting selective sweeps driven by strong positive selection and localizing the targets of selection in the genome play a major role in modern population genetics and genomics. Most of these analyses are based on the classical model of genetic hitchhiking proposed by Maynard Smith and Haigh (1974, Genetical Research, 23, 23). Here, we consider extensions of the classical two-locus model. Introducing mutation at the strongly selected site, we analyze the conditions under which soft sweeps may arise. We identify a new parameter (the ratio of the beneficial mutation rate to the selection coefficient) that characterizes the occurrence of multiple-origin soft sweeps. Furthermore, we quantify the hitchhiking effect when the polymorphism at the linked locus is not neutral but maintained in a mutation-selection balance. In this case, we find a smaller relative reduction of heterozygosity at the linked site than for a neutral polymorphism. In our analysis, we use a semi-deterministic approach; i.e., we analyze the frequency process of the beneficial allele in an infinitely large population when its frequency is above a certain threshold; however, for very small frequencies in the initial phase after the onset of selection we rely on diffusion theory.
Affiliation(s)
- Wolfgang Stephan
- Leibniz-Institute for Evolution and Biodiversity Science, Natural History Museum, Berlin, Germany
|
22
|
Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021; 21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
Affiliation(s)
- Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Alessandro Stella
- Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, UK
|
23
|
Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021; 38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/19/2023]
Abstract
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
Affiliation(s)
- Alexander T Xue
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| | - Andrew D Kern
- Institute of Ecology and Evolution, 5289 University of Oregon, Eugene, OR
|
24
|
Abstract
Drosophila melanogaster, a small dipteran of African origin, represents one of the best-studied model organisms. Early work in this system has uniquely shed light on the basic principles of genetics and resulted in a versatile collection of genetic tools that make it possible to uncover mechanistic links between genotype and phenotype. Moreover, given its worldwide distribution in diverse habitats and its moderate genome size, Drosophila has proven very powerful for population genetics inference and was one of the first eukaryotes whose genome was fully sequenced. In this book chapter, we provide a brief historical overview of research in Drosophila and then focus on recent advances during the genomic era. After describing different types and sources of genomic data, we discuss mechanisms of neutral evolution including the demographic history of Drosophila and the effects of recombination and biased gene conversion. Then, we review recent advances in detecting genome-wide signals of selection, such as soft and hard selective sweeps. We further provide a brief introduction to background selection, selection of noncoding DNA and codon usage and focus on the role of structural variants, such as transposable elements and chromosomal inversions, during the adaptive process. Finally, we discuss how genomic data help to dissect neutral and adaptive evolutionary mechanisms that shape genetic and phenotypic variation in natural populations along environmental gradients. In summary, this book chapter serves as a starting point for Drosophila population genomics and provides an introduction to the system and an overview of data sources, important population genetic concepts and recent advances in the field.
|
25
|
Demographic analyses of a new sample of haploid genomes from a Swedish population of Drosophila melanogaster. Sci Rep 2020; 10:22415. [PMID: 33376238 PMCID: PMC7772335 DOI: 10.1038/s41598-020-79720-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/07/2020] [Accepted: 12/11/2020] [Indexed: 01/27/2023]
Abstract
European and African natural populations of Drosophila melanogaster have been the focus of several studies aiming at inferring demographic and adaptive processes based on genetic variation data. However, in these analyses little attention has been given to gene flow between African and European samples. Here we present a dataset consisting of 14 fully sequenced haploid genomes sampled from a natural population from the northern species range (Umeå, Sweden). We co-analyzed this new data with an African population to compare the likelihood of several competing demographic scenarios for European and African populations and show that gene flow improves the fit of demographic models to data.
|
26
|
Choi JY, Purugganan M, Stacy EA. Divergent Selection and Primary Gene Flow Shape Incipient Speciation of a Riparian Tree on Hawaii Island. Mol Biol Evol 2020; 37:695-710. [PMID: 31693149 PMCID: PMC7038655 DOI: 10.1093/molbev/msz259] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Abstract
A long-standing goal of evolutionary biology is to understand the mechanisms underlying the formation of species. Of particular interest is whether or not speciation can occur in the presence of gene flow and without a period of physical isolation. Here, we investigated this process within Hawaiian Metrosideros, a hypervariable and highly dispersible woody species complex that dominates the Hawaiian Islands in continuous stands. Specifically, we investigated the origin of Metrosideros polymorpha var. newellii (newellii), a riparian ecotype endemic to Hawaii Island that is purportedly derived from the archipelago-wide M. polymorpha var. glaberrima (glaberrima). Disruptive selection across a sharp forest-riparian ecotone contributes to the isolation of these varieties and is a likely driver of newellii's origin. We examined genome-wide variation of 42 trees from Hawaii Island and older islands. Results revealed a split between glaberrima and newellii within the past 0.3-1.2 My. Admixture was extensive between lineages within Hawaii Island and between islands, but introgression from populations on older islands (i.e., secondary gene flow) did not appear to contribute to the emergence of newellii. In contrast, recurrent gene flow (i.e., primary gene flow) between glaberrima and newellii contributed to the formation of genomic islands of elevated absolute and relative divergence. These regions were enriched for genes with regulatory functions as well as for signals of positive selection, especially in newellii, consistent with divergent selection underlying their formation. In sum, our results support riparian newellii as a rare case of incipient ecological speciation with primary gene flow in trees.
Affiliation(s)
- Jae Young Choi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
- Michael Purugganan
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
- Center for Genomics and Systems Biology, NYU Abu Dhabi Research Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Elizabeth A Stacy
- School of Life Sciences, University of Nevada, Las Vegas, Las Vegas, NV
|
27
|
Marchi N, Excoffier L. Gene flow as a simple cause for an excess of high-frequency-derived alleles. Evol Appl 2020; 13:2254-2263. [PMID: 33005222 PMCID: PMC7513730 DOI: 10.1111/eva.12998] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/24/2020] [Revised: 04/30/2020] [Accepted: 05/04/2020] [Indexed: 01/19/2023]
Abstract
Most human populations exhibit an excess of high-frequency variants, leading to a U-shaped site-frequency spectrum (uSFS). This pattern has been generally interpreted as a signature of ongoing episodes of positive selection, or as evidence for a mis-assignment of ancestral/derived allelic states, but uSFS has also been observed in populations receiving gene flow from a ghost population, in structured populations, or after range expansions. In order to better explain the prevalence of high-frequency variants in humans and other populations, we describe here which patterns of gene flow and population demography can lead to uSFS by using extensive coalescent simulations. We find that uSFS can often be observed in a population if gene flow brings a few ancestral alleles from a well-differentiated population. Gene flow can either consist in single pulses of admixture or continuous immigration, but different demographic conditions are necessary to observe uSFS in these two scenarios. Indeed, an extremely low and recent gene flow is required in the case of single admixture events, while with continuous immigration, uSFS occurs only if gene flow started recently at a high rate or if it lasted for a long time at a low rate. Overall, we find that a neutral uSFS occurs under more restrictive conditions in populations having received single pulses of gene flow than in populations exposed to continuous gene flow. We also show that the uSFS observed in human populations from the 1000 Genomes Project can easily be explained by gene flow from surrounding populations without requiring past episodes of positive selection. These results imply that uSFS should be common in non-isolated populations, such as most wild or domesticated plants and animals.
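To make the U-shaped spectrum described above concrete, the following minimal sketch computes an unfolded site-frequency spectrum from a 0/1 haplotype matrix (rows are sampled chromosomes, columns are sites, 1 marks the derived allele). The function names and the `looks_u_shaped` heuristic are illustrative assumptions, not the criterion used by Marchi and Excoffier.

```python
import numpy as np

def unfolded_sfs(haplotypes):
    """Return the unfolded SFS: entry i-1 counts sites with i derived copies."""
    hap = np.asarray(haplotypes).astype(int)
    n = hap.shape[0]                      # number of sampled chromosomes
    derived_counts = hap.sum(axis=0)      # derived-allele count per site
    counts = np.bincount(derived_counts, minlength=n + 1)
    return counts[1:n]                    # keep segregating classes 1..n-1

def looks_u_shaped(sfs):
    """Hypothetical heuristic: both end classes exceed the smallest
    interior class. Needs at least three frequency classes."""
    sfs = np.asarray(sfs)
    interior_min = sfs[1:-1].min()
    return bool(sfs[0] > interior_min and sfs[-1] > interior_min)

# Toy data: 5 chromosomes, 4 sites (two singletons, two sites at count 4).
haps = np.array([
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])
sfs = unfolded_sfs(haps)
print(sfs.tolist(), looks_u_shaped(sfs))
```

A spectrum that decays monotonically, as expected under the standard neutral model (counts roughly proportional to 1/i), fails this check because its highest-frequency class is also its smallest.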
Affiliation(s)
- Nina Marchi
- CMPG, Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laurent Excoffier
- CMPG, Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
28
|
Abstract
The domestication of animals led to a major shift in human subsistence patterns, from a hunter-gatherer to a sedentary agricultural lifestyle, which ultimately resulted in the development of complex societies. Over the past 15,000 years, the phenotype and genotype of multiple animal species, such as dogs, pigs, sheep, goats, cattle and horses, have been substantially altered during their adaptation to the human niche. Recent methodological innovations, such as improved ancient DNA extraction methods and next-generation sequencing, have enabled the sequencing of whole ancient genomes. These genomes have helped reconstruct the process by which animals entered into domestic relationships with humans and were subjected to novel selection pressures. Here, we discuss and update key concepts in animal domestication in light of recent contributions from ancient genomics.
|
29
|
Nakagome S, Hudson RR, Di Rienzo A. Inferring the model and onset of natural selection under varying population size from the site frequency spectrum and haplotype structure. Proc Biol Sci 2020; 286:20182541. [PMID: 30963935 DOI: 10.1098/rspb.2018.2541] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/27/2023]
Abstract
A fundamental question about adaptation in a population is the time of onset of the selective pressure acting on beneficial alleles. Inferring this time, in turn, depends on the selection model. We develop a framework of approximate Bayesian computation (ABC) that enables the use of the full site frequency spectrum and haplotype structure to test the goodness-of-fit of selection models and estimate the timing of selection under varying population size scenarios. We show that our method has sufficient power to distinguish natural selection from neutrality even if relatively old selection increased the frequency of a pre-existing allele from 20% to 50% or from 40% to 80%. Our ABC can accurately estimate the time of onset of selection on a new mutation. However, estimates are prone to bias under the standing variation model, possibly due to the uncertainty in the allele frequency at the onset of selection. We further extend our approach to take advantage of ancient DNA data that provides information on the allele frequency path of the beneficial allele. Applying our ABC, including both modern and ancient human DNA data, to four pigmentation alleles in Europeans, we detected selection on standing variants that occurred after the dispersal from Africa even though models of selection on a new mutation were initially supported for two of these alleles without the ancient data.
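The general logic of ABC can be illustrated with rejection sampling, the simplest ABC variant. The sketch below is a toy stand-in, not the authors' framework: it replaces the full site frequency spectrum and haplotype structure with a single summary (the sampled frequency of the beneficial allele), and the simulator, prior, selection coefficient, and tolerance are all hypothetical choices made for illustration.

```python
import numpy as np

def abc_rejection(observed_freq, n_draws=20000, tolerance=0.02,
                  s=0.02, p0=0.01, sample_size=100, seed=0):
    """Rejection ABC for the onset time of selection (in generations).

    Toy assumptions: deterministic logistic rise of the allele from
    frequency p0 with selection coefficient s, followed by binomial
    sampling of `sample_size` chromosomes.
    """
    rng = np.random.default_rng(seed)
    # 1. Draw candidate onset times from a uniform prior.
    onsets = rng.uniform(0.0, 1000.0, size=n_draws)
    # 2. Simulate the summary statistic for every draw.
    p = p0 / (p0 + (1.0 - p0) * np.exp(-s * onsets))
    simulated = rng.binomial(sample_size, p) / sample_size
    # 3. Keep draws whose simulated summary is close to the observation.
    return onsets[np.abs(simulated - observed_freq) <= tolerance]

accepted = abc_rejection(observed_freq=0.5)
print(len(accepted), round(float(accepted.mean()), 1))
```

The accepted draws approximate the posterior for the onset time under the toy model. With these settings the deterministic trajectory reaches 50% after ln(99)/0.02, roughly 230 generations, so the accepted values should cluster near that point.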
Affiliation(s)
- Shigeki Nakagome
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- School of Medicine, Faculty of Health Sciences, Trinity College Dublin, the University of Dublin, Dublin, Ireland
- Richard R Hudson
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Ecology & Evolution, University of Chicago, Chicago, IL, USA
- Anna Di Rienzo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
|
30
|
Moest M, Van Belleghem SM, James JE, Salazar C, Martin SH, Barker SL, Moreira GRP, Mérot C, Joron M, Nadeau NJ, Steiner FM, Jiggins CD. Selective sweeps on novel and introgressed variation shape mimicry loci in a butterfly adaptive radiation. PLoS Biol 2020; 18:e3000597. [PMID: 32027643 PMCID: PMC7029882 DOI: 10.1371/journal.pbio.3000597] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/03/2019] [Revised: 02/19/2020] [Accepted: 01/15/2020] [Indexed: 11/21/2022]
Abstract
Natural selection leaves distinct signatures in the genome that can reveal the targets and history of adaptive evolution. By analysing high-coverage genome sequence data from 4 major colour pattern loci sampled from nearly 600 individuals in 53 populations, we show pervasive selection on wing patterns in the Heliconius adaptive radiation. The strongest signatures correspond to loci with the greatest phenotypic effects, consistent with visual selection by predators, and are found in colour patterns with geographically restricted distributions. These recent sweeps are similar between co-mimics and indicate colour pattern turn-over events despite strong stabilising selection. Using simulations, we compare sweep signatures expected under classic hard sweeps with those resulting from adaptive introgression, an important aspect of mimicry evolution in Heliconius butterflies. Simulated recipient populations show a distinct 'volcano' pattern with peaks of increased genetic diversity around the selected target, characteristic of sweeps of introgressed variation and consistent with diversity patterns found in some populations. Our genomic data reveal a surprisingly dynamic history of colour pattern selection and co-evolution in this adaptive radiation.
Affiliation(s)
- Markus Moest
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Steven M. Van Belleghem
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, University of Puerto Rico, Rio Piedras, Puerto Rico
- Jennifer E. James
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, United States of America
- Camilo Salazar
- Biology Program, Faculty of Natural Sciences and Mathematics, Universidad del Rosario, Bogota D.C., Colombia
- Simon H. Martin
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Sarah L. Barker
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Gilson R. P. Moreira
- Departamento de Zoologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Claire Mérot
- IBIS, Department of Biology, Université Laval, Québec, Canada
- Mathieu Joron
- Centre d'Ecologie Fonctionnelle et Evolutive, UMR 5175 CNRS—Université de Montpellier—Université Paul Valéry Montpellier—EPHE, Montpellier, France
- Nicola J. Nadeau
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Chris D. Jiggins
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
|
31
|
Luo S, Zhang H, Duan Y, Yao X, Clark AG, Lu J. The evolutionary arms race between transposable elements and piRNAs in Drosophila melanogaster. BMC Evol Biol 2020; 20:14. [PMID: 31992188 PMCID: PMC6988346 DOI: 10.1186/s12862-020-1580-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/21/2019] [Accepted: 01/13/2020] [Indexed: 01/05/2023]
Abstract
BACKGROUND The piwi-interacting RNAs (piRNAs) are small non-coding RNAs that specifically repress transposable elements (TEs) in the germline of Drosophila. Despite our expanding understanding of TE:piRNA interaction, whether there is an evolutionary arms race between TEs and piRNAs was unclear. RESULTS Here, we studied the population genomics of TEs and piRNAs in the worldwide strains of D. melanogaster. By conducting a correlation analysis between TE contents and the abundance of piRNAs from ovaries of representative strains of D. melanogaster, we find positive correlations between TEs and piRNAs in six TE families. Our simulations further highlight that TE activities and the strength of purifying selection against TEs are important factors shaping the interactions between TEs and piRNAs. Our studies also suggest that the de novo generation of piRNAs is an important mechanism to repress the newly invaded TEs. CONCLUSIONS Our results revealed the existence of an evolutionary arms race between the copy numbers of TEs and the abundance of antisense piRNAs at the population level. Although the interactions between TEs and piRNAs are complex and many factors should be considered to impact their interaction dynamics, our results suggest the emergence, repression specificity and strength of piRNAs on TEs should be considered in studying the landscapes of TE insertions in Drosophila. These results deepen our understanding of the interactions between piRNAs and TEs, and also provide novel insights into the nature of genomic conflicts of other forms.
Affiliation(s)
- Shiqi Luo
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, College of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- College of Plant Protection, Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, 100193, China
- Hong Zhang
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, College of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Yuange Duan
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, College of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xinmin Yao
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, College of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Jian Lu
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, College of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
|
32
|
Koropoulis A, Alachiotis N, Pavlidis P. Detecting Positive Selection in Populations Using Genetic Data. Methods Mol Biol 2020; 2090:87-123. [PMID: 31975165 DOI: 10.1007/978-1-0716-0199-0_5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
High-throughput genomic sequencing allows to disentangle the evolutionary forces acting in populations. Among evolutionary forces, positive selection has received a lot of attention because it is related to the adaptation of populations in their environments, both biotic and abiotic. Positive selection, also known as Darwinian selection, occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and, due to genetic hitchhiking, neighboring linked variation diminishes, creating so-called selective sweeps. Such a process leaves traces in genomes that can be detected in a future time point. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular linkage disequilibrium (LD) patterns in the region. A variety of approaches can be used for detecting selective sweeps, ranging from simple implementations that compute summary statistics to more advanced statistical approaches, e.g., Bayesian approaches, maximum-likelihood-based methods, and machine learning methods. In this chapter, we discuss selective sweep detection methodologies on the basis of their capacity to analyze whole genomes or just subgenomic regions, and on the specific polymorphism patterns they exploit as selective sweep signatures. We also summarize the results of comparisons among five open-source software releases (SweeD, SweepFinder, SweepFinder2, OmegaPlus, and RAiSD) regarding sensitivity, specificity, and execution times. Furthermore, we test and discuss machine learning methods and present a thorough performance analysis. In equilibrium neutral models or mild bottlenecks, most methods are able to detect selective sweeps accurately. 
Methods and tools that rely on linkage disequilibrium (LD) rather than single SNPs exhibit higher true positive rates than the site frequency spectrum (SFS)-based methods under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to build the distribution of the statistic under the null hypothesis. Both LD and SFS-based approaches suffer from decreased accuracy on localizing the true target of selection in bottleneck scenarios. Furthermore, we present an extensive analysis of the effects of gene flow on selective sweep detection, a problem that has been understudied in selective sweep literature.
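As an example of the "simple implementations that compute summary statistics" mentioned above, the sketch below computes Tajima's D, a classic statistic that reacts to the SFS shift left by a sweep. It is a generic textbook implementation written for this listing and is not taken from SweeD, SweepFinder, OmegaPlus, or RAiSD. The input is assumed to be a 0/1 haplotype matrix (rows are chromosomes, columns are sites).

```python
import numpy as np

def tajimas_d(haplotypes):
    """Tajima's D from a 0/1 haplotype matrix; NaN if nothing segregates."""
    hap = np.asarray(haplotypes).astype(int)
    n = hap.shape[0]
    k = hap.sum(axis=0)
    k = k[(k > 0) & (k < n)]              # segregating sites only
    S = k.size
    if S == 0:
        return float("nan")
    # Mean pairwise differences (pi), summed over sites.
    pi = float(np.sum(2.0 * k * (n - k)) / (n * (n - 1)))
    i = np.arange(1, n)
    a1, a2 = np.sum(1.0 / i), np.sum(1.0 / i**2)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its SD.
    return float((pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1)))

# Excess of rare variants (all singletons), as after a recent sweep.
rare = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])
# Excess of intermediate-frequency variants.
common = np.array([[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]])
print(tajimas_d(rare) < 0, tajimas_d(common) > 0)
```

In a genome scan the function would be applied in sliding windows, and, as the abstract cautions, the null distribution of the statistic should come from simulations under a well-specified demographic model rather than from fixed thresholds.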
Affiliation(s)
- Angelos Koropoulis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Computer Science Department, University of Crete, Crete, Heraklion, Greece
- Nikolaos Alachiotis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
|
33
|
Introgression drives repeated evolution of winter coat color polymorphism in hares. Proc Natl Acad Sci U S A 2019; 116:24150-24156. [PMID: 31712446 DOI: 10.1073/pnas.1910471116] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 11/18/2022]
Abstract
Changing from summer-brown to winter-white pelage or plumage is a crucial adaptation to seasonal snow in more than 20 mammal and bird species. Many of these species maintain nonwhite winter morphs, locally adapted to less snowy conditions, which may have evolved independently. Mountain hares (Lepus timidus) from Fennoscandia were introduced into the Faroe Islands in 1855. While they were initially winter-white, within ∼65 y all Faroese hares became winter-gray, a morph that occurs in the source population at low frequency. The documented population history makes this a valuable model for understanding the genetic basis and evolution of the seasonal trait polymorphism. Through whole-genome scans of differentiation and single-nucleotide polymorphism (SNP) genotyping, we associated winter coat color polymorphism to the genomic region of the pigmentation gene Agouti, previously linked to introgression-driven winter coat color variation in the snowshoe hare (Lepus americanus). Lower Agouti expression in the skin of winter-gray individuals during the autumn molt suggests that regulatory changes may underlie the color polymorphism. Variation in the associated genomic region shows signatures of a selective sweep in the Faroese population, suggesting that positive selection drove the fixation of the variant after the introduction. Whole-genome analyses of several hare species revealed that the winter-gray variant originated through introgression from a noncolor changing species, in keeping with the history of ancient hybridization between the species. Our findings show the recurrent role of introgression in generating winter coat color variation by repeatedly recruiting the regulatory region of Agouti to modulate seasonal coat color change.
|
34
|
Kapopoulou A, Pfeifer SP, Jensen JD, Laurent S. The Demographic History of African Drosophila melanogaster. Genome Biol Evol 2019; 10:2338-2342. [PMID: 30169784 PMCID: PMC6363051 DOI: 10.1093/gbe/evy185] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Accepted: 08/27/2018] [Indexed: 11/14/2022]
Abstract
Because Drosophila melanogaster is one of the most commonly utilized organisms in the study of local adaptation, an accurate characterization of its demographic history remains an important research question. This owes both to the inherent interest in characterizing the population history of this model organism, as well as to the well-established importance of an accurate null demographic model for increasing power and decreasing false positive rates in genomic scans for positive selection. Although considerable attention has been afforded to this issue in non-African populations, less is known about the demographic history of African populations, including from the ancestral range of the species. While qualitative predictions and hypotheses have previously been forwarded, we here present a quantitative model fitting of the population history characterizing both the ancestral Zambian population range as well as the subsequently colonized West African populations, which themselves served as the source of multiple non-African colonization events. We here report the split time of the West African population at 72 kya, a date corresponding to human migration into this region as well as a period of climatic changes in the African continent. Furthermore, we have estimated population sizes at this split time. These parameter estimates thus represent an important null model for future investigations into African and non-African D. melanogaster populations alike.
Affiliation(s)
- Adamandia Kapopoulou
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Susanne P Pfeifer
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, Arizona
- Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, Arizona
- Stefan Laurent
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
|
35
|
Abstract
For almost 20 years, many inference methods have been developed to detect selective sweeps and localize the targets of directional selection in the genome. These methods are based on population genetic models that describe the effect of a beneficial allele (e.g., a new mutation) on linked neutral variation (driven by directional selection from a single copy to fixation). Here, I discuss these models, ranging from selective sweeps in a panmictic population of constant size to evolutionary traffic when simultaneous sweeps at multiple loci interfere, and emphasize the important role of demography and population structure in data analysis. In the past 10 years, soft sweeps that may arise after an environmental change from directional selection on standing variation have become a focus of population genetic research. In contrast to selective sweeps, they are caused by beneficial alleles that were neutrally segregating in a population before the environmental change or were present at a mutation-selection balance in appreciable frequency.
|
36
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Indexed: 12/28/2022]
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Affiliation(s)
- Lex Flagel
- Monsanto Company, Chesterfield, MO
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Yaniv Brandvain
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
|
37
|
Chan J, Perrone V, Spence JP, Jenkins PA, Mathieson S, Song YS. A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks. Advances in Neural Information Processing Systems 2018; 31:8594-8605. [PMID: 33244210 PMCID: PMC7687905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state-of-the-art.
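The exchangeability requirement in point (1) can be illustrated with a small NumPy sketch: apply the same transform to every haplotype, then combine the results with a symmetric operation such as the mean, so that reordering the sampled individuals cannot change the output. The layer sizes, random weights, and function name are hypothetical; this shows the general permutation-invariance idea and is not the architecture from the paper.

```python
import numpy as np

def exchangeable_features(haplotypes, w_row, w_out):
    """Permutation-invariant summary of a (samples x sites) matrix."""
    # Shared per-haplotype transform with a ReLU nonlinearity.
    hidden = np.maximum(haplotypes @ w_row, 0.0)
    # Symmetric pooling over individuals removes any dependence on row order.
    pooled = hidden.mean(axis=0)
    # Final linear map applied to the pooled representation.
    return pooled @ w_out

rng = np.random.default_rng(1)
n_samples, n_sites, n_hidden, n_out = 10, 20, 8, 3
x = rng.integers(0, 2, size=(n_samples, n_sites)).astype(float)
w_row = rng.normal(size=(n_sites, n_hidden))
w_out = rng.normal(size=(n_hidden, n_out))

original = exchangeable_features(x, w_row, w_out)
shuffled = exchangeable_features(x[rng.permutation(n_samples)], w_row, w_out)
print(np.allclose(original, shuffled))
```

Because the invariance is built into the computation, a trained network of this shape would not need to learn it from data augmentation over row orderings.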
|
38
|
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022]
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
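A minimal sketch of the statistics named in this abstract, assuming the usual definitions: H12 pools the two most frequent haplotype classes before summing squared frequencies, H2/H1 is the homozygosity excluding the most frequent class divided by the total, and G12, G123, and G2/G1 apply the same formulas to unphased multilocus genotypes. The helper names and the toy data are hypothetical and are not taken from the paper's software.

```python
from collections import Counter


def sorted_frequencies(items):
    """Relative frequencies of the distinct items, most frequent first."""
    counts = Counter(items)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def pooled_homozygosity(freqs, k):
    """Expected homozygosity after pooling the k most frequent classes.

    k=1 gives H1 (or G1), k=2 gives H12 (or G12), k=3 gives H123 (or G123).
    """
    return sum(freqs[:k]) ** 2 + sum(p * p for p in freqs[k:])


def ratio_2_over_1(freqs):
    """H2/H1 (or G2/G1): homozygosity without the top class, over the total."""
    total = pooled_homozygosity(freqs, 1)
    return (total - freqs[0] ** 2) / total


def multilocus_genotypes(haplotype_pairs):
    """Collapse each diploid's two haplotypes into an unphased genotype tuple."""
    return [tuple(a + b for a, b in zip(h1, h2)) for h1, h2 in haplotype_pairs]


# Toy data (hypothetical): five diploids, three biallelic sites, alleles coded 0/1.
pairs = [
    ((1, 0, 1), (1, 0, 1)),
    ((1, 0, 1), (1, 0, 1)),
    ((1, 0, 1), (0, 1, 0)),
    ((0, 1, 0), (1, 0, 1)),
    ((0, 0, 1), (0, 1, 0)),
]
haplotypes = [h for pair in pairs for h in pair]
hap_freqs = sorted_frequencies(haplotypes)
geno_freqs = sorted_frequencies(multilocus_genotypes(pairs))

print("H12 =", pooled_homozygosity(hap_freqs, 2))
print("G12 =", pooled_homozygosity(geno_freqs, 2))
print("G123 =", pooled_homozygosity(geno_freqs, 3))
print("G2/G1 =", ratio_2_over_1(geno_freqs))
```

The third and fourth individuals carry the same two haplotypes in opposite order, so they collapse to one multilocus genotype; this is why the genotype-based statistics need no phase information.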
|
39
|
Jones MR, Mills LS, Alves PC, Callahan CM, Alves JM, Lafferty DJR, Jiggins FM, Jensen JD, Melo-Ferreira J, Good JM. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science 2018; 360:1355-1358. [DOI: 10.1126/science.aar5273] [Citation(s) in RCA: 182] [Impact Index Per Article: 30.3] [Received: 11/21/2017] [Accepted: 05/01/2018] [Indexed: 12/14/2022]
Abstract
Snowshoe hares (Lepus americanus) maintain seasonal camouflage by molting to a white winter coat, but some hares remain brown during the winter in regions with low snow cover. We show that cis-regulatory variation controlling seasonal expression of the Agouti gene underlies this adaptive winter camouflage polymorphism. Genetic variation at Agouti clustered by winter coat color across multiple hare and jackrabbit species, revealing a history of recurrent interspecific gene flow. Brown winter coats in snowshoe hares likely originated from an introgressed black-tailed jackrabbit allele that has swept to high frequency in mild winter environments. These discoveries show that introgression of genetic variants that underlie key ecological traits can seed past and ongoing adaptation to rapidly changing environments.
|
40
|
Kern AD, Schrider DR. diploS/HIC: An Updated Approach to Classifying Selective Sweeps. G3 (Bethesda, Md.) 2018; 8:1959-1970. [PMID: 29626082 PMCID: PMC5982824 DOI: 10.1534/g3.118.200262] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 02/26/2018] [Accepted: 04/04/2018] [Indexed: 11/18/2022]
Abstract
Identifying selective sweeps in populations that have complex demographic histories remains a difficult problem in population genetics. We previously introduced a supervised machine learning approach, S/HIC, for finding both hard and soft selective sweeps in genomes on the basis of patterns of genetic variation surrounding a window of the genome. While S/HIC was shown to be both powerful and precise, the utility of S/HIC was limited by the use of phased genomic data as input. In this report we describe a deep learning variant of our method, diploS/HIC, that uses unphased genotypes to accurately classify genomic windows. diploS/HIC is shown to be quite powerful even at moderate to small sample sizes.
Affiliation(s)
- Andrew D Kern
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854
|
41
|
Schrider DR, Ayroles J, Matute DR, Kern AD. Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia. PLoS Genet 2018; 14:e1007341. [PMID: 29684059 PMCID: PMC5933812 DOI: 10.1371/journal.pgen.1007341] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 09/25/2017] [Revised: 05/03/2018] [Accepted: 03/28/2018] [Indexed: 12/30/2022]
Abstract
Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.
Understanding the extent to which species or diverged populations hybridize in nature is crucially important if we are to understand the speciation process. Accordingly numerous research groups have developed methodology for finding the genetic evidence of such introgression. In this report we develop a supervised machine learning approach for uncovering loci which have introgressed across species boundaries. We show that our method, FILET, has greater accuracy and power than competing methods in discovering introgression, and in addition can detect the directionality associated with the gene flow between species. Using whole genome sequences from Drosophila simulans and Drosophila sechellia we show that FILET discovers quite extensive introgression between these species that has occurred mostly from D. simulans to D. sechellia. Our work highlights the complex process of speciation even within a well-studied system and points to the growing importance of supervised machine learning in population genetics.
Affiliation(s)
- Daniel R. Schrider
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
- Julien Ayroles
  - Ecology and Evolutionary Biology Department, Princeton University, Princeton, New Jersey, United States of America
  - Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Daniel R. Matute
  - Biology Department, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Andrew D. Kern
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
|
42
|
Weigand H, Leese F. Detecting signatures of positive selection in non-model species using genomic data. Zool J Linn Soc 2018. [DOI: 10.1093/zoolinnean/zly007] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/14/2022]
Affiliation(s)
- Hannah Weigand
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
- Florian Leese
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße, Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstraße, Essen, Germany
|
43
|
Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New Paradigm. Trends Genet 2018; 34:301-312. [PMID: 29331490 PMCID: PMC5905713 DOI: 10.1016/j.tig.2017.12.005] [Citation(s) in RCA: 201] [Impact Index Per Article: 33.5] [Received: 10/20/2017] [Revised: 11/29/2017] [Accepted: 12/08/2017] [Indexed: 01/21/2023]
Abstract
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.
Affiliation(s)
- Daniel R Schrider
  - Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08554, USA.
- Andrew D Kern
  - Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08554, USA.
|
44
|
Lee KS, Chatterjee P, Choi EY, Sung MK, Oh J, Won H, Park SM, Kim YJ, Yi SV, Choi JK. Selection on the regulation of sympathetic nervous activity in humans and chimpanzees. PLoS Genet 2018; 14:e1007311. [PMID: 29672586 PMCID: PMC5908061 DOI: 10.1371/journal.pgen.1007311] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/08/2017] [Accepted: 03/17/2018] [Indexed: 12/31/2022]
Abstract
Adrenergic α2C receptor (ADRA2C) is an inhibitory modulator of the sympathetic nervous system. Knockout mice for this gene show physiological and behavioural alterations that are associated with the fight-or-flight response. There is evidence of positive selection on the regulation of this gene during chicken domestication. Here, we find that the neuronal expression of ADRA2C is lower in human and chimpanzee than in other primates. On the basis of three-dimensional chromatin structure, we identified a cis-regulatory region whose DNA sequences have been significantly accelerated in human and chimpanzee. Active histone modification marks this region in rhesus macaque but not in human and chimpanzee; instead, repressive marks are enriched in various human brain samples. This region contains two neuron-restrictive silencer factor (NRSF) binding motifs, each of which harbours a polymorphism. Our genotyping and analysis of population genome data indicate that at both polymorphic sites, the derived allele has reached fixation in humans and chimpanzees but not in bonobos, whereas only the ancestral allele is present among macaques. Our CRISPR/Cas9 genome editing and reporter assays show that both derived nucleotides repress ADRA2C, most likely by increasing NRSF binding. In addition, we detected signatures of recent positive selection for lower neuronal ADRA2C expression in humans. Our findings indicate that there has been selective pressure for enhanced sympathetic nervous activity in the evolution of humans and chimpanzees.
Affiliation(s)
- Kang Seon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Paramita Chatterjee
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eun-Young Choi
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Min Kyung Sung
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Jaeho Oh
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Hyejung Won
  - Department of Neurology, University of California Los Angeles, Los Angeles, California, United States of America
- Seong-Min Park
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Youn-Jae Kim
  - Specific Organs Cancer Branch, Research Institute, National Cancer Center, Ilsan, Gyeonggi, Republic of Korea
- Soojin V. Yi
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
|
45
|
Sugden LA, Atkinson EG, Fischer AP, Rong S, Henn BM, Ramachandran S. Localization of adaptive variants in human genomes using averaged one-dependence estimation. Nat Commun 2018; 9:703. [PMID: 29459739 PMCID: PMC5818606 DOI: 10.1038/s41467-018-03100-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 06/19/2017] [Accepted: 01/19/2018] [Indexed: 12/19/2022]
Abstract
Statistical methods for identifying adaptive mutations from population genetic data face several obstacles: assessing the significance of genomic outliers, integrating correlated measures of selection into one analytic framework, and distinguishing adaptive variants from hitchhiking neutral variants. Here, we introduce SWIF(r), a probabilistic method that detects selective sweeps by learning the distributions of multiple selection statistics under different evolutionary scenarios and calculating the posterior probability of a sweep at each genomic site. SWIF(r) is trained using simulations from a user-specified demographic model and explicitly models the joint distributions of selection statistics, thereby increasing its power to both identify regions undergoing sweeps and localize adaptive mutations. Using array and exome data from 45 ‡Khomani San hunter-gatherers of southern Africa, we identify an enrichment of adaptive signals in genes associated with metabolism and obesity. SWIF(r) provides a transparent probabilistic framework for localizing beneficial mutations that is extensible to a variety of evolutionary scenarios.
Affiliation(s)
- Lauren Alpert Sugden
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA.
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, 02912, USA.
- Elizabeth G Atkinson
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, 11794, USA
- Annie P Fischer
  - Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
- Stephen Rong
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02912, USA
- Brenna M Henn
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, 11794, USA
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA.
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, 02912, USA.
|
46
|
Detecting Recent Positive Selection with a Single Locus Test Bipartitioning the Coalescent Tree. Genetics 2017; 208:791-805. [PMID: 29217523 DOI: 10.1534/genetics.117.300401] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/14/2017] [Accepted: 12/01/2017] [Indexed: 01/09/2023]
Abstract
Many population genomic studies have been conducted in the past to search for traces of recent events of positive selection. These traces, however, can be obscured by temporal variation of population size or other demographic factors. To reduce the confounding impact of demography, the coalescent tree topology has been used as an additional source of information for detecting recent positive selection in a population or a species. Based on the branching pattern at the root, we partition the hypothetical coalescent tree, inferred from a sequence sample, into two subtrees. The reasoning is that positive selection could impose a strong impact on branch length in one of the two subtrees while demography has the same effect on average on both subtrees. Thus, positive selection should be detectable by comparing statistics calculated for the two subtrees. Simulations demonstrate that the proposed test based on these principles has high power to detect recent positive selection even when DNA polymorphism data from only one locus is available, and that it is robust to the confounding effect of demography. One feature is that all components in the summary statistics ([Formula: see text]) can be computed analytically. Moreover, misinference of derived and ancestral alleles is seen to have only a limited effect on the test, and it therefore avoids a notorious problem when searching for traces of recent positive selection.
|
47
|
Harris SE, Munshi-South J. Signatures of positive selection and local adaptation to urbanization in white-footed mice (Peromyscus leucopus). Mol Ecol 2017; 26:6336-6350. [PMID: 28980357 DOI: 10.1111/mec.14369] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 10/16/2016] [Accepted: 09/25/2017] [Indexed: 02/06/2023]
Abstract
Urbanization significantly alters natural ecosystems and has accelerated globally. Urban wildlife populations are often highly fragmented by human infrastructure, and isolated populations may adapt in response to local urban pressures. However, relatively few studies have identified genomic signatures of adaptation in urban animals. We used a landscape genomic approach to examine signatures of selection in urban populations of white-footed mice (Peromyscus leucopus) in New York City. We analysed 154,770 SNPs identified from transcriptome data from 48 P. leucopus individuals from three urban and three rural populations and used outlier tests to identify evidence of urban adaptation. We accounted for demography by simulating a neutral SNP data set under an inferred demographic history as a null model for outlier analysis. We also tested whether candidate genes were associated with environmental variables related to urbanization. In total, we detected 381 outlier loci and after stringent filtering, identified and annotated 19 candidate loci. Many of the candidate genes were involved in metabolic processes and have well-established roles in metabolizing lipids and carbohydrates. Our results indicate that white-footed mice in New York City are adapting at the biomolecular level to local selective pressures in urban habitats. Annotation of outlier loci suggests selection is acting on metabolic pathways in urban populations, likely related to novel diets in cities that differ from diets in less disturbed areas.
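The idea of using a simulated neutral data set as the null for outlier analysis can be sketched as an empirical p-value calculation. This is an illustration under stated assumptions, not the authors' pipeline: the per-locus statistic, the locus names, the simulated values, and the threshold `alpha` are all hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Share of neutral simulations at least as extreme as the observed value.

    The +1 terms keep the estimate away from zero for a finite set of simulations.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def outlier_loci(observed_by_locus, null_values, alpha):
    """Loci whose empirical p-value against the neutral null is at most alpha."""
    return [
        locus
        for locus, value in sorted(observed_by_locus.items())
        if empirical_p_value(value, null_values) <= alpha
    ]


# Hypothetical values of a differentiation-like statistic from neutral simulations.
neutral_sims = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20]

# Hypothetical observed values for three loci.
observed = {"locus_a": 0.30, "locus_b": 0.05, "locus_c": 0.25}

print(outlier_loci(observed, neutral_sims, alpha=0.1))
```

In practice the null would hold far more simulated values than the nine used here, which bounds the smallest attainable p-value at 0.1.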
Affiliation(s)
- Stephen E Harris
  - The Graduate Center, City University of New York (CUNY), New York, NY, USA
- Jason Munshi-South
  - Louis Calder Center-Biological Field Station, Fordham University, Armonk, NY, USA
|
48
|
Range Expansion Compromises Adaptive Evolution in an Outcrossing Plant. Curr Biol 2017; 27:2544-2551.e4. [DOI: 10.1016/j.cub.2017.07.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 01/26/2017] [Revised: 05/22/2017] [Accepted: 07/04/2017] [Indexed: 01/04/2023]
|
49
|
Abstract
The degree to which adaptation in recent human evolution shapes genetic variation remains controversial. This is in part due to the limited evidence in humans for classic "hard selective sweeps", wherein a novel beneficial mutation rapidly sweeps through a population to fixation. However, positive selection may often proceed via "soft sweeps" acting on mutations already present within a population. Here, we examine recent positive selection across six human populations using a powerful machine learning approach that is sensitive to both hard and soft sweeps. We found evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation. Surprisingly, our results also suggest that linked positive selection affects patterns of variation across much of the genome, and may increase the frequencies of deleterious mutations. Our results also reveal insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution.
Affiliation(s)
- Daniel R. Schrider
  - Department of Genetics, Rutgers University, Piscataway, NJ
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Andrew D. Kern
  - Department of Genetics, Rutgers University, Piscataway, NJ
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
|
50
|
Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps. Genetics 2017; 203:1807-25. [PMID: 27516617 DOI: 10.1534/genetics.115.185900] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/10/2015] [Accepted: 04/05/2016] [Indexed: 12/12/2022]
Abstract
During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text] We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics-[Formula: see text] Kelly's [Formula: see text] and [Formula: see text]-are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available.
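The statistics compared in this abstract are elided in the listing ("[Formula: see text]"), so the sketch below does not attempt to reproduce any of them. It only illustrates the general kind of quantity that linkage disequilibrium-based scans build on: the squared correlation r² between pairs of biallelic sites, averaged over a window. The function names and the toy window are hypothetical.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites coded 0/1 per haplotype."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # classical disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def mean_pairwise_r2(sites):
    """Average r^2 over all pairs of polymorphic sites in a window."""
    polymorphic = [s for s in sites if 0 < sum(s) < len(s)]
    pairs = list(combinations(polymorphic, 2))
    if not pairs:
        raise ValueError("need at least two polymorphic sites")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


# Hypothetical window: each tuple is one site, each position one sampled haplotype.
window = [
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 1, 1, 1),  # monomorphic, so it is skipped
]
print(mean_pairwise_r2(window))
```

A scan that controls for recombination rate variation, as the abstract describes, would additionally compare such a value with the level of linkage disequilibrium expected from a genetic map; that step is not shown here.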
|