1. Dabi A, Schrider DR. Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations. Genetics 2025; 229:1-57. [PMID: 39503241 PMCID: PMC11708920 DOI: 10.1093/genetics/iyae180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 10/18/2024] [Indexed: 11/13/2024] Open
Abstract
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward; thus, it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q. 
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
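The rescaling procedure described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the `SimParams` class, the `rescale` function, and the parameter values are hypothetical, and the only rule taken from the abstract is that the population size is divided by Q while the mutation rate, recombination rate, and selection coefficients are multiplied by Q.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical parameter bundle for a forward-in-time simulation."""
    pop_size: int            # N: number of individuals
    mutation_rate: float     # mu: per site, per generation
    recomb_rate: float       # r: per site, per generation
    selection_coeff: float   # s: selection coefficient of a mutation


def rescale(params: SimParams, q: float) -> SimParams:
    """Divide N by q and multiply mu, r, and s by q."""
    if q < 1:
        raise ValueError("the rescaling factor Q must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recomb_rate=params.recomb_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


# Illustrative values only; they are not taken from the paper.
unscaled = SimParams(pop_size=10_000, mutation_rate=1e-8,
                     recomb_rate=1e-8, selection_coeff=0.001)
scaled = rescale(unscaled, q=10)

# The products N*mu, N*r, and N*s are unchanged by the rescaling
# (up to rounding of the population size).
for name in ("mutation_rate", "recomb_rate", "selection_coeff"):
    before = unscaled.pop_size * getattr(unscaled, name)
    after = scaled.pop_size * getattr(scaled, name)
    print(f"N*{name}: unscaled={before:.6g}, scaled={after:.6g}")
```

Keeping these products constant is the usual rationale for rescaling; the abstract's point is that the measured outcomes can still deviate from the unscaled simulation, so a comparison of this kind on a small model is a reasonable check before choosing Q.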
Affiliation(s)
- Amjad Dabi
  - Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Daniel R Schrider
  - Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
2. Amin MR, Hasan M, DeGiorgio M. Digital Image Processing to Detect Adaptive Evolution. Mol Biol Evol 2024; 41:msae242. [PMID: 39565932 PMCID: PMC11631197 DOI: 10.1093/molbev/msae242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 10/28/2024] [Accepted: 11/13/2024] [Indexed: 11/22/2024] Open
Abstract
In recent years, advances in image processing and machine learning have fueled a paradigm shift in detecting genomic regions under natural selection. Early machine learning techniques employed population-genetic summary statistics as features, which focus on specific genomic patterns expected by adaptive and neutral processes. Though such engineered features are important when training data are limited, the ease with which simulated data can now be generated has led to the recent development of approaches that take in image representations of haplotype alignments and automatically extract important features using convolutional neural networks. Digital image processing methods termed α-molecules are a class of techniques for multiscale representation of objects that can extract a diverse set of features from images. One such α-molecule method, termed wavelet decomposition, lends greater control over high-frequency components of images. Another α-molecule method, termed curvelet decomposition, is an extension of the wavelet concept that considers events occurring along curves within images. We show that application of these α-molecule techniques to extract features from image representations of haplotype alignments yields high true positive rate and accuracy to detect hard and soft selective sweep signatures from genomic data with both linear and nonlinear machine learning classifiers. Moreover, we find that such models are easy to visualize and interpret, with performance rivaling that of contemporary deep learning approaches for detecting sweeps.
Affiliation(s)
- Md Ruhul Amin
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
3. Whitehouse LS, Ray DD, Schrider DR. Tree Sequences as a General-Purpose Tool for Population Genetic Inference. Mol Biol Evol 2024; 41:msae223. [PMID: 39460991 PMCID: PMC11600592 DOI: 10.1093/molbev/msae223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 10/05/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024] Open
Abstract
As population genetic data increase in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient but are not interchangeable with existing data structures used for many population genetic inference methodologies such as the use of convolutional neural networks applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard convolutional neural network approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a graph convolutional network approach and can be used to perform well on these common population genetic inference tasks with accuracies roughly matching or even exceeding that of a convolutional neural network-based method. As tree sequences become more widely used in population genetic research, we foresee developments and optimizations of this work to provide a foundation for population genetic inference moving forward.
Affiliation(s)
- Logan S Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Dylan D Ray
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
4. Whitehouse LS, Ray D, Schrider DR. Tree sequences as a general-purpose tool for population genetic inference. bioRxiv 2024:2024.02.20.581288. [PMID: 39185244 PMCID: PMC11343121 DOI: 10.1101/2024.02.20.581288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
As population genetics data increase in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.
Affiliation(s)
- Logan S. Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
- Dylan Ray
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
5. Dabi A, Schrider DR. Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations. bioRxiv 2024:2024.04.07.588318. [PMID: 38645049 PMCID: PMC11030438 DOI: 10.1101/2024.04.07.588318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward; thus, it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q.
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
Affiliation(s)
- Amjad Dabi
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
6. Wang Y, Allen SL, Reddiex AJ, Chenoweth SF. The impacts of positive selection on genomic variation in Drosophila serrata: Insights from a deep learning approach. Mol Ecol 2024; 33:e17499. [PMID: 39188068 DOI: 10.1111/mec.17499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 07/22/2024] [Accepted: 08/07/2024] [Indexed: 08/28/2024]
Abstract
This study explores the impact of positive selection on the genetic composition of a Drosophila serrata population in eastern Australia through a comprehensive analysis of 110 whole genome sequences. Utilizing an advanced deep learning algorithm (partialS/HIC) and a range of inferred demographic histories, we identified that approximately 14% of the genome is directly affected by sweeps, with soft sweeps being more prevalent (10.6%) than hard sweeps (2.1%), and partial sweeps being uncommon (1.3%). The algorithm demonstrated robustness to demographic assumptions in classifying complete sweeps but faced challenges in distinguishing neutral regions from partial sweeps and linked regions under demographic misspecification. The findings reveal the indirect influence of sweeps on nearly two-thirds of the genome through linkage, with an over-representation of putatively deleterious variants suggesting that positive selection drags deleterious variants to higher frequency due to hitchhiking with beneficial loci. Gene ontology enrichment analysis further supported our confidence in the accuracy of sweep detection as several traits expected to be under positive selection due to evolutionary arms races (e.g. immunity) were detected in hard sweeps. This study provides valuable insights into the direct and indirect contributions of positive selection in shaping genomic variation in natural populations.
Affiliation(s)
- Yiguan Wang
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Scott L Allen
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Adam J Reddiex
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
  - Biological Data Science Institute, The Australian National University, Canberra, Australian Capital Territory, Australia
- Stephen F Chenoweth
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
7. Sinigaglia B, Escudero J, Biagini SA, Garcia-Calleja J, Moreno J, Dobon B, Acosta S, Mondal M, Walsh S, Aguileta G, Vallès M, Forrow S, Martin-Caballero J, Migliano AB, Bertranpetit J, Muñoz FJ, Bosch E. Exploring Adaptive Phenotypes for the Human Calcium-Sensing Receptor Polymorphism R990G. Mol Biol Evol 2024; 41:msae015. [PMID: 38285634 PMCID: PMC10859840 DOI: 10.1093/molbev/msae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 01/23/2024] [Accepted: 01/23/2024] [Indexed: 01/31/2024] Open
Abstract
Rainforest hunter-gatherers from Southeast Asia are characterized by specific morphological features including a particularly dark skin color (D), short stature (S), woolly hair (W), and the presence of steatopygia (S), that is, fat accumulation localized in the hips (the DSWS phenotype). Based on previous evidence in the Andamanese population, we first characterized signatures of adaptive natural selection around the calcium-sensing receptor gene in Southeast Asian rainforest groups presenting the DSWS phenotype and identified the R990G substitution (rs1042636) as a putative adaptive variant for experimental follow-up. Although the calcium-sensing receptor has a critical role in calcium homeostasis by directly regulating parathyroid hormone secretion, it is expressed in different tissues and has been described to be involved in many biological functions. Previous works have also characterized the R990G substitution as an activating polymorphism of the calcium-sensing receptor associated with hypocalcemia. Therefore, we generated a knock-in mouse for this substitution and investigated organismal phenotypes that could have become adaptive in rainforest hunter-gatherers from Southeast Asia. Interestingly, we found that mice homozygous for the derived allele show not only lower serum calcium concentration but also greater body weight and fat accumulation, probably because of enhanced preadipocyte differentiation and lipolysis impairment resulting from the calcium-sensing receptor activation mediated by R990G. We speculate that such differential features in humans could have facilitated the survival of hunter-gatherer groups during periods of nutritional stress in the challenging conditions of the Southeast Asian tropical rainforests.
Affiliation(s)
- Barbara Sinigaglia
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Jorge Escudero
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Simone A Biagini
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Jorge Garcia-Calleja
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Josep Moreno
  - PCB-PRBB Animal Facility Alliance, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Begoña Dobon
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Sandra Acosta
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
  - UB Institute of Neuroscience, Department of Pathology and Experimental Therapeutics, Universitat de Barcelona, Barcelona 08007, Spain
- Mayukh Mondal
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
  - Institute of Clinical Molecular Biology, Christian-Albrechts-Universität zu Kiel, Kiel 24118, Germany
- Sandra Walsh
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Gabriela Aguileta
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Mònica Vallès
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Stephen Forrow
  - Mouse Mutant Core Facility, Institute for Research in Biomedicine (IRB), Barcelona 08028, Spain
- Juan Martin-Caballero
  - PCB-PRBB Animal Facility Alliance, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Andrea Bamberg Migliano
  - Human Evolutionary Ecology Group, Department of Evolutionary Anthropology, University of Zurich, Zurich 8057, Switzerland
- Jaume Bertranpetit
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Francisco J Muñoz
  - Laboratory of Molecular Physiology, Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
- Elena Bosch
  - Institut de Biologia Evolutiva (UPF-CSIC), Departament de Medicina i Ciències de la Vida, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Spain
8. Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024; 20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, New York, United States of America
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
9. Ray DD, Flagel L, Schrider DR. IntroUNET: identifying introgressed alleles via semantic segmentation. bioRxiv 2024:2023.02.07.527435. [PMID: 36865105 PMCID: PMC9979274 DOI: 10.1101/2023.02.07.527435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, NY 11101, USA
  - Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
10. Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023; 54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Half a century ago, a seminal article on the hitchhiking effect by Smith and Haigh inaugurated the concept of the selection signature. Selective sweeps are characterised by the rapid spread of an advantageous genetic variant through a population and hence play an important role in shaping evolution and research on genetic diversity. The process by which a beneficial allele arises and becomes fixed in a population, leading to an increase in the frequency of other linked alleles, is known as genetic hitchhiking or genetic draft. Kimura's neutral theory and hitchhiking theory are complementary, with Kimura's neutral evolution as the 'null model' and positive selection as the 'signal'. Both are widely accepted in evolution, especially with genomics enabling precise measurements. Significant advances in genomic technologies, such as next-generation sequencing, high-density SNP arrays and powerful bioinformatics tools, have made it possible to systematically investigate selection signatures in a variety of species. Although the history of selection signatures is relatively recent, progress has been made in the last two decades, owing to the increasing availability of large-scale genomic data and the development of computational methods. In this review, we embark on a journey through the history of research on selective sweeps, ranging from early theoretical work to recent empirical studies that utilise genomic data.
Affiliation(s)
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Kanika Ghildiyal
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Anurodh Sharma
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Karan Jain
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Bishnu Prasad Mishra
  - Division of Animal Biotechnology, ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India
11. Levi R, Levi L, Louzoun Y. Bw4 ligand and direct T-cell receptor binding induced selection on HLA A and B alleles. Front Immunol 2023; 14:1236080. [PMID: 38077375 PMCID: PMC10703150 DOI: 10.3389/fimmu.2023.1236080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/26/2023] [Indexed: 12/18/2023] Open
Abstract
Introduction: The HLA region is the hallmark of balancing selection, argued to be driven by the pressure to present a wide variety of viral epitopes. As such, selection on the peptide-binding positions has been proposed to drive HLA population genetics. MHC molecules also bind directly to the T-cell receptor (TCR) and killer cell immunoglobulin-like receptors (KIR). Methods: We here combine the HLA allele frequencies in over six million hematopoietic stem cell (HSC) donors with a novel machine-learning-based method to predict allele frequency. Results: We show for the first time that allele frequencies can be predicted from allele sequences. This prediction yields a natural measure for selection. The strongest selection affects KIR-binding regions, followed by the peptide-binding cleft. The selection from the direct interaction with the KIR and TCR is centered on positively charged residues (mainly arginine), and some positions in the peptide-binding cleft are not associated with the allele frequency, especially tyrosine residues. Discussion: These results suggest that the balancing selection for peptide presentation is combined with a positive selection for KIR and TCR binding.
Affiliation(s)
- Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
|
12
|
Tanaka T, Hayakawa T, Teshima KM. Power of neutrality tests for detecting natural selection. G3 (BETHESDA, MD.) 2023; 13:jkad161. [PMID: 37481468 PMCID: PMC10542275 DOI: 10.1093/g3journal/jkad161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2023] [Revised: 06/09/2023] [Accepted: 07/19/2023] [Indexed: 07/24/2023]
Abstract
Detection of natural selection is one of the main interests in population genetics. Thus, many tests have been developed for detecting natural selection using genomic data. Although it is recognized that the utility of tests depends on several evolutionary factors, such as the timing of selection, strength of selection, frequency of selected alleles, demographic events, and the initial frequency of the selected allele when selection started acting (softness of selection), the relationships between such evolutionary factors and the power of tests are not yet entirely clear. In this study, we investigated the power of 4 tests: Tajima's D, Fay and Wu's H, relative extended haplotype homozygosity (rEHH), and integrated haplotype score (iHS), under ranges of evolutionary parameters and demographic models to quantitatively expand the understanding of approaches for detecting selection. The results show that each test detects selection within a limited parameter range, and there are still wide ranges of parameters for which none of these tests work effectively. In addition, the parameter space in which each test shows the highest power overlaps the empirical results of previous research. These results indicate that our present perspective of adaptation is limited to only a part of actual adaptation.
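As a concrete reference for the first of the four tests named in this abstract, the following is a minimal sketch of Tajima's D computed with the standard Tajima (1989) constants. It is not taken from the study itself. The input layout (a list of equal-length 0/1 haplotypes, with 1 marking the derived allele) and the function name `tajimas_d` are assumptions made for illustration.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length 0/1 haplotypes (assumed input layout)."""
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n = 2 and n = 3, so D is undefined there.
        raise ValueError("need at least 4 haplotypes")
    # Derived-allele count at every site (columns of the haplotype matrix).
    counts = [sum(column) for column in zip(*haplotypes)]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return float("nan")  # no polymorphism, D is undefined
    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)
    # Standard constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Excess of singletons (sweep-like or expansion-like pattern) gives a negative D.
singletons = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
# Excess of intermediate-frequency variants gives a positive D.
intermediate = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
print(tajimas_d(singletons), tajimas_d(intermediate))
```

The two toy matrices only illustrate the sign of the statistic; they say nothing about statistical power, which is what the study evaluates by simulation.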
Affiliation(s)
- Tomotaka Tanaka
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Toshiyuki Hayakawa
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Faculty of Arts and Science, Kyushu University, Fukuoka 819-0395, Japan
- Kosuke M Teshima
- Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
|
13
|
Saif R, Mahmood T, Zia S, Henkel J, Ejaz A. Genomic selection pressure discovery using site-frequency spectrum and reduced local variability statistics in Pakistani Dera-Din-Panah goat. Trop Anim Health Prod 2023; 55:331. [PMID: 37750990 DOI: 10.1007/s11250-023-03758-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 09/12/2023] [Indexed: 09/27/2023]
Abstract
BACKGROUND: Population geneticists have long sought to understand the selection signatures accumulated in the goat genome through natural selection or human-driven artificial selection via breeding practices, which led wild animals to domestication; understanding this evolutionary process may help utilize the full genetic potential of the goat genome. METHODS AND RESULTS: As a step toward pinpointing selection signals in the Pakistani Dera-Din-Panah (DDP) goat, whole-genome pooled sequencing (n = 12) was performed, and 618,236,192 clean paired-end reads were mapped against the ARS1 reference goat assembly. Five selection signature statistics were applied: four site-frequency spectrum (SFS) methods (Tajima's D ([Formula: see text]), Fay and Wu's H ([Formula: see text]), Zeng's E ([Formula: see text]), [Formula: see text]) and one reduced local variability approach named pooled heterozygosity ([Formula: see text]). The regions under selection were annotated with significance threshold values of [Formula: see text]≥4.7, [Formula: see text]≥6, [Formula: see text]≥2.5, Pool-HMM ≥ 12, and [Formula: see text]≥5, which resulted in a cumulative 364 candidate gene hits. The highest genomic selection signals were observed on Chr. 4, 6, 10, 12, 15, 16, 18, 20, and 27, which harbor the ADAMTS6, CWC27, RELN, MYCBP2, FGF14, STIM1, CFAP74, GNB1, CALML6, TMEM52, FAM149A, NADK, MMP23B, OPN3, FH, MFHAS1, KLKB1, RRM1, KMO, SPEF2, F11, KIT, KMO, ERI1, ATP8B4, and RHOG genes. Next, the captured genomic hits were validated by more than one of the applied statistics; these hits harbor genes associated with meat production, immunity, and reproduction, strengthening our hypothesis of traits under selection in this Pakistani goat breed. Furthermore, common candidate genes captured by more than one statistical method were subjected to gene ontology and KEGG pathway analysis to gain insight into the biological processes associated with this goat breed.
CONCLUSION: The current view of the genomic architecture of the DDP goat provides a better understanding for improving its genetic potential and other economically important traits, such as medium to large body size and milk and fiber production, by updating genomics-driven breeding strategies to boost the livestock and agriculture-based economy of the country.
Affiliation(s)
- Rashid Saif
- Department of Biotechnology, Qarshi University, Lahore, Pakistan
- Decode Genomics, Punjab University Employees Housing Scheme, Lahore, Pakistan
- Tania Mahmood
- Decode Genomics, Punjab University Employees Housing Scheme, Lahore, Pakistan
- Saeeda Zia
- Department of Sciences and Humanities, National University of Computer and Emerging Sciences, Lahore, Pakistan
- Jan Henkel
- MGZ-Medical Genetics Center, Munich, Germany
- Aniqa Ejaz
- Decode Genomics, Punjab University Employees Housing Scheme, Lahore, Pakistan
|
14
|
Chen Y, Li H, Yi TC, Shen J, Zhang J. Notch Signaling in Insect Development: A Simple Pathway with Diverse Functions. Int J Mol Sci 2023; 24:14028. [PMID: 37762331 PMCID: PMC10530718 DOI: 10.3390/ijms241814028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023]
Abstract
Notch signaling is an evolutionarily conserved pathway which functions between adjacent cells to establish their distinct identities. Despite operating in a simple mechanism, Notch signaling plays remarkably diverse roles in development to regulate cell fate determination, organ growth and tissue patterning. While initially discovered and characterized in the model insect Drosophila melanogaster, recent studies across various insect species have revealed the broad involvement of Notch signaling in shaping insect tissues. This review focuses on providing a comprehensive picture regarding the roles of the Notch pathway in insect development. The roles of Notch in the formation and patterning of the insect embryo, wing, leg, ovary and several specific structures, as well as in physiological responses, are summarized. These results are discussed within the developmental context, aiming to deepen our understanding of the diversified functions of the Notch signaling pathway in different insect species.
Affiliation(s)
- Yao Chen
- Department of Plant Biosecurity and MOA Key Laboratory of Surveillance and Management for Plant Quarantine Pests, College of Plant Protection, China Agricultural University, Beijing 100193, China
- Haomiao Li
- Department of Plant Biosecurity and MOA Key Laboratory of Surveillance and Management for Plant Quarantine Pests, College of Plant Protection, China Agricultural University, Beijing 100193, China
- Tian-Ci Yi
- Guizhou Provincial Key Laboratory for Agricultural Pest Management of Mountainous Regions, Institute of Entomology, Guizhou University, Guiyang 550025, China
- Jie Shen
- Department of Plant Biosecurity and MOA Key Laboratory of Surveillance and Management for Plant Quarantine Pests, College of Plant Protection, China Agricultural University, Beijing 100193, China
- Junzheng Zhang
- Department of Plant Biosecurity and MOA Key Laboratory of Surveillance and Management for Plant Quarantine Pests, College of Plant Protection, China Agricultural University, Beijing 100193, China
|
15
|
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
|
16
|
Liao K, Carlson J, Zöllner S. The effect of mutation subtypes on the allele frequency spectrum and population genetics inference. G3 (BETHESDA, MD.) 2023; 13:jkad035. [PMID: 36759699 PMCID: PMC10085755 DOI: 10.1093/g3journal/jkad035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum (AFS), defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly treats mutation types (e.g. A→C vs C→T) as interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourage us to consider the unique evolutionary mechanisms of analyzed polymorphisms.
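The count of 96 subtypes follows from collapsing the 192 possible (3-mer, alternate base) combinations across the two DNA strands. The sketch below enumerates them and tallies a separate frequency spectrum per subtype. It is an illustration only: collapsing onto A or C as the central reference base is an assumption suggested by the AAA→ATA example in the abstract, and the function names and the `(ref3, alt, derived_count)` input layout are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_subtype(ref3, alt):
    """Strand-collapsed label such as 'AAA>ATA' (central reference base A or C, by assumption)."""
    if len(ref3) != 3 or alt == ref3[1]:
        raise ValueError("expected a 3-mer and an alternate base that differs from its centre")
    alt3 = ref3[0] + alt + ref3[2]
    if ref3[1] in "GT":
        # Report the change on the opposite strand so the central base is A or C.
        ref3, alt3 = reverse_complement(ref3), reverse_complement(alt3)
    return f"{ref3}>{alt3}"


# 64 contexts x 3 alternate bases = 192 raw changes, which collapse to 96 subtypes.
ALL_SUBTYPES = sorted(
    {
        canonical_subtype(a + b + c, alt)
        for a, b, c in product(BASES, repeat=3)
        for alt in BASES
        if alt != b
    }
)


def subtype_sfs(variants, n):
    """Per-subtype frequency spectrum from (ref3, alt, derived_count) tuples in a sample of size n."""
    sfs = {subtype: [0] * (n - 1) for subtype in ALL_SUBTYPES}
    for ref3, alt, k in variants:
        if 0 < k < n:  # keep segregating sites only
            sfs[canonical_subtype(ref3, alt)][k - 1] += 1
    return sfs


example = subtype_sfs([("AAA", "T", 1), ("TTT", "A", 1), ("ACG", "T", 3)], n=4)
print(len(ALL_SUBTYPES), example["AAA>ATA"], example["ACG>ATG"])
```

In this toy input the first two variants are the same change seen from opposite strands, so both land in the `AAA>ATA` spectrum.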
Affiliation(s)
- Kevin Liao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jedidiah Carlson
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
- Department of Population Health, University of Texas at Austin, Austin, TX 78712, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
|
17
|
Korfmann K, Gaggiotti OE, Fumagalli M. Deep Learning in Population Genetics. Genome Biol Evol 2023; 15:evad008. [PMID: 36683406 PMCID: PMC9897193 DOI: 10.1093/gbe/evad008] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/13/2022] [Revised: 12/19/2022] [Accepted: 01/16/2023] [Indexed: 01/24/2023]
Abstract
Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data, will provide further opportunities for technological advancements in the field.
Affiliation(s)
- Kevin Korfmann
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Germany
- Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife KY16 9TF, UK
- Matteo Fumagalli
- Department of Biological and Behavioural Sciences, Queen Mary University of London, UK
|
18
|
Lappo E, Rosenberg NA. Approximations to the expectations and variances of ratios of tree properties under the coalescent. G3 (BETHESDA, MD.) 2022; 12:jkac205. [PMID: 35951748 PMCID: PMC9526068 DOI: 10.1093/g3journal/jkac205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 08/01/2022] [Indexed: 11/14/2022]
Abstract
Properties of gene genealogies such as tree height ($H$), total branch length ($L$), total lengths of external ($E$) and internal ($I$) branches, mean length of basal branches ($B$), and the underlying coalescence times ($T$) can be used to study population-genetic processes and to develop statistical tests of population-genetic models. Uses of tree features in statistical tests often rely on predictions that depend on pairwise relationships among such features. For genealogies under the coalescent, we provide exact expressions for Taylor approximations to expected values and variances of ratios $X_n/Y_n$, for all 15 pairs among the variables $\{H_n, L_n, E_n, I_n, B_n, T_k\}$, considering $n$ leaves and $2 \le k \le n$. For expected values of the ratios, the approximations match closely with empirical simulation-based values. The approximations to the variances are not as accurate, but they generally match simulations in their trends as $n$ increases. Although $E_n$ has expectation 2 and $H_n$ has expectation 2 in the limit as $n \to \infty$, the approximation to the limiting expectation for $E_n/H_n$ is not 1, instead equaling $\pi^2/3 - 2 \approx 1.28987$. The new approximations augment fundamental results in coalescent theory on the shapes of genealogical trees.
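The simulation-based values that such approximations are compared against can be produced with a few lines of code. The following is a minimal sketch, not the authors' code, of a standard Kingman coalescent simulation in the time scale where the waiting time with k lineages is exponential with rate k(k-1)/2, so that tree height has expectation 2(1 - 1/n) and total external branch length has expectation 2. The function names, replicate count, and seed are assumptions chosen for illustration.

```python
import random


def simulate_tree(n, rng):
    """Simulate one coalescent genealogy with n leaves; return (H, L, E)."""
    is_leaf = [True] * n  # one flag per current lineage: still an external branch?
    height = total = external = 0.0
    while len(is_leaf) > 1:
        k = len(is_leaf)
        t = rng.expovariate(k * (k - 1) / 2.0)  # waiting time T_k
        height += t
        total += k * t                 # every lineage grows by t
        external += sum(is_leaf) * t   # only leaf lineages add to E
        # Merge two lineages chosen uniformly at random into one internal lineage.
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        del is_leaf[i]
        del is_leaf[j]
        is_leaf.append(False)
    return height, total, external


def mean_properties(n, reps, seed=0):
    """Monte Carlo means of H_n, L_n, E_n and of the ratio E_n / H_n."""
    rng = random.Random(seed)
    sims = [simulate_tree(n, rng) for _ in range(reps)]
    return {
        "H": sum(h for h, _, _ in sims) / reps,
        "L": sum(l for _, l, _ in sims) / reps,
        "E": sum(e for _, _, e in sims) / reps,
        "E/H": sum(e / h for h, _, e in sims) / reps,
    }


means = mean_properties(n=10, reps=20000, seed=1)
print(means)
```

The mean of the simulated ratio E/H generally differs from the ratio of the means, which is why ratio expectations need their own approximations.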
Affiliation(s)
- Egor Lappo
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA 94305, USA
|
19
|
Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Abstract
Natural selection has been given a lot of attention because it relates to the adaptation of populations to their environments, both biotic and abiotic. An allele is selected when it is favored by natural selection. Consequently, the favored allele increases in frequency in the population and neighboring linked variation diminishes, causing so-called selective sweeps. High-throughput genomic sequence data allow one to disentangle the evolutionary forces at play in populations. With the development of high-throughput genome sequencing technologies, it has become easier to detect these selective sweeps/selection signatures. Various methods can be used to detect selective sweeps, from simple implementations using summary statistics to complex statistical approaches. One of the important problems of these statistical models is the potential to provide inaccurate results when their assumptions are violated. The use of machine learning (ML) in population genetics has been introduced as an alternative method of detecting selection by treating the problem of detecting selection signatures as a classification problem. Since the availability of population genomics data is increasing, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can be used to aid in detecting and studying natural selection patterns using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India
|
20
|
Salloum PM, Santure AW, Lavery SD, de Villemereuil P. Finding the adaptive needles in a population-structured haystack: a case study in a New Zealand mollusc. J Anim Ecol 2022; 91:1209-1221. [PMID: 35318661 PMCID: PMC9311215 DOI: 10.1111/1365-2656.13692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2021] [Accepted: 03/09/2022] [Indexed: 11/30/2022]
Abstract
Genetic adaptation to future environmental conditions is crucial to help species persist as the climate changes. Genome scans are powerful tools to understand adaptive landscapes, enabling us to correlate genetic diversity with environmental gradients while disentangling neutral from adaptive variation. However, low gene flow can lead to both local adaptation and highly structured populations, and is a major confounding factor for genome scans, resulting in an inflated number of candidate loci. Here, we compared candidate locus detection in a marine mollusc (Onithochiton neglectus), taking advantage of a natural geographical contrast in the levels of genetic structure between its populations. O. neglectus is endemic to New Zealand and distributed throughout an environmental gradient from the subtropical north to the subantarctic south. Due to a brooding developmental mode, populations tend to be locally isolated. However, adult hitchhiking on rafting kelp increases connectivity among southern populations. We applied two genome scans for outliers (Bayescan and PCAdapt) and two genotype–environment association (GEA) tests (BayeScEnv and RDA). To limit issues with false positives, we combined results using the geometric mean of q‐values and performed association tests with random environmental variables. This novel approach is a compromise between stringent and relaxed approaches widely used before, and allowed us to classify candidate loci as low confidence or high confidence. Genome scans for outliers detected a large number of significant outliers in strong and moderately structured populations. No high‐confidence GEA loci were detected in the context of strong population structure. However, 86 high‐confidence loci were associated predominantly with latitudinally varying abiotic factors in the less structured southern populations. 
This suggests that the degree of connectivity driven by kelp rafting over the southern scale may be insufficient to counteract local adaptation in this species. Our study supports the expectation that genome scans may be prone to errors in highly structured populations. Nonetheless, it also empirically demonstrates that careful statistical controls enable the identification of candidate loci that invite more detailed investigations. Ultimately, genome scans are valuable tools to help guide further research aiming to determine the potential of non‐model species to adapt to future environments.
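The step of combining results across tests with the geometric mean of q-values can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the two cut-offs, the three labels, and the function names are hypothetical, and the study's additional control (association tests with random environmental variables) is not reproduced here.

```python
import math


def geometric_mean_q(qvalues, floor=1e-300):
    """Geometric mean of q-values; the floor (an assumption) avoids log(0)."""
    logs = [math.log(max(q, floor)) for q in qvalues]
    return math.exp(sum(logs) / len(logs))


def classify_loci(q_by_locus, high=0.01, low=0.05):
    """Label each locus from its combined q-value; both thresholds are hypothetical."""
    labels = {}
    for locus, qvalues in q_by_locus.items():
        combined = geometric_mean_q(qvalues)
        if combined <= high:
            labels[locus] = "high confidence"
        elif combined <= low:
            labels[locus] = "low confidence"
        else:
            labels[locus] = "not a candidate"
    return labels


# One q-value per test for each locus (made-up numbers for illustration only).
labels = classify_loci(
    {
        "locus_a": [0.001, 0.01],
        "locus_b": [0.02, 0.08],
        "locus_c": [0.5, 0.9],
    }
)
print(labels)
```

Because the geometric mean sits between the smallest and largest q-value, a locus supported strongly by one test and weakly by another lands in between, which is the compromise between stringent and relaxed criteria that the abstract describes.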
Affiliation(s)
- P M Salloum
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- A W Santure
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- S D Lavery
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Institute of Marine Science, Leigh Marine Laboratory, University of Auckland, Warkworth, New Zealand
- P de Villemereuil
- Institut de Systématique, Évolution, Biodiversité (ISYEB), École Pratique des Hautes Études, PSL, MNHN, CNRS, SU, UA, Paris, France
|
21
|
A compendium of covariances and correlation coefficients of coalescent tree properties. Theor Popul Biol 2022; 143:1-13. [PMID: 34757022 PMCID: PMC9731325 DOI: 10.1016/j.tpb.2021.09.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/02/2021] [Revised: 09/21/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Gene genealogies are frequently studied by measuring properties such as their height ($H$), length ($L$), sum of external branches ($E$), sum of internal branches ($I$), and mean of their two basal branches ($B$), and the coalescence times that contribute to the other genealogical features ($T$). These tree properties and their relationships can provide insight into the effects of population-genetic processes on genealogies and genetic sequences. Here, under the coalescent model, we study the 15 correlations among pairs of features of genealogical trees: $H_n$, $L_n$, $E_n$, $I_n$, $B_n$, and $T_k$ for a sample of size $n$, with $2 \le k \le n$. We report high correlations among $H_n$, $L_n$, $I_n$, and $B_n$, with all pairwise correlations of these quantities having values greater than or equal to $\sqrt{6}\,[6\zeta(3) + 6 - \pi^2] / (\pi \sqrt{18 + 9\pi^2 - \pi^4}) \approx 0.84930$ in the limit as $n \to \infty$, where $\zeta$ is the Riemann zeta function. Although $E_n$ has expectation 2 for all $n$ and $H_n$ has expectation 2 in the $n \to \infty$ limit, their limiting correlation is 0. The results contribute toward understanding features of the shapes of coalescent trees.
|
22
|
Guirao‐Rico S, González J. Benchmarking the performance of Pool-seq SNP callers using simulated and real sequencing data. Mol Ecol Resour 2021; 21:1216-1229. [PMID: 33534960 PMCID: PMC8251607 DOI: 10.1111/1755-0998.13343] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/24/2020] [Revised: 12/21/2020] [Accepted: 01/27/2021] [Indexed: 12/13/2022]
Abstract
Population genomics is a fast-developing discipline with promising applications in a growing number of life sciences fields. Advances in sequencing technologies and bioinformatics tools allow population genomics to exploit genome-wide information to identify the molecular variants underlying traits of interest and the evolutionary forces that modulate these variants through space and time. However, the cost of genomic analyses of multiple populations is still too high to address them through individual genome sequencing. Pooling individuals for sequencing can be a more effective strategy in Single Nucleotide Polymorphism (SNP) detection and allele frequency estimation because of a higher total coverage. However, compared to individual sequencing, SNP calling from pools has the additional difficulty of distinguishing rare variants from sequencing errors, which is often avoided by establishing a minimum threshold allele frequency for the analysis. Finding an optimal balance between minimizing information loss and reducing sequencing costs is essential to ensure the success of population genomics studies. Here, we have benchmarked the performance of SNP callers for Pool-seq data, based on different approaches, under different conditions, and using computer simulations and real data. We found that SNP caller performance varied for allele frequencies up to 0.35. We also found that SNP callers based on Bayesian (SNAPE-pooled) or maximum likelihood (MAPGD) approaches outperform the two heuristic callers tested (VarScan and PoolSNP) in terms of the balance between sensitivity and FDR, in both simulated and sequencing data. Our results will help inform the selection of the most appropriate SNP caller not only for large-scale population studies but also in cases where the Pool-seq strategy is the only option, such as in metagenomic or polyploid studies.
Affiliation(s)
- Sara Guirao‐Rico
- Institute of Evolutionary Biology, CSIC‐Universitat Pompeu Fabra, Barcelona, Spain
- Josefa González
- Institute of Evolutionary Biology, CSIC‐Universitat Pompeu Fabra, Barcelona, Spain
|
23
|
Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021; 21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
Affiliation(s)
- Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Alessandro Stella
- Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, UK
| |
|
24
|
Wang Z, Wang J, Kourakos M, Hoang N, Lee HH, Mathieson I, Mathieson S. Automatic inference of demographic parameters using generative adversarial networks. Mol Ecol Resour 2021; 21:2689-2705. [PMID: 33745225 PMCID: PMC8596911 DOI: 10.1111/1755-0998.13386] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/29/2020] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Population genetics relies heavily on simulated data for validation, inference and intuition. In particular, since the evolutionary ‘ground truth’ for real data is always limited, simulated data are crucial for training supervised machine learning methods. Simulation software can accurately model evolutionary processes but requires many hand‐selected input parameters. As a result, simulated data often fail to mirror the properties of real genetic data, which limits the scope of methods that rely on it. Here, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method, pg‐gan, is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation‐with‐migration model. We then apply our method to human data from the 1000 Genomes Project and show that we can accurately recapitulate the features of real data.
Affiliation(s)
- Zhanpeng Wang
- Department of Computer Science, Haverford College, Haverford, PA, USA
| | - Jiaping Wang
- Department of Computer Science, Haverford College, Haverford, PA, USA
| | - Michael Kourakos
- Department of Computer Science, Swarthmore College, Swarthmore, PA, USA
| | - Nhung Hoang
- Department of Computer Science, Swarthmore College, Swarthmore, PA, USA
| | - Hyong Hark Lee
- Department of Computer Science, Swarthmore College, Swarthmore, PA, USA
| | - Iain Mathieson
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Sara Mathieson
- Department of Computer Science, Haverford College, Haverford, PA, USA
| |
|
25
|
Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021; 38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/19/2023]
Abstract
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
Affiliation(s)
- Alexander T Xue
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| | - Andrew D Kern
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR
| |
|
26
|
Saleem A, Muylle H, Aper J, Ruttink T, Wang J, Yu D, Roldán-Ruiz I. A Genome-Wide Genetic Diversity Scan Reveals Multiple Signatures of Selection in a European Soybean Collection Compared to Chinese Collections of Wild and Cultivated Soybean Accessions. Front Plant Sci 2021; 12:631767. [PMID: 33732276 PMCID: PMC7959735 DOI: 10.3389/fpls.2021.631767] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/20/2020] [Accepted: 02/01/2021] [Indexed: 05/03/2023]
Abstract
Targeted and untargeted selections including domestication and breeding efforts can reduce genetic diversity in breeding germplasm and create selective sweeps in crop genomes. The genomic regions at which selective sweeps are detected can reveal important information about signatures of selection. We have analyzed the genetic diversity within a soybean germplasm collection relevant for breeding in Europe (the EUCLEG collection), and have identified selective sweeps through a genome-wide scan comparing that collection to Chinese soybean collections. This work involved genotyping of 480 EUCLEG soybean accessions, including 210 improved varieties, 216 breeding lines and 54 landraces using the 355K SoySNP microarray. SNP calling of 477 EUCLEG accessions together with 328 Chinese soybean accessions identified 224,993 high-quality SNP markers. Population structure analysis revealed a clear differentiation between the EUCLEG collection and the Chinese materials. Further, the EUCLEG collection was sub-structured into five subgroups that were differentiated by geographical origin. No clear association between subgroups and maturity group was detected. The genetic diversity was lower in the EUCLEG collection compared to the Chinese collections. Selective sweep analysis revealed 23 selective sweep regions distributed over 12 chromosomes. Co-localization of these selective sweep regions with previously reported QTLs and genes revealed that various signatures of selection in the EUCLEG collection may be related to domestication and improvement traits including seed protein and oil content, phenology, nitrogen fixation, yield components, diseases resistance and quality. No signatures of selection related to stem determinacy were detected. In addition, absence of signatures of selection for a substantial number of QTLs related to yield, protein content, oil content and phenological traits suggests the presence of substantial genetic diversity in the EUCLEG collection. 
Taken together, the results obtained demonstrate that the available genetic diversity in the EUCLEG collection can be further exploited for research and breeding purposes. However, incorporation of exotic material can be considered to broaden its genetic base.
Affiliation(s)
- Aamir Saleem
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Hilde Muylle
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
| | - Jonas Aper
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
| | - Tom Ruttink
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
| | - Jiao Wang
- National Center for Soybean Improvement, National Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Deyue Yu
- National Center for Soybean Improvement, National Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Isabel Roldán-Ruiz
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- *Correspondence: Isabel Roldán-Ruiz,
| |
|
27
|
Nonsynonymous Polymorphism Counts in Bacterial Genomes: a Comparative Examination. Appl Environ Microbiol 2020; 87:AEM.02002-20. [PMID: 33097502 DOI: 10.1128/aem.02002-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/16/2020] [Accepted: 10/14/2020] [Indexed: 01/14/2023]
Abstract
Genomic data reveal single-nucleotide polymorphisms (SNPs) that may carry information about the evolutionary history of bacteria. However, it remains unclear what inferences about selection can be made from genomic SNP data. Bacterial species are often sampled during epidemic outbreaks or within hosts during the course of chronic infections. SNPs obtained from genomic analysis of these data are not necessarily fixed. Treating them as fixed during analysis by using measures such as the ratio of nonsynonymous to synonymous evolutionary changes (dN/dS) may lead to incorrect inferences about the strength and direction of selection. In this study, we consider data from a range of whole-genome sequencing studies of bacterial pathogens and explore patterns of nonsynonymous variation to assess whether evidence of selection can be identified by investigating SNP counts alone across multiple WGS studies. We visualize these SNP data in ways that highlight their relationship to neutral baseline expectations. These neutral expectations are based on a simple model of mutation, from which we simulate SNP accumulation to investigate how SNP counts are distributed under alternative assumptions about positive and negative selection. We compare these patterns with empirical SNP data and illustrate the general difficulty of detecting positive selection from SNP data. Finally, we consider whether SNP counts observed at the between-host population level differ from those observed at the within-host level and find some evidence that suggests that dynamics across these two scales are driven by different underlying processes. IMPORTANCE Identifying selection from SNP data obtained from whole-genome sequencing studies is challenging. Some current measures used to identify and quantify selection acting on genomes rely on fixed differences; thus, these are inappropriate for SNP data where variants are not fixed. 
With the increase in whole-genome sequencing studies, it is important to consider SNP data in the context of evolutionary processes. How SNPs are counted and analyzed can help in understanding mutation accumulation and trajectories of strains. We developed a tool for identifying possible evidence of selection and for comparative analysis with other SNP data. We propose a model that provides a rule-of-thumb guideline and two new visualization techniques that can be used to interpret and compare SNP data. We quantify the expected proportion of nonsynonymous SNPs in coding regions under neutrality and demonstrate its use in identifying evidence of positive and negative selection from simulations and empirical data.
|
28
|
Walsh S, Izquierdo-Serra M, Acosta S, Edo A, Lloret M, Moret R, Bosch E, Oliva B, Bertranpetit J, Fernández-Fernández JM. Adaptive selection drives TRPP3 loss-of-function in an Ethiopian population. Sci Rep 2020; 10:20999. [PMID: 33268808 PMCID: PMC7710729 DOI: 10.1038/s41598-020-78081-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/06/2020] [Accepted: 11/20/2020] [Indexed: 11/15/2022]
Abstract
TRPP3 (also called PKD2L1) is a nonselective, cation-permeable channel activated by multiple stimuli, including extracellular pH changes. TRPP3 had been considered a candidate for sour sensor in humans, due to its high expression in a subset of tongue receptor cells detecting sour, along with its membership to the TRP channel family known to function as sensory receptors. Here, we describe the functional consequences of two non-synonymous genetic variants (R278Q and R378W) found to be under strong positive selection in an Ethiopian population, the Gumuz. Electrophysiological studies and 3D modelling reveal TRPP3 loss-of-functions produced by both substitutions. R278Q impairs TRPP3 activation after alkalinisation by mislocation of H+ binding residues at the extracellular polycystin mucolipin domain. R378W dramatically reduces channel activity by altering conformation of the voltage sensor domain and hampering channel transition from closed to open state. Sour sensitivity tests in R278Q/R378W carriers argue against both any involvement of TRPP3 in sour detection and the role of such physiological process in the reported evolutionary positive selection past event.
Affiliation(s)
- Sandra Walsh
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| | - Mercè Izquierdo-Serra
- Laboratory of Molecular Physiology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Sandra Acosta
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| | - Albert Edo
- Laboratory of Molecular Physiology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - María Lloret
- Laboratory of Molecular Physiology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Roser Moret
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| | - Elena Bosch
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain; Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), 43206, Reus, Spain
| | - Baldo Oliva
- Structural Bioinformatics Lab, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Jaume Bertranpetit
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88, 08003, Barcelona, Catalonia, Spain.
| | - José Manuel Fernández-Fernández
- Laboratory of Molecular Physiology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain.
| |
|
29
|
|
30
|
Walsh S, Pagani L, Xue Y, Laayouni H, Tyler-Smith C, Bertranpetit J. Positive selection in admixed populations from Ethiopia. BMC Genet 2020; 21:108. [PMID: 33092534 PMCID: PMC7580818 DOI: 10.1186/s12863-020-00908-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/28/2020] [Accepted: 08/27/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND In the process of adaptation of humans to their environment, positive or adaptive selection has played a main role. Positive selection has, however, been under-studied in African populations, despite their diversity and importance for understanding human history. RESULTS Here, we have used 119 available whole-genome sequences from five Ethiopian populations (Amhara, Oromo, Somali, Wolayta and Gumuz) to investigate the modes and targets of positive selection in this part of the world. The site frequency spectrum-based test SFselect was applied to identify a wide range of events of selection (old and recent), and the haplotype-based statistic integrated haplotype score to detect more recent events, in each case with evaluation of the significance of candidate signals by extensive simulations. Additional insights were provided by considering admixture proportions and functional categories of genes. We identified both individual loci that are likely targets of classic sweeps and groups of genes that may have experienced polygenic adaptation. We found population-specific as well as shared signals of selection, with folate metabolism and the related ultraviolet response and skin pigmentation standing out as a shared pathway, perhaps as a response to the high levels of ultraviolet irradiation, and in addition strong signals in genes such as IFNA, MRC1, immunoglobulins and T-cell receptors which contribute to defend against pathogens. CONCLUSIONS Signals of positive selection were detected in Ethiopian populations revealing novel adaptations in East Africa, and abundant targets for functional follow-up.
Affiliation(s)
- Sandra Walsh
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88 08003, Barcelona, Catalonia, Spain
| | - Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Department of Biology, University of Padova, 35131, Padova, Italy
| | - Yali Xue
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Hafid Laayouni
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88 08003, Barcelona, Catalonia, Spain
- Bioinformatics Studies, ESCI-UPF, Barcelona, Catalonia, Spain
| | - Chris Tyler-Smith
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
| | - Jaume Bertranpetit
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Dr. Aiguader, 88 08003, Barcelona, Catalonia, Spain.
| |
|
31
|
Marchi N, Excoffier L. Gene flow as a simple cause for an excess of high-frequency-derived alleles. Evol Appl 2020; 13:2254-2263. [PMID: 33005222 PMCID: PMC7513730 DOI: 10.1111/eva.12998] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/24/2020] [Revised: 04/30/2020] [Accepted: 05/04/2020] [Indexed: 01/19/2023]
Abstract
Most human populations exhibit an excess of high-frequency variants, leading to a U-shaped site-frequency spectrum (uSFS). This pattern has been generally interpreted as a signature of ongoing episodes of positive selection, or as evidence for a mis-assignment of ancestral/derived allelic states, but uSFS has also been observed in populations receiving gene flow from a ghost population, in structured populations, or after range expansions. In order to better explain the prevalence of high-frequency variants in humans and other populations, we describe here which patterns of gene flow and population demography can lead to uSFS by using extensive coalescent simulations. We find that uSFS can often be observed in a population if gene flow brings a few ancestral alleles from a well-differentiated population. Gene flow can either consist in single pulses of admixture or continuous immigration, but different demographic conditions are necessary to observe uSFS in these two scenarios. Indeed, an extremely low and recent gene flow is required in the case of single admixture events, while with continuous immigration, uSFS occurs only if gene flow started recently at a high rate or if it lasted for a long time at a low rate. Overall, we find that a neutral uSFS occurs under more restrictive conditions in populations having received single pulses of gene flow than in populations exposed to continuous gene flow. We also show that the uSFS observed in human populations from the 1000 Genomes Project can easily be explained by gene flow from surrounding populations without requiring past episodes of positive selection. These results imply that uSFS should be common in non-isolated populations, such as most wild or domesticated plants and animals.
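As a rough illustration of the statistic this abstract discusses, the unfolded site-frequency spectrum can be tabulated from a matrix of haplotypes in which 0 marks the ancestral allele and 1 the derived allele. The sketch below is not the authors' pipeline (they used coalescent simulations); the function name `unfolded_sfs` and the toy haplotype matrix are assumptions made for illustration only.

```python
def unfolded_sfs(haplotypes):
    """Count sites by derived-allele count.

    haplotypes: list of equal-length lists of 0/1 values
                (0 = ancestral allele, 1 = derived allele).
    Returns a list of length n + 1, where entry k is the number of
    sites at which exactly k of the n haplotypes carry the derived allele.
    """
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    # zip(*haplotypes) iterates over sites (columns) instead of haplotypes (rows).
    for site in zip(*haplotypes):
        sfs[sum(site)] += 1
    return sfs


# Hypothetical toy data: 4 haplotypes, 5 sites.
toy = [
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
]
print(unfolded_sfs(toy))  # [1, 2, 0, 1, 1]
```

In such a table, a U-shaped spectrum shows up as an excess of counts in the high-frequency classes (entries near n - 1) relative to the intermediate classes. Entries 0 and n correspond to sites that are monomorphic in the sample.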
Affiliation(s)
- Nina Marchi
- CMPG, Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Laurent Excoffier
- CMPG, Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
32
|
Schrider DR. Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps. Genetics 2020; 216:499-519. [PMID: 32847814 PMCID: PMC7536861 DOI: 10.1534/genetics.120.303469] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/13/2019] [Accepted: 08/04/2020] [Indexed: 12/28/2022]
Abstract
It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. 
However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
| |
|
33
|
Horscroft C, Ennis S, Pengelly RJ, Sluckin TJ, Collins A. Sequencing era methods for identifying signatures of selection in the genome. Brief Bioinform 2020; 20:1997-2008. [PMID: 30053138 DOI: 10.1093/bib/bby064] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/26/2018] [Revised: 05/16/2018] [Indexed: 12/12/2022]
Abstract
Insights into genetic loci which are under selection and their functional roles contribute to increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. We review here recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Consideration of demography, recombination and other confounding factors is important, and use of a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, a substantial improvement in methods for application to whole-genome sequencing is evident, although further work is required to develop robust and computationally efficient approaches which may increase reproducibility across studies.
Affiliation(s)
- Clare Horscroft
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Sarah Ennis
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Reuben J Pengelly
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| | - Timothy J Sluckin
- Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK; Mathematical Sciences, University of Southampton, Highfield, Southampton, UK
| | - Andrew Collins
- Genetic Epidemiology and Bioinformatics, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, UK; Institute for Life Sciences, University of Southampton, Life Sciences Building (85), Highfield, Southampton, UK
| |
|
34
|
Murga-Moreno J, Coronado-Zamora M, Hervas S, Casillas S, Barbadilla A. iMKT: the integrative McDonald and Kreitman test. Nucleic Acids Res 2020; 47:W283-W288. [PMID: 31081014 PMCID: PMC6602517 DOI: 10.1093/nar/gkz372] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/06/2019] [Revised: 04/18/2019] [Accepted: 05/03/2019] [Indexed: 01/07/2023]
Abstract
The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (acronym for integrative McDonald and Kreitman test), a novel web-based service performing four distinct MKT types. It allows the detection and estimation of four different selection regimes −adaptive, neutral, strongly deleterious and weakly deleterious− acting on any genomic sequence. iMKT can analyze both user's own population genomic data and pre-loaded Drosophila melanogaster and human sequences of protein-coding genes obtained from the largest population genomic datasets to date. Advanced options in the website allow testing complex hypotheses such as the application example showed here: do genes located in high recombination regions undergo higher rates of adaptation? We aim that iMKT will become a reference site tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free resource online at https://imkt.uab.cat.
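For readers unfamiliar with the test, the standard (uncorrected) MKT compares nonsynonymous and synonymous counts for fixed differences (Dn, Ds) and polymorphisms (Pn, Ps). A minimal sketch follows, assuming the textbook definitions of the neutrality index, NI = (Pn/Ps)/(Dn/Ds), and of the proportion of adaptive substitutions, alpha = 1 - NI. It does not reproduce the iMKT service or any of its four MKT variants; the function name `standard_mkt` and the counts are hypothetical.

```python
def standard_mkt(dn, ds, pn, ps):
    """Return (alpha, neutrality_index) for one McDonald-Kreitman table.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if min(dn, ds, pn, ps) <= 0:
        # The ratios are undefined when any cell of the table is zero.
        raise ValueError("all four counts must be positive")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index
    return alpha, neutrality_index


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
alpha, ni = standard_mkt(dn=20, ds=40, pn=10, ps=40)
print(alpha, ni)  # 0.5 0.5
```

Under strict neutrality the two ratios are expected to match (NI near 1, alpha near 0). An excess of nonsynonymous divergence gives alpha above 0, while segregating weakly deleterious variants inflate Pn and push alpha downward, which is the bias that corrected MKT variants aim to reduce.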
Affiliation(s)
- Jesús Murga-Moreno
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| | - Marta Coronado-Zamora
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| | - Sergi Hervas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| | - Sònia Casillas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| | - Antonio Barbadilla
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| |
|
35
|
Azodi CB, Tang J, Shiu SH. Opening the Black Box: Interpretable Machine Learning for Geneticists. Trends Genet 2020; 36:442-455. [PMID: 32396837 DOI: 10.1016/j.tig.2020.03.005] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 01/03/2020] [Revised: 03/12/2020] [Accepted: 03/16/2020] [Indexed: 01/16/2023]
Abstract
Because of its ability to find complex patterns in high dimensional and heterogeneous data, machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to make novel biological insights. Here, we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.
Affiliation(s)
- Christina B Azodi
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA; Bioinformatics and Cellular Genomics, St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia.
| | - Jiliang Tang
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA; Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, USA.
| |
|
36
|
Hejase HA, Dukler N, Siepel A. From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection. Trends Genet 2020; 36:243-258. [PMID: 31954511 PMCID: PMC7177178 DOI: 10.1016/j.tig.2019.12.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/28/2019] [Revised: 11/15/2019] [Accepted: 12/11/2019] [Indexed: 01/01/2023]
Abstract
Methods to detect signals of natural selection from genomic data have traditionally emphasized the use of simple summary statistics. Here, we review a new generation of methods that consider combinations of conventional summary statistics and/or richer features derived from inferred gene trees and ancestral recombination graphs (ARGs). We also review recent advances in methods for population genetic simulation and ARG reconstruction. Finally, we describe opportunities for future work on a variety of related topics, including the genetics of speciation, estimation of selection coefficients, and inference of selection on polygenic traits. Together, these emerging methods offer promising new directions in the study of natural selection.
Affiliation(s)
- Hussein A Hejase
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
|
37
|
Satta Y, Zheng W, Nishiyama KV, Iwasaki RL, Hayakawa T, Fujito NT, Takahata N. Two-dimensional site frequency spectrum for detecting, classifying and dating incomplete selective sweeps. Genes Genet Syst 2019; 94:283-300. [DOI: 10.1266/ggs.19-00012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022] Open
Affiliation(s)
- Yoko Satta
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Wanjing Zheng
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Kumiko V. Nishiyama
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Risa L. Iwasaki
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
- Toshiyuki Hayakawa
- Graduate School of Systems Life Sciences and Faculty of Arts and Science, Kyushu University
- Naoko T. Fujito
- Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California
- Naoyuki Takahata
- School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies)
|
38
|
Torada L, Lorenzon L, Beddis A, Isildak U, Pattini L, Mathieson S, Fumagalli M. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics 2019; 20:337. [PMID: 31757205 PMCID: PMC6873651 DOI: 10.1186/s12859-019-2927-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 05/21/2019] [Accepted: 05/31/2019] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An alternative approach to classic association studies to determining such genetic bases is an evolutionary framework. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks on population genomic data for the detection and quantification of natural selection. RESULTS ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how the misspecification of the correct demographic model for producing training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques. 
CONCLUSIONS While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
Affiliation(s)
- Luis Torada
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Lucrezia Lorenzon
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
- Alice Beddis
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, METU Üniversiteler Mah. Dumlupınar Blv. No:1, Ankara, 06800 Çankaya Turkey
- Linda Pattini
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
- Sara Mathieson
- Department of Computer Science, Swarthmore College, 500 College Ave, Swarthmore, 19081 PA USA
- Matteo Fumagalli
- Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
|
39
|
Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet 2019; 15:e1008384. [PMID: 31518343 PMCID: PMC6760815 DOI: 10.1371/journal.pgen.1008384] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 03/29/2019] [Revised: 09/25/2019] [Accepted: 08/26/2019] [Indexed: 12/24/2022] Open
Abstract
Most current methods for detecting natural selection from DNA sequence data are limited in that they are either based on summary statistics or a composite likelihood, and as a consequence, do not make full use of the information available in DNA sequence data. We here present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. Our method CLUES treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS, and can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also infer the trajectory of a SNP (EDAR) in Han Chinese, finding evidence that this allele's age is much older than previously claimed. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
Affiliation(s)
- Aaron J. Stern
- Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, California, United States of America
- Peter R. Wilton
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
|
40
|
Coronado-Zamora M, Salvador-Martínez I, Castellano D, Barbadilla A, Salazar-Ciudad I. Adaptation and Conservation throughout the Drosophila melanogaster Life-Cycle. Genome Biol Evol 2019; 11:1463-1482. [PMID: 31028390 PMCID: PMC6535812 DOI: 10.1093/gbe/evz086] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Accepted: 04/16/2019] [Indexed: 01/09/2023] Open
Abstract
Previous studies of the evolution of genes expressed at different life-cycle stages of Drosophila melanogaster have not been able to disentangle adaptive from nonadaptive substitutions when using nonsynonymous sites. Here, we overcome this limitation by combining whole-genome polymorphism data from D. melanogaster and divergence data between D. melanogaster and Drosophila yakuba. For the set of genes expressed at different life-cycle stages of D. melanogaster, as reported in modENCODE, we estimate the ratio of substitutions relative to polymorphism between nonsynonymous and synonymous sites (α), and α is then decomposed into the ratios of adaptive (ωa) and nonadaptive (ωna) substitutions to synonymous substitutions. We find that the genes expressed in mid- and late-embryonic development are the most conserved, whereas those expressed in early development and postembryonic stages are the least conserved. Importantly, we found that low conservation in early development is due to high rates of nonadaptive substitutions (high ωna), whereas in postembryonic stages it is due, instead, to high rates of adaptive substitutions (high ωa). By using estimates of different genomic features (codon bias, average intron length, exon number, recombination rate, among others), we also find that genes expressed in mid- and late-embryonic development show the most complex architecture: they are larger, have more exons, more transcripts, and longer introns. In addition, these genes are broadly expressed among all stages. We suggest that all these genomic features are related to the conservation of mid- and late-embryonic development. Globally, our study supports the hourglass pattern of conservation and adaptation over the life-cycle.
Affiliation(s)
- Marta Coronado-Zamora
- Genomics, Bioinformatics and Evolution, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Irepan Salvador-Martínez
- Evo-Devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Finland; Department of Genetics, Evolution and Environment, University College London, United Kingdom
- Antonio Barbadilla
- Genomics, Bioinformatics and Evolution, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Isaac Salazar-Ciudad
- Genomics, Bioinformatics and Evolution, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; Evo-Devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Finland; Centre de Recerca Matemàtica, Cerdanyola del Vallès, Spain
|
41
|
Casillas S, Mulet R, Villegas-Mirón P, Hervas S, Sanz E, Velasco D, Bertranpetit J, Laayouni H, Barbadilla A. PopHuman: the human population genomics browser. Nucleic Acids Res 2018; 46:D1003-D1010. [PMID: 29059408 PMCID: PMC5753332 DOI: 10.1093/nar/gkx943] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/11/2017] [Accepted: 10/04/2017] [Indexed: 12/17/2022] Open
Abstract
The 1000 Genomes Project (1000GP) represents the most comprehensive world-wide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of this sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and eventually the understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that faces the unique features and limitations of the 1000GP data, and include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, as well as different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat.
Affiliation(s)
- Sònia Casillas
- Institut de Biotecnologia i de Biomedicina and Department de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- To whom correspondence should be addressed. Sònia Casillas. Tel: +34 93 5868958; Fax: +34 93 5812011. Correspondence may also be addressed to Antonio Barbadilla.
- Roger Mulet
- Institut de Biotecnologia i de Biomedicina and Department de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Pablo Villegas-Mirón
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Sergi Hervas
- Institut de Biotecnologia i de Biomedicina and Department de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Esteve Sanz
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Daniel Velasco
- Institut de Biotecnologia i de Biomedicina and Department de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Jaume Bertranpetit
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Hafid Laayouni
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Bioinformatics Studies, ESCI-UPF, Pg. Pujades 1, 08003 Barcelona, Spain
- Antonio Barbadilla
- Institut de Biotecnologia i de Biomedicina and Department de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- To whom correspondence should be addressed. Sònia Casillas. Tel: +34 93 5868958; Fax: +34 93 5812011. Correspondence may also be addressed to Antonio Barbadilla.
|
42
|
Adams RH, Schield DR, Castoe TA. Recent Advances in the Inference of Gene Flow from Population Genomic Data. Curr Mol Biol Rep 2019. [DOI: 10.1007/s40610-019-00120-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
|
43
|
Salvador-Martínez I, Coronado-Zamora M, Castellano D, Barbadilla A, Salazar-Ciudad I. Mapping Selection within Drosophila melanogaster Embryo's Anatomy. Mol Biol Evol 2018; 35:66-79. [PMID: 29040697 DOI: 10.1093/molbev/msx266] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022] Open
Abstract
We present a survey of selection across Drosophila melanogaster embryonic anatomy. Our approach integrates genomic variation, spatial gene expression patterns, and development with the aim of mapping adaptation over the entire embryo's anatomy. Our adaptation map is based on analyzing spatial gene expression information for 5,969 genes (from text-based annotations of in situ hybridization data directly from the BDGP database, Tomancak et al. 2007) and the polymorphism and divergence in these genes (from the project DGRP, Mackay et al. 2012). The proportions of nonsynonymous substitutions that are adaptive, neutral, or slightly deleterious are estimated for the set of genes expressed in each embryonic anatomical structure using the distribution of fitness effects-alpha method (Eyre-Walker and Keightley 2009). This method is a robust derivative of the McDonald and Kreitman test (McDonald and Kreitman 1991). We also explore whether different anatomical structures differ in the phylogenetic age, codon usage, or expression bias of the genes they express and whether genes expressed in many anatomical structures show more adaptive substitutions than other genes. We found that: 1) most of the digestive system and ectoderm-derived structures are under selective constraint, 2) the germ line and some specific mesoderm-derived structures show high rates of adaptive substitution, and 3) the genes that are expressed in a small number of anatomical structures show higher expression bias, lower phylogenetic ages, and less constraint.
Affiliation(s)
- Irepan Salvador-Martínez
- Evo-devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Marta Coronado-Zamora
- Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- David Castellano
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Antonio Barbadilla
- Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Isaac Salazar-Ciudad
- Evo-devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Helsinki, Finland; Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
|
44
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Indexed: 12/28/2022] Open
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Affiliation(s)
- Lex Flagel
- Monsanto Company, Chesterfield, MO
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Yaniv Brandvain
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
|
45
|
Abstract
Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
Affiliation(s)
- Mehreen R Mughal
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
- Michael DeGiorgio
- Departments of Biology and Statistics, Pennsylvania State University, University Park, PA
- Institute for CyberScience, Pennsylvania State University, University Park, PA
|
46
|
Edge MD, Coop G. Reconstructing the History of Polygenic Scores Using Coalescent Trees. Genetics 2019; 211:235-262. [PMID: 30389808 PMCID: PMC6325695 DOI: 10.1534/genetics.118.301687] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/10/2018] [Accepted: 10/23/2018] [Indexed: 11/18/2022] Open
Abstract
Genome-wide association studies (GWAS) have revealed that many traits are highly polygenic, in that their within-population variance is governed, in part, by small-effect variants at many genetic loci. Standard population-genetic methods for inferring evolutionary history are ill-suited for polygenic traits: when there are many variants of small effect, signatures of natural selection are spread across the genome and are subtle at any one locus. In the last several years, various methods have emerged for detecting the action of natural selection on polygenic scores, sums of genotypes weighted by GWAS effect sizes. However, most existing methods do not reveal the timing or strength of selection. Here, we present a set of methods for estimating the historical time course of a population-mean polygenic score using local coalescent trees at GWAS loci. These time courses are estimated by using coalescent theory to relate the branch lengths of trees to allele-frequency change. The resulting time course can be tested for evidence of natural selection. We present theory and simulations supporting our procedures, as well as estimated time courses of polygenic scores for human height. Because of its grounding in coalescent theory, the framework presented here can be extended to a variety of demographic scenarios, and its usefulness will increase as both GWAS and ancestral-recombination-graph inference continue to progress.
Affiliation(s)
- Michael D Edge
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California 95616
- Graham Coop
- Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California 95616
|
47
|
Villanea FA, Schraiber JG. Multiple episodes of interbreeding between Neanderthal and modern humans. Nat Ecol Evol 2019; 3:39-44. [PMID: 30478305 PMCID: PMC6309227 DOI: 10.1038/s41559-018-0735-8] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 06/08/2018] [Accepted: 10/18/2018] [Indexed: 11/30/2022]
Abstract
Neanderthals and anatomically modern humans overlapped geographically for a period of over 30,000 years following human migration out of Africa. During this period, Neanderthals and humans interbred, as evidenced by Neanderthal portions of the genome carried by non-African individuals today. A key observation is that the proportion of Neanderthal ancestry is ~12-20% higher in East Asian individuals relative to European individuals. Here, we explore various demographic models that could explain this observation. These include distinguishing between a single admixture event and multiple Neanderthal contributions to either population, and the hypothesis that reduced Neanderthal ancestry in modern Europeans resulted from more recent admixture with a ghost population that lacked a Neanderthal ancestry component (the 'dilution' hypothesis). To summarize the asymmetric pattern of Neanderthal allele frequencies, we compiled the joint fragment frequency spectrum of European and East Asian Neanderthal fragments and compared it with both analytical theory and data simulated under various models of admixture. Using maximum-likelihood and machine learning, we found that a simple model of a single admixture did not fit the empirical data, and instead favour a model of multiple episodes of gene flow into both European and East Asian populations. These findings indicate a longer-term, more complex interaction between humans and Neanderthals than was previously appreciated.
Affiliation(s)
- Fernando A Villanea
- Department of Biology, Temple University, Philadelphia, PA, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Joshua G Schraiber
- Department of Biology, Temple University, Philadelphia, PA, USA.
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
|
48
|
He L, Xiong L, Zhang A, Li Y, Huang R, Liao L, Zhu Z, Wang Y. Changes in gene and genotype frequencies during the development of the grass carp Ctenopharyngodon idella. J Fish Biol 2018; 93:1113-1120. [PMID: 30281158 DOI: 10.1111/jfb.13828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2018] [Accepted: 09/28/2018] [Indexed: 06/08/2023]
Abstract
In this study, a full-sib population of Ctenopharyngodon idella was constructed and approximately 500 C. idella individuals were sampled at four early developmental stages (hatching, first feeding, juvenile fish and young fish). Four DNA pools were constructed and subjected to next-generation sequencing. On the basis of the identification of single nucleotide polymorphisms (SNP), changes in gene and genotype frequencies during the developmental progress of C. idella were revealed, which indicates that death during the early developmental stage is not a random process. These findings will establish the basis for further studies performed for identifying superior alleles or genotypes as target markers for molecular breeding.
Affiliation(s)
- LiB He
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Lv Xiong
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- University of Chinese Academy of Sciences, Beijing, China
- AiD Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- YongM Li
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Rong Huang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- LanJ Liao
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- ZuoY Zhu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- YaP Wang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
|
49
|
Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity. Genetics 2018; 210:1429-1452. [PMID: 30315068 DOI: 10.1534/genetics.118.301502] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 03/03/2018] [Accepted: 10/08/2018] [Indexed: 11/18/2022] Open
Abstract
Positive natural selection can lead to a decrease in genomic diversity at the selected site and at linked sites, producing a characteristic signature of elevated expected haplotype homozygosity. These selective sweeps can be hard or soft. In the case of a hard selective sweep, a single adaptive haplotype rises to high population frequency, whereas multiple adaptive haplotypes sweep through the population simultaneously in a soft sweep, producing distinct patterns of genetic variation in the vicinity of the selected site. Measures of expected haplotype homozygosity have previously been used to detect sweeps in multiple study systems. However, these methods are formulated for phased haplotype data, typically unavailable for nonmodel organisms, and some may have reduced power to detect soft sweeps due to their increased genetic diversity relative to hard sweeps. To address these limitations, we applied the H12 and H2/H1 statistics proposed in 2015 by Garud et al., which have power to detect both hard and soft sweeps, to unphased multilocus genotypes, denoting them as G12 and G2/G1. G12 (and the more direct expected homozygosity analog to H12, denoted G123) has comparable power to H12 for detecting both hard and soft sweeps. G2/G1 can be used to classify hard and soft sweeps analogously to H2/H1, conditional on a genomic region having high G12 or G123 values. The reason for this power is that, under random mating, the most frequent haplotypes will yield the most frequent multilocus genotypes. Simulations based on parameters compatible with our recent understanding of human demographic history suggest that expected homozygosity methods are best suited for detecting recent sweeps, and increase in power under recent population expansions. Finally, we find candidates for selective sweeps within the 1000 Genomes CEU, YRI, GIH, and CHB populations, which corroborate and complement existing studies.
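As an illustration of the statistics named in this abstract, the following is a minimal sketch of how G12, G123, and G2/G1 might be computed for a single genomic window. It assumes the genotype analogs follow the H12 and H2/H1 definitions of Garud et al. with multilocus genotype (MLG) frequencies in place of haplotype frequencies: the top two (or three) frequencies are pooled before squaring, and G2 is G1 minus the squared frequency of the most common MLG. The function name `genotype_identity_stats` and the string encoding of genotypes (one character per SNP, `0`/`1`/`2` copies of the alternate allele) are hypothetical choices for this example, not taken from the paper's software.

```python
from collections import Counter


def genotype_identity_stats(mlgs):
    """Compute G1, G12, G123 and G2/G1 for one window.

    mlgs: list of multilocus genotype strings, one per sampled individual
          (hypothetical encoding: one character per SNP, '0', '1' or '2').
    """
    n = len(mlgs)
    if n == 0:
        raise ValueError("need at least one multilocus genotype")

    # Frequencies of the distinct multilocus genotypes, most frequent first.
    freqs = sorted((count / n for count in Counter(mlgs).values()), reverse=True)

    # G1: expected multilocus genotype homozygosity (sum of squared frequencies).
    g1 = sum(q * q for q in freqs)

    # G12: pool the two most frequent genotypes into one class before squaring.
    g12 = sum(freqs[:2]) ** 2 + sum(q * q for q in freqs[2:])

    # G123: pool the three most frequent genotypes instead.
    g123 = sum(freqs[:3]) ** 2 + sum(q * q for q in freqs[3:])

    # G2: homozygosity after excluding the most frequent genotype.
    g2 = g1 - freqs[0] ** 2

    return {"G1": g1, "G12": g12, "G123": g123, "G2/G1": g2 / g1}


# Toy window: 10 individuals carrying three distinct multilocus genotypes.
example = ["0120"] * 5 + ["0021"] * 3 + ["2200"] * 2
stats = genotype_identity_stats(example)
```

In this toy window the MLG frequencies are 0.5, 0.3, and 0.2, so G1 = 0.38, G12 = 0.68, and G2/G1 = 0.13/0.38. A real analysis would additionally slide the window along the genome and handle missing genotypes, neither of which this sketch attempts.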
|
50
|
Atkinson EG, Audesse AJ, Palacios JA, Bobo DM, Webb AE, Ramachandran S, Henn BM. No Evidence for Recent Selection at FOXP2 among Diverse Human Populations. Cell 2018; 174:1424-1435.e15. [PMID: 30078708 DOI: 10.1016/j.cell.2018.06.048] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 01/23/2018] [Revised: 06/15/2018] [Accepted: 06/26/2018] [Indexed: 12/18/2022]
Abstract
FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.
Affiliation(s)
- Amanda Jane Audesse
  - Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA; Neuroscience Graduate Program, Brown University, Providence, RI 02912, USA
- Julia Adela Palacios
  - Department of Ecology and Evolutionary Biology and Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Dean Michael Bobo
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
- Ashley Elizabeth Webb
  - Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
- Sohini Ramachandran
  - Department of Ecology and Evolutionary Biology and Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Brenna Mariah Henn
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA; Department of Anthropology and the Genome Center, University of California, Davis, Davis, CA 95616, USA
|