1
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023; 40:msad216. [PMID: 37772983 PMCID: PMC10581699 DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
Affiliation(s)
- Md Ruhul Amin
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA

2
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. bioRxiv: The Preprint Server for Biology 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under non-convex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.

3
Abondio P, Cilli E, Luiselli D. Inferring Signatures of Positive Selection in Whole-Genome Sequencing Data: An Overview of Haplotype-Based Methods. Genes (Basel) 2022; 13:926. [PMID: 35627311 PMCID: PMC9141518 DOI: 10.3390/genes13050926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 05/19/2022] [Accepted: 05/20/2022] [Indexed: 11/16/2022]
Abstract
Signatures of positive selection in the genome are a characteristic mark of adaptation that can reveal an ongoing, recent, or ancient response to environmental change throughout the evolution of a population. New sources of food, climate conditions, and exposure to pathogens are only some of the possible sources of selective pressure, and the rise of advantageous genetic variants is a crucial determinant of survival and reproduction. In this context, the ability to detect these signatures of selection may pinpoint genetic variants that are responsible for a significant change in gene regulation, gene expression, or protein synthesis, structure, and function. This review focuses on statistical methods that take advantage of linkage disequilibrium and haplotype determination to reveal signatures of positive selection in whole-genome sequencing data, showing that they emerge from different descriptions of the same underlying event. Moreover, considerations are provided around the application of these statistics to different species, their suitability for ancient DNA, and the usefulness of discovering variants under selection for biomedicine and public health in an evolutionary medicine framework.
Affiliation(s)
- Paolo Abondio
- Department of Cultural Heritage, University of Bologna, Via Degli Ariani 1, 48121 Ravenna, Italy
- Laboratory of Molecular Anthropology and Center for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Via Selmi 3, 40126 Bologna, Italy
- Elisabetta Cilli
- Department of Cultural Heritage, University of Bologna, Via Degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
- Department of Cultural Heritage, University of Bologna, Via Degli Ariani 1, 48121 Ravenna, Italy
- Fano Marine Center, The Inter-Institute Center for Research on Marine Biodiversity, Resources and Biotechnologies (FMC), Viale Adriatico 1/N, 61032 Fano, Italy

4
Otte KA, Nolte V, Mallard F, Schlötterer C. The genetic architecture of temperature adaptation is shaped by population ancestry and not by selection regime. Genome Biol 2021; 22:211. [PMID: 34271951 PMCID: PMC8285869 DOI: 10.1186/s13059-021-02425-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/26/2020] [Accepted: 06/29/2021] [Indexed: 12/28/2022]
Abstract
Background Understanding the genetic architecture of temperature adaptation is key for characterizing and predicting the effect of climate change on natural populations. One particularly promising approach is Evolve and Resequence, which combines advantages of experimental evolution such as time series, replicate populations, and controlled environmental conditions, with whole genome sequencing. Recent analysis of replicate populations from two different Drosophila simulans founder populations, which were adapting to the same novel hot environment, uncovered very different architectures—either many selection targets with large heterogeneity among replicates or fewer selection targets with a consistent response among replicates. Results Here, we expose the founder population from Portugal to a cold temperature regime. Although almost no selection targets are shared between the hot and cold selection regime, the adaptive architecture was similar. We identify a moderate number of targets under strong selection (19 selection targets, mean selection coefficient = 0.072) and parallel responses in the cold evolved replicates. This similarity across different environments indicates that the adaptive architecture depends more on the ancestry of the founder population than the specific selection regime. Conclusions These observations will have broad implications for the correct interpretation of the genomic responses to a changing climate in natural populations. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-021-02425-9.
Affiliation(s)
- Kathrin A Otte
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria; Present address: Institute for Zoology, University of Cologne, Cologne, Germany
- Viola Nolte
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- François Mallard
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria; Present address: Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, F-75005, Paris, France

5
Miranda I, Giska I, Farelo L, Pimenta J, Zimova M, Bryk J, Dalén L, Mills LS, Zub K, Melo-Ferreira J. Museomics dissects the genetic basis for adaptive seasonal colouration in the least weasel. Mol Biol Evol 2021; 38:4388-4402. [PMID: 34157721 PMCID: PMC8476133 DOI: 10.1093/molbev/msab177] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/19/2022]
Abstract
Dissecting the link between genetic variation and adaptive phenotypes provides outstanding opportunities to understand fundamental evolutionary processes. Here, we use a museomics approach to investigate the genetic basis and evolution of winter coat colouration morphs in least weasels (Mustela nivalis), a repeated adaptation for camouflage in mammals with seasonal pelage colour moults across regions with varying winter snow. Whole-genome sequence data were obtained from biological collections and mapped onto a newly assembled reference genome for the species. Sampling represented two replicate transition zones between nivalis and vulgaris colouration morphs in Europe, which typically develop white or brown winter coats, respectively. Population analyses showed that the morph distribution across transition zones is not a by-product of historical structure. Association scans linked a 200 kb genomic region to colouration morph, which was validated by genotyping museum specimens from inter-morph experimental crosses. Genotyping the wild populations narrowed down the association to pigmentation gene MC1R and pinpointed a candidate amino acid change co-segregating with colouration morph. This polymorphism replaces an ancestral leucine residue with lysine at the start of the first extracellular loop of the protein in the vulgaris morph. A selective sweep signature overlapped the association region in vulgaris, suggesting that past adaptation favoured winter-brown morphs and can anchor future adaptive responses to decreasing winter snow. Using biological collections as valuable resources to study natural adaptations, our study showed a new evolutionary route generating winter colour variation in mammals and that seasonal camouflage can be modulated by changes at single key genes.
Affiliation(s)
- Inês Miranda
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, 4485-661, Portugal; Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, 4169-007, Portugal
- Iwona Giska
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, 4485-661, Portugal
- Liliana Farelo
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, 4485-661, Portugal
- João Pimenta
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, 4485-661, Portugal
- Marketa Zimova
- School for Environment and Sustainability, University of Michigan, Dana Natural Resources Building, 440 Church St, Ann Arbor, MI, 48109, USA
- Jarosław Bryk
- School of Applied Sciences, University of Huddersfield, Queensgate, Huddersfield, UK
- Love Dalén
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, SE-10691, Sweden; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, Stockholm, SE-10405, Sweden
- L Scott Mills
- Wildlife Biology Program, University of Montana, Missoula, MT, 59812, USA; Office of Research and Creative Scholarship, University of Montana, Missoula, MT, 59812, USA
- Karol Zub
- Mammal Research Institute, Polish Academy of Sciences, Stoczek 1, Białowieża 17-230, Poland
- José Melo-Ferreira
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, 4485-661, Portugal; Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, 4169-007, Portugal

6
Harris AM, DeGiorgio M. A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data. Mol Biol Evol 2020; 37:3023-3046. [PMID: 32392293 PMCID: PMC7530616 DOI: 10.1093/molbev/msaa115] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make its inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, PA; Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL

7
Lindo J, DeGiorgio M. Understanding the Adaptive Evolutionary Histories of South American Ancient and Present-Day Populations via Genomics. Genes (Basel) 2021; 12:360. [PMID: 33801556 PMCID: PMC8001801 DOI: 10.3390/genes12030360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2021] [Revised: 02/18/2021] [Accepted: 02/22/2021] [Indexed: 12/03/2022]
Abstract
The South American continent is remarkably diverse in its ecological zones, spanning the Amazon rainforest, the high-altitude Andes, and Tierra del Fuego. Yet the original human populations of the continent successfully inhabited all these zones, well before the buffering effects of modern technology. Therefore, it is likely that the various cultures were successful, in part, due to positive natural selection that allowed them to establish populations for thousands of years. Detecting positive selection in these populations is still in its infancy, as the ongoing effects of European contact have decimated many of these populations and introduced gene flow from outside of the continent. In this review, we explore hypotheses of possible human biological adaptation, methods to identify positive selection, the utilization of ancient DNA, and the integration of modern genomes through the identification of genomic tracts that reflect the ancestry of the first populations of the Americas.
Affiliation(s)
- John Lindo
- Department of Anthropology, Emory University, Atlanta, GA 30322, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA

8
Kautt AF, Kratochwil CF, Nater A, Machado-Schiaffino G, Olave M, Henning F, Torres-Dowdall J, Härer A, Hulsey CD, Franchini P, Pippel M, Myers EW, Meyer A. Contrasting signatures of genomic divergence during sympatric speciation. Nature 2020; 588:106-111. [PMID: 33116308 PMCID: PMC7759464 DOI: 10.1038/s41586-020-2845-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 03/30/2020] [Accepted: 07/23/2020] [Indexed: 01/25/2023]
Abstract
The transition from 'well-marked varieties' of a single species into 'well-defined species', especially in the absence of geographic barriers to gene flow (sympatric speciation), has puzzled evolutionary biologists ever since Darwin [1,2]. Gene flow counteracts the buildup of genome-wide differentiation, which is a hallmark of speciation and increases the likelihood of the evolution of irreversible reproductive barriers (incompatibilities) that complete the speciation process [3]. Theory predicts that the genetic architecture of divergently selected traits can influence whether sympatric speciation occurs [4], but empirical tests of this theory are scant because comprehensive data are difficult to collect and synthesize across species, owing to their unique biologies and evolutionary histories [5]. Here, within a young species complex of neotropical cichlid fishes (Amphilophus spp.), we analysed genomic divergence among populations and species. By generating a new genome assembly and re-sequencing 453 genomes, we uncovered the genetic architecture of traits that have been suggested to be important for divergence. Species that differ in monogenic or oligogenic traits that affect ecological performance and/or mate choice show remarkably localized genomic differentiation. By contrast, differentiation among species that have diverged in polygenic traits is genomically widespread and much higher overall, consistent with the evolution of effective and stable genome-wide barriers to gene flow. Thus, we conclude that simple trait architectures are not always as conducive to speciation with gene flow as previously suggested, whereas polygenic architectures can promote rapid and stable speciation in sympatry.
Affiliation(s)
- Andreas F Kautt
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Alexander Nater
- Department of Biology, University of Konstanz, Konstanz, Germany
- Gonzalo Machado-Schiaffino
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Functional Biology, Area of Genetics, University of Oviedo, Oviedo, Spain
- Melisa Olave
- Department of Biology, University of Konstanz, Konstanz, Germany
- Argentine Dryland Research Institute of the National Council for Scientific Research (IADIZA-CONICET), Mendoza, Argentina
- Frederico Henning
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Genetics, Institute of Biology, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
- Andreas Härer
- Department of Biology, University of Konstanz, Konstanz, Germany
- Division of Biological Sciences, Section of Ecology, Behavior & Evolution, University of California San Diego, La Jolla, CA, USA
- C Darrin Hulsey
- Department of Biology, University of Konstanz, Konstanz, Germany
- Paolo Franchini
- Department of Biology, University of Konstanz, Konstanz, Germany
- Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology Dresden, Dresden, Germany
- Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology Dresden, Dresden, Germany
- Axel Meyer
- Department of Biology, University of Konstanz, Konstanz, Germany

9
Hartfield M, Bataillon T. Selective Sweeps Under Dominance and Inbreeding. G3 (Bethesda) 2020; 10:1063-1075. [PMID: 31974096 PMCID: PMC7056974 DOI: 10.1534/g3.119.400919] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/18/2019] [Accepted: 01/18/2020] [Indexed: 12/26/2022]
Abstract
A major research goal in evolutionary genetics is to uncover loci experiencing positive selection. One approach involves finding 'selective sweep' patterns, which can either be 'hard sweeps' formed by de novo mutation, or 'soft sweeps' arising from recurrent mutation or existing standing variation. Existing theory generally assumes outcrossing populations, and it is unclear how dominance affects soft sweeps. We consider how arbitrary dominance and inbreeding via self-fertilization affect hard and soft sweep signatures. With increased self-fertilization, these signatures are maintained over longer map distances due to reduced effective recombination and faster beneficial allele fixation times. Dominance can affect sweep patterns in outcrossers if the derived variant originates from either a single novel allele or from recurrent mutation. These models highlight the challenges in distinguishing hard from soft sweeps, and we propose methods to differentiate between scenarios.
Affiliation(s)
- Matthew Hartfield
- Department of Ecology and Evolutionary Biology, University of Toronto, Ontario M5S 3B2, Canada
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark