1
Lynch M, Wei W, Ye Z, Pfrender M. The genome-wide signature of short-term temporal selection. Proc Natl Acad Sci U S A 2024; 121:e2307107121. [PMID: 38959040 DOI: 10.1073/pnas.2307107121] [Received: 05/01/2023] [Accepted: 06/03/2024] [Indexed: 07/04/2024] [Open access]
Abstract
Despite evolutionary biology's obsession with natural selection, few studies have evaluated multigenerational series of patterns of selection on a genome-wide scale in natural populations. Here, we report on a 10-y population-genomic survey of the microcrustacean Daphnia pulex. The genome sequences of [Formula: see text]800 isolates provide insights into patterns of selection that cannot be obtained from long-term molecular-evolution studies, including the following: the pervasiveness of near quasi-neutrality across the genome (mean net selection coefficients near zero, but with significant temporal variance about the mean, and little evidence of positive covariance of selection across time intervals); the preponderance of weak positive selection operating on minor alleles; and a genome-wide distribution of numerous small linkage islands of observable selection influencing levels of nucleotide diversity. These results suggest that interannual fluctuating selection is a major determinant of standing levels of variation in natural populations, challenge the conventional paradigm for interpreting patterns of nucleotide diversity and divergence, and motivate the need for the further development of theoretical expressions for the interpretation of population-genomic data.
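Illustrative note: the abstract refers to net selection coefficients and their variance across time intervals. One common way to turn a time series of allele frequencies into per-interval selection estimates is the change in logit frequency, which under a simple haploid (genic) selection model equals the log fitness ratio accumulated over the interval. The sketch below assumes that model and ignores drift and sampling noise; it is not necessarily the estimator used by the authors, and the function names and toy frequencies are hypothetical.

```python
import math

def logit(p):
    """Log-odds of an allele frequency strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def selection_per_interval(freqs):
    """Per-interval selection estimates as successive changes in logit frequency.

    Assumes genic selection and no drift or sampling error (a simplification).
    """
    return [logit(b) - logit(a) for a, b in zip(freqs, freqs[1:])]

# Hypothetical yearly frequencies of one allele (illustration only).
freqs = [0.10, 0.12, 0.11, 0.13]
s_hat = selection_per_interval(freqs)

# Mean and temporal variance of the per-interval estimates.
mean_s = sum(s_hat) / len(s_hat)
var_s = sum((s - mean_s) ** 2 for s in s_hat) / (len(s_hat) - 1)
```

Because the logit differences telescope, the mean estimate depends only on the first and last frequencies, while the variance captures the interval-to-interval fluctuation that the abstract emphasizes.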
Affiliation(s)
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Wen Wei
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Zhiqiang Ye
  - Hubei Key Laboratory of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Michael Pfrender
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556
2
Brenman-Suttner D, Zayed A. An integrative genomic toolkit for studying the genetic, evolutionary and molecular underpinnings of eusociality in insects. Current Opinion in Insect Science 2024:101231. [PMID: 38977215 DOI: 10.1016/j.cois.2024.101231] [Received: 02/13/2024] [Revised: 06/26/2024] [Accepted: 07/02/2024] [Indexed: 07/10/2024]
Abstract
While genomic resources for social insects have vastly increased over the past two decades, we are still far from understanding the genetic and molecular basis of eusociality. Here, we briefly review three scientific advancements that, when integrated, can be highly synergistic for advancing our knowledge of the genetics and evolution of eusocial traits. Population genomics provides a natural way to quantify the strength of natural selection on coding and regulatory sequences, highlighting genes that have undergone adaptive evolution during the evolution or maintenance of eusociality. Genome wide association studies (GWAS) can be used to characterize the complex genetic architecture underlying eusocial traits and identify candidate causal variants. Concurrently, CRISPR/Cas9 enables the precise manipulation of gene function to both validate genotype-phenotype associations and study the molecular biology underlying interesting traits. While each approach has its own advantages and disadvantages, which we discuss herein, we argue that their combination will ultimately help us better understand the genetics and evolution of eusocial behaviour. Specifically, by triangulating across these three different approaches, researchers can directly identify and study loci that have a causal association with key phenotypes and have evidence of positive selection over the relevant timescales associated with the evolution and maintenance of eusociality in insects.
Affiliation(s)
- Amro Zayed
  - Department of Biology, York University, Toronto, Ontario, Canada.
3
Marsh JI, Johri P. Biases in ARG-Based Inference of Historical Population Size in Populations Experiencing Selection. Mol Biol Evol 2024; 41:msae118. [PMID: 38874402 PMCID: PMC11245712 DOI: 10.1093/molbev/msae118] [Received: 04/15/2024] [Revised: 06/05/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] [Open access]
Abstract
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying, and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, although it could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.
Affiliation(s)
- Jacob I Marsh
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Parul Johri
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
  - Integrative Program for Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA
4
Modica A, Lalagüe H, Muratorio S, Scotti I. Rolling down that mountain: microgeographical adaptive divergence during a fast population expansion along a steep environmental gradient in European beech. Heredity (Edinb) 2024:10.1038/s41437-024-00696-z. [PMID: 38890557 DOI: 10.1038/s41437-024-00696-z] [Received: 02/07/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 06/20/2024] [Open access]
Abstract
Forest tree populations harbour high genetic diversity thanks to large effective population sizes and strong gene flow, allowing them to diversify through adaptation to local environmental pressures within dispersal distance. Many tree populations also experienced historical demographic fluctuations, including spatial population contraction or expansions at various temporal scales, which may constrain their ability to adapt to environmental variations. Our aim is to investigate how recent contraction and expansion events interfere with local adaptation, by studying patterns of adaptive divergence between closely related stands undergoing environmentally contrasted conditions, and having or not recently expanded. To investigate genome-wide signatures of local adaptation while accounting for demography, we analysed divergence in a European beech population by testing pairwise differentiation among four tree stands at ~35k Single Nucleotide Polymorphisms from ~9k genomic regions. We applied three divergence outlier search methods resting on different assumptions and targeting either single SNPs or contiguous genomic regions, while accounting for the effect of population size variations on genetic divergence. We found 27 signals of selective signatures in 19 target regions. Putatively adaptive divergence involved all stand pairs. We retrieved signals both when comparing old-growth stands and recently colonised areas and when comparing stands within the old-growth area. Therefore, adaptive divergence processes have taken place both over short time spans, under strong environmental contrasts, and over short ecological gradients, in populations that have been stable in the long term. This suggests that standing genetic variation supports local, microgeographic divergence processes, which can maintain genetic diversity at the landscape level.
Affiliation(s)
- Andrea Modica
  - INRAE, URFM, 228, Route de l'Aérodrome, 84914, Avignon, France
- Hadrien Lalagüe
  - INRAE, EcoFoG, Campus agronomique, 97310, Kourou, French Guiana
- Sylvie Muratorio
  - INRAE, EcoBioP, 173, Route de Saint-Jean-de-Luz RD 918, 64310, Saint-Pée-sur-Nivelle, France
- Ivan Scotti
  - INRAE, URFM, 228, Route de l'Aérodrome, 84914, Avignon, France.
5
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda, Md.) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
6
Mikula LC, Vogl C. The expected sample allele frequencies from populations of changing size via orthogonal polynomials. Theor Popul Biol 2024; 157:55-85. [PMID: 38552964 DOI: 10.1016/j.tpb.2024.03.005] [Received: 04/05/2023] [Revised: 03/24/2024] [Accepted: 03/26/2024] [Indexed: 04/11/2024]
Abstract
In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for stepwise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.
Affiliation(s)
- Lynette Caitlin Mikula
  - Centre for Biological Diversity, School of Biology, University of St. Andrews, St Andrews KY16 9TH, UK.
- Claus Vogl
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria.
7
Roberts M, Josephs EB. Previously unmeasured genetic diversity explains part of Lewontin's paradox in a k-mer-based meta-analysis of 112 plant species. bioRxiv 2024:2024.05.17.594778. [PMID: 38798362 PMCID: PMC11118579 DOI: 10.1101/2024.05.17.594778] [Indexed: 05/29/2024]
Abstract
At the molecular level, most evolution is expected to be neutral. A key prediction of this expectation is that the level of genetic diversity in a population should scale with population size. However, as was noted by Richard Lewontin in 1974 and reaffirmed by later studies, the slope of the population size-diversity relationship in nature is much weaker than expected under neutral theory. We hypothesize that one contributor to this paradox is that current methods relying on single nucleotide polymorphisms (SNPs) called from aligning short reads to a reference genome underestimate levels of genetic diversity in many species. To test this idea, we calculated nucleotide diversity (π) and k-mer-based metrics of genetic diversity across 112 plant species, amounting to over 205 terabases of DNA sequencing data from 27,488 individual plants. We then compared how these different metrics correlated with proxies of population size that account for both range size and population density variation across species. We found that our population size proxies scaled anywhere from about 3 to over 20 times faster with k-mer diversity than nucleotide diversity after adjusting for evolutionary history, mating system, life cycle habit, cultivation status, and invasiveness. The relationship between k-mer diversity and population size proxies also remains significant after correcting for genome size, whereas the analogous relationship for nucleotide diversity does not. These results suggest that variation not captured by common SNP-based analyses explains part of Lewontin's paradox in plants.
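Illustrative note: nucleotide diversity (π), the SNP-based metric named in this abstract, is the average number of pairwise differences per site among sampled sequences. The sketch below is a minimal version, assuming already-aligned haplotypes of equal length with no missing data; the function name and toy sequences are hypothetical and are not taken from the preprint, whose pipeline works from short-read variant calls.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site across aligned sequences."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)

# Toy alignment (illustration only): three haplotypes, eight sites.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy)
```

In the toy alignment the three pairs differ at 1, 1, and 2 sites, so π is 4 differences over 3 pairs and 8 sites.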
8
Daigle A, Johri P. Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species. bioRxiv 2024:2024.02.06.579142. [PMID: 38370745 PMCID: PMC10871249 DOI: 10.1101/2024.02.06.579142] [Indexed: 02/20/2024]
Abstract
The accurate estimation of the distribution of fitness effects (DFE) of new mutations is critical for population genetic inference but remains a challenging task. While various methods have been developed for DFE inference using the site frequency spectrum of putatively neutral and selected sites, their applicability in species with diverse life history traits and complex demographic scenarios is not well understood. Selfing is common among eukaryotic species and can lead to decreased effective recombination rates, increasing the effects of selection at linked sites, including interference between selected alleles. We employ forward simulations to investigate the limitations of current DFE estimation approaches in the presence of selfing and other model violations, such as linkage, departures from semidominance, population structure, and uneven sampling. We find that distortions of the site frequency spectrum due to Hill-Robertson interference in highly selfing populations lead to mis-inference of the deleterious DFE of new mutations. Specifically, while accounting for the decrease in the effective population size due to linked effects of selection allows an accurate estimation of selection coefficients in moderately selfing populations, this correction is unable to accurately estimate selection coefficients in highly selfing populations when interference between selected alleles is pervasive. In addition, the presence of cryptic population structure with low rates of migration and uneven sampling across subpopulations leads to the false inference of a deleterious DFE skewed towards effectively neutral/mildly deleterious mutations. Finally, the proportion of adaptive substitutions estimated at high rates of selfing is substantially overestimated. Our observations apply broadly to species and genomic regions with little/no recombination and where interference might be pervasive.
9
Korfmann K, Temple-Boyer M, Sellinger T, Tellier A. Determinants of rapid adaptation in species with large variance in offspring production. Mol Ecol 2024; 33:e16982. [PMID: 37199145 DOI: 10.1111/mec.16982] [Received: 03/02/2023] [Revised: 04/26/2023] [Accepted: 05/02/2023] [Indexed: 05/19/2023]
Abstract
The speed of population adaptation to changing biotic and abiotic environments is determined by the interaction between genetic drift, positive selection and linkage effects. Many marine species (fish, crustaceans), invertebrates and pathogens of humans and crops, exhibit sweepstakes reproduction characterized by the production of a very large number of offspring (fecundity phase) from which only a small fraction may survive to the next generation (viability phase). Using stochastic simulations, we investigate whether the occurrence of sweepstakes reproduction affects the efficiency of a positively selected unlinked locus, and thus, the speed of adaptation since fecundity and/or viability have distinguishable consequences on mutation rate, probability and fixation time of advantageous alleles. We observe that the mean number of mutations at the next generation is always a function of the population size, but the variance increases with stronger sweepstakes reproduction when mutations occur in the parents. On the one hand, stronger sweepstakes reproduction magnifies the effect of genetic drift thus increasing the probability of fixation of neutral alleles and decreasing that of selected alleles. On the other hand, the time to fixation of advantageous (as well as neutral) alleles is shortened by stronger sweepstakes reproduction. Importantly, fecundity and viability selection exhibit different probabilities and times to fixation of advantageous alleles under intermediate and weak sweepstakes reproduction. Finally, alleles under both strong fecundity and viability selection display a synergistic efficiency of selection. We conclude that measuring and modelling accurately fecundity and/or viability selection are crucial to predict the adaptive potential of species with sweepstakes reproduction.
Affiliation(s)
- Kevin Korfmann
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Marie Temple-Boyer
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
- Thibaut Sellinger
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
  - Department of Environment and Biodiversity, Paris Lodron University of Salzburg, Salzburg, Austria
- Aurélien Tellier
  - Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Freising, Germany
10
Saubin M, Tellier A, Stoeckel S, Andrieux A, Halkett F. Approximate Bayesian Computation applied to time series of population genetic data disentangles rapid genetic changes and demographic variations in a pathogen population. Mol Ecol 2024; 33:e16965. [PMID: 37150947 DOI: 10.1111/mec.16965] [Received: 11/07/2022] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 05/09/2023]
Abstract
Adaptation can occur at remarkably short timescales in natural populations, leading to drastic changes in phenotypes and genotype frequencies over a few generations only. The inference of demographic parameters can allow understanding how evolutionary forces interact and shape the genetic trajectories of populations during rapid adaptation. Here we propose a new Approximate Bayesian Computation (ABC) framework that couples a forward and individual-based model with temporal genetic data to disentangle genetic changes and demographic variations in a case of rapid adaptation. We test the accuracy of our inferential framework and evaluate the benefit of considering a dense versus sparse sampling. Theoretical investigations demonstrate high accuracy in both model and parameter estimations, even if a strong thinning is applied to time series data. Then, we apply our ABC inferential framework to empirical data describing the population genetic changes of the poplar rust pathogen following a major event of resistance overcoming. We successfully estimate key demographic and genetic parameters, including the proportion of resistant hosts deployed in the landscape and the level of standing genetic variation from which selection occurred. Inferred values are in accordance with our empirical knowledge of this biological system. This new inferential framework, which contrasts with coalescent-based ABC analyses, is promising for a better understanding of evolutionary trajectories of populations subjected to rapid adaptation.
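Illustrative note: the general idea of coupling a forward simulator with ABC can be sketched with plain rejection sampling. Draw a parameter from a prior, simulate forward in time, and keep the draw when a simulated summary statistic falls close to the observed one. The toy below assumes a haploid Wright-Fisher model with a single selection coefficient and uses the final allele frequency as the only summary statistic. It is far simpler than the individual-based, time-series framework described in the abstract, and every name, prior bound, and numeric value in it is hypothetical.

```python
import random

def simulate_final_freq(s, p0, n_ind, generations, rng):
    """Forward haploid Wright-Fisher simulation; returns the final allele frequency."""
    p = p0
    for _ in range(generations):
        # Deterministic selection step, then binomial sampling (drift).
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        count = sum(rng.random() < p_sel for _ in range(n_ind))
        p = count / n_ind
    return p

def abc_rejection(observed, n_draws, tol, rng):
    """Keep prior draws of s whose simulated final frequency is within tol of observed."""
    accepted = []
    for _ in range(n_draws):
        s = rng.uniform(0.0, 0.5)  # assumed uniform prior on s
        sim = simulate_final_freq(s, p0=0.1, n_ind=200, generations=10, rng=rng)
        if abs(sim - observed) <= tol:
            accepted.append(s)
    return accepted

rng = random.Random(1)  # fixed seed so the run is repeatable
posterior = abc_rejection(observed=0.6, n_draws=500, tol=0.05, rng=rng)
posterior_mean = sum(posterior) / len(posterior) if posterior else float("nan")
```

The accepted values approximate the posterior for s under these toy assumptions; a real analysis would use several summary statistics across sampling dates and check how the tolerance affects the result.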
Affiliation(s)
- Méline Saubin
  - Université de Lorraine, INRAE, IAM, Nancy, France
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Aurélien Tellier
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Solenn Stoeckel
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
11
Soni V, Terbot JW, Jensen JD. Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data. Nat Commun 2024; 15:3240. [PMID: 38627371 PMCID: PMC11021480 DOI: 10.1038/s41467-024-46261-4] [Received: 05/18/2023] [Accepted: 01/29/2024] [Indexed: 04/19/2024] [Open access]
Affiliation(s)
- Vivak Soni
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA
- John W Terbot
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Jeffrey D Jensen
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA.
12
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. G3 (Bethesda, Md.) 2024; 14:jkae031. [PMID: 38365205 PMCID: PMC11090462 DOI: 10.1093/g3journal/jkae031] [Received: 10/10/2023] [Revised: 10/10/2023] [Accepted: 01/29/2024] [Indexed: 02/18/2024]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in nonmodel species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to nonmodel genomes. We apply ABC-MK to the human proteome and a set of known virus interacting proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
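Illustrative note: ABC-MK extends the McDonald-Kreitman test. For orientation, the classic MK-based point estimate of the proportion of adaptive nonsynonymous substitutions is α = 1 − (Ds·Pn)/(Dn·Ps), where D and P are divergence and polymorphism counts and the subscripts n and s denote nonsynonymous and synonymous sites. The sketch below implements only that classic estimator, not ABC-MK itself; the function name and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic McDonald-Kreitman estimate of the adaptive substitution fraction.

    dn, ds: nonsynonymous and synonymous divergence counts.
    pn, ps: nonsynonymous and synonymous polymorphism counts.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)

# Hypothetical counts (illustration only).
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
```

With these counts the nonsynonymous-to-synonymous ratio is 0.6 for divergence and 0.3 for polymorphism, giving α = 0.5. The classic estimate is known to be biased by slightly deleterious polymorphism and demography, which is the kind of confounding that extensions of the test aim to handle.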
Affiliation(s)
- Jesús Murga-Moreno
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
- Sònia Casillas
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
13
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Jinyu Chu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Wangjiao Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
| | - Jianlin Han
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
| | - Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
| | - Yunlong Ma
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
|
14
|
Dinnage R, Sarre SD, Duncan RP, Dickman CR, Edwards SV, Greenville AC, Wardle GM, Gruber B. slimr: An R package for tailor-made integrations of data in population genomic simulations over space and time. Mol Ecol Resour 2024; 24:e13916. [PMID: 38124500 DOI: 10.1111/1755-0998.13916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 11/20/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Software for realistically simulating complex population genomic processes is revolutionizing our understanding of evolutionary processes, and providing novel opportunities for integrating empirical data with simulations. However, the integration between standalone simulation software and R is currently not well developed. Here, we present slimr, an R package designed to create a seamless link between standalone software SLiM >3.0, one of the most powerful population genomic simulation frameworks, and the R development environment, with its powerful data manipulation and analysis tools. We show how slimr facilitates smooth integration between genetic data, ecological data and simulation in a single environment. The package enables pipelines that begin with data reading, cleaning and manipulation, proceed to constructing empirically based parameters and initial conditions for simulations, then to running numerical simulations and finally to retrieving simulation results in a format suitable for comparisons with empirical data - aided by advanced analysis and visualization tools provided by R. We demonstrate the use of slimr with an example from our own work on the landscape population genomics of desert mammals, highlighting the advantage of having a single integrated tool for both data analysis and simulation. slimr makes the powerful simulation ability of SLiM directly accessible to R users, allowing integrated simulation projects that incorporate empirical data without the need to switch between software environments. This should provide more opportunities for evolutionary biologists and ecologists to use realistic simulations to better understand the interplay between ecological and evolutionary processes.
Affiliation(s)
- Russell Dinnage
- Institute of Environment, Department of Biological Sciences, Florida International University, Miami, Florida, USA
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Stephen D Sarre
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Richard P Duncan
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
| | - Christopher R Dickman
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA
| | - Aaron C Greenville
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
| | - Glenda M Wardle
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
| | - Bernd Gruber
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
|
15
|
Yu Q, Ascensao JA, Okada T, Boyd O, Volz E, Hallatschek O. Lineage frequency time series reveal elevated levels of genetic drift in SARS-CoV-2 transmission in England. PLoS Pathog 2024; 20:e1012090. [PMID: 38620033 PMCID: PMC11045146 DOI: 10.1371/journal.ppat.1012090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 04/25/2024] [Accepted: 03/03/2024] [Indexed: 04/17/2024]
Abstract
Genetic drift in infectious disease transmission results from randomness of transmission and host recovery or death. The strength of genetic drift for SARS-CoV-2 transmission is expected to be high due to high levels of superspreading, and this is expected to substantially impact disease epidemiology and evolution. However, we don't yet have an understanding of how genetic drift changes over time or across locations. Furthermore, noise that results from data collection can potentially confound estimates of genetic drift. To address this challenge, we develop and validate a method to jointly infer genetic drift and measurement noise from time-series lineage frequency data. Our method is highly scalable to increasingly large genomic datasets, which overcomes a limitation in commonly used phylogenetic methods. We apply this method to over 490,000 SARS-CoV-2 genomic sequences from England collected between March 2020 and December 2021 by the COVID-19 Genomics UK (COG-UK) consortium and separately infer the strength of genetic drift for pre-B.1.177, B.1.177, Alpha, and Delta. We find that even after correcting for measurement noise, the strength of genetic drift is consistently, throughout time, higher than that expected from the observed number of COVID-19 positive individuals in England by 1 to 3 orders of magnitude, which cannot be explained by literature values of superspreading. Our estimates of genetic drift suggest low and time-varying establishment probabilities for new mutations, inform the parametrization of SARS-CoV-2 evolutionary models, and motivate future studies of the potential mechanisms for increased stochasticity in this system.
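To illustrate why measurement noise can confound estimates of drift, the following minimal sketch simulates a lineage frequency under a Wright-Fisher model and then observes it through a finite sequencing sample. This is not the authors' inference method; the Wright-Fisher assumption, the function names, and all parameter values are illustrative assumptions.

```python
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def drift_variance(p, n_eff):
    """One-generation variance of the frequency change under Wright-Fisher drift."""
    return p * (1.0 - p) / n_eff


def simulate_observed_frequencies(p0, n_eff, n_sampled, generations, seed=0):
    """Return (true, observed) lineage frequencies, one value per generation.

    true:     frequency in a population of effective size n_eff (hypothetical)
    observed: frequency among n_sampled sequenced genomes (measurement noise)
    """
    rng = random.Random(seed)
    true_freqs, observed_freqs = [], []
    p = p0
    for _ in range(generations):
        true_freqs.append(p)
        # Sequencing only a subset of cases adds sampling noise to the observation.
        observed_freqs.append(binomial(n_sampled, p, rng) / n_sampled)
        # Resampling the next generation is the genetic drift itself.
        p = binomial(n_eff, p, rng) / n_eff
    return true_freqs, observed_freqs


true_f, obs_f = simulate_observed_frequencies(
    0.3, n_eff=500, n_sampled=100, generations=20
)
```

In this toy setting the observed series fluctuates for two reasons at once, drift in the underlying frequency and binomial sampling of sequences, which is why the two sources of variance need to be inferred jointly rather than read directly off the observed trajectory.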
Affiliation(s)
- QinQin Yu
- Department of Physics, University of California, Berkeley, California, United States of America
| | - Joao A. Ascensao
- Department of Bioengineering, University of California, Berkeley, California, United States of America
| | - Takashi Okada
- Department of Physics, University of California, Berkeley, California, United States of America
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, Japan
- RIKEN iTHEMS, Wako, Saitama, Japan
| | - Olivia Boyd
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Erik Volz
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Oskar Hallatschek
- Department of Physics, University of California, Berkeley, California, United States of America
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Peter Debye Institute for Soft Matter Physics, Leipzig University, Leipzig, Germany
|
16
|
Kyriazis CC, Lohmueller KE. Constraining models of dominance for nonsynonymous mutations in the human genome. bioRxiv 2024:2024.02.25.582010. [PMID: 38463985 PMCID: PMC10925099 DOI: 10.1101/2024.02.25.582010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Dominance is a fundamental parameter in genetics, determining the dynamics of natural selection on deleterious and beneficial mutations, the patterns of genetic variation in natural populations, and the severity of inbreeding depression in a population. Despite this importance, dominance parameters remain poorly known, particularly in humans or other non-model organisms. A key reason for this lack of information about dominance is that it is extremely challenging to disentangle the selection coefficient (s) of a mutation from its dominance coefficient (h). Here, we explore dominance and selection parameters in humans by fitting models to the site frequency spectrum (SFS) for nonsynonymous mutations. When assuming a single dominance coefficient for all nonsynonymous mutations, we find that numerous h values can fit the data, so long as h is greater than ~0.15. Moreover, we also observe that theoretically-predicted models with a negative relationship between h and s can also fit the data well, including models with h=0.05 for strongly deleterious mutations. Finally, we use our estimated dominance and selection parameters to inform simulations revisiting the question of whether the out-of-Africa bottleneck has led to differences in genetic load between African and non-African human populations. These simulations suggest that the relative burden of genetic load in non-African populations depends on the dominance model assumed, with slight increases for more weakly recessive models and slight decreases shown for more strongly recessive models. Moreover, these results also demonstrate that models of partially recessive nonsynonymous mutations can explain the observed severity of inbreeding depression in humans, bridging the gap between molecular population genetics and direct measures of fitness in humans. 
Our work represents a comprehensive assessment of dominance and deleterious variation in humans, with implications for parameterizing models of deleterious variation in humans and other mammalian species.
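The roles of the selection coefficient (s) and the dominance coefficient (h) can be made concrete with the standard single-locus viability model, in which the three genotypes have relative fitnesses 1, 1 - hs, and 1 - s. The sketch below is a textbook illustration under assumed random mating with no drift or mutation; it is not the SFS-fitting procedure used in the study, and the function names are hypothetical.

```python
def genotype_fitnesses(s, h):
    """Relative fitnesses of (AA, Aa, aa), where a is the deleterious allele."""
    return 1.0, 1.0 - h * s, 1.0 - s


def next_allele_frequency(q, s, h):
    """Frequency of the deleterious allele after one generation of selection.

    Assumes random mating, viability selection, and no drift or mutation.
    """
    p = 1.0 - q
    w_AA, w_Aa, w_aa = genotype_fitnesses(s, h)
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # Each heterozygote contributes half of its alleles to the a-allele pool.
    return (p * q * w_Aa + q * q * w_aa) / mean_fitness
```

When the deleterious allele is rare, almost all copies sit in heterozygotes, so the strength of selection against it is governed mainly by the product hs. This is one way to see why s and h are difficult to estimate separately from frequency data alone.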
Affiliation(s)
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, USA
- Department of Human Genetics, David Geffen School of Medicine, Los Angeles, USA
|
17
|
Soni V, Pfeifer SP, Jensen JD. The Effects of Mutation and Recombination Rate Heterogeneity on the Inference of Demography and the Distribution of Fitness Effects. Genome Biol Evol 2024; 16:evae004. [PMID: 38207127 PMCID: PMC10834165 DOI: 10.1093/gbe/evae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 12/12/2023] [Accepted: 01/07/2024] [Indexed: 01/13/2024]
Abstract
Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavor; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted-for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modeled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in misleading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination before utilizing population genomic data to quantify the effects of genetic drift (i.e. as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modeled in downstream inference.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
| | - Susanne P Pfeifer
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
| | - Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ, USA
|
18
|
Forsdyke DR. Speciation, natural selection, and networks: three historians versus theoretical population geneticists. Theory Biosci 2024; 143:1-26. [PMID: 38282046 DOI: 10.1007/s12064-024-00412-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/06/2024] [Indexed: 01/30/2024]
Abstract
In 1913, the geneticist William Bateson called for a halt in studies of genetic phenomena until evolutionary fundamentals had been sufficiently addressed at the molecular level. Nevertheless, in the 1960s, the theoretical population geneticists celebrated a "modern synthesis" of the teachings of Mendel and Darwin, with an exclusive role for natural selection in speciation. This was supported, albeit with minor reservations, by historians Mark Adams and William Provine, who taught it to generations of students. In subsequent decades, doubts were raised by molecular biologists and, despite the deep influence of various mentors, Adams and Provine noted serious anomalies and began to question traditional "just-so-stories." They were joined in challenging the genetic orthodoxy by a scientist-historian, Donald Forsdyke, who suggested that a "collective variation" postulated by Darwin's young research associate, George Romanes, and a mysterious "residue" postulated by Bateson, might relate to differences in short runs of DNA bases (oligonucleotides). The dispute between a small network of historians and a large network of geneticists can be understood in the context of national politics. Contrasts are drawn between democracies, where capturing the narrative makes reversal difficult, and dictatorships, where overthrow of a supportive dictator can result in rapid reversal.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, K7L3N6, Canada
|
19
|
Thom G, Moreira LR, Batista R, Gehara M, Aleixo A, Smith BT. Genomic Architecture Predicts Tree Topology, Population Structuring, and Demographic History in Amazonian Birds. Genome Biol Evol 2024; 16:evae002. [PMID: 38236173 PMCID: PMC10823491 DOI: 10.1093/gbe/evae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 10/26/2023] [Accepted: 12/12/2023] [Indexed: 01/19/2024]
Abstract
Geographic barriers are frequently invoked to explain genetic structuring across the landscape. However, inferences on the spatial and temporal origins of population variation have been largely limited to evolutionarily neutral models, ignoring the potential role of natural selection and intrinsic genomic processes known as genomic architecture in producing heterogeneity in differentiation across the genome. To test how variation in genomic characteristics (e.g. recombination rate) impacts our ability to reconstruct general patterns of differentiation between species that co-occur across geographic barriers, we sequenced the whole genomes of multiple bird populations that are distributed across rivers in southeastern Amazonia. We found that phylogenetic relationships within species and demographic parameters varied across the genome in predictable ways. Genetic diversity was positively associated with recombination rate and negatively associated with species tree support. Gene flow was less pervasive in genomic regions of low recombination, making these windows more likely to retain patterns of population structuring that matched the species tree. We further found that approximately a third of the genome showed evidence of selective sweeps and linked selection, skewing genome-wide estimates of effective population sizes and gene flow between populations toward lower values. In sum, we showed that the effects of intrinsic genomic characteristics and selection can be disentangled from neutral processes to elucidate spatial patterns of population differentiation.
Affiliation(s)
- Gregory Thom
- Department of Ornithology, American Museum of Natural History, New York, NY, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Lucas Rocha Moreira
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Department of Vertebrate Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Romina Batista
- Programa de Coleções Biológicas, Instituto Nacional de Pesquisas da Amazônia, Manaus, Brazil
- School of Science, Engineering and Environment, University of Salford, Manchester, UK
| | - Marcelo Gehara
- Department of Earth and Environmental Sciences, Rutgers University, Newark, NJ, USA
| | - Alexandre Aleixo
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- Department of Environmental Genomics, Instituto Tecnológico Vale, Belém, Brazil
| | - Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, New York, NY, USA
|
20
|
Versoza CJ, Weiss S, Johal R, La Rosa B, Jensen JD, Pfeifer SP. Novel Insights into the Landscape of Crossover and Noncrossover Events in Rhesus Macaques (Macaca mulatta). Genome Biol Evol 2024; 16:evad223. [PMID: 38051960 PMCID: PMC10773715 DOI: 10.1093/gbe/evad223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/04/2023] [Revised: 11/04/2023] [Accepted: 11/28/2023] [Indexed: 12/07/2023]
Abstract
Meiotic recombination landscapes differ greatly between distantly and closely related taxa, populations, individuals, sexes, and even within genomes; however, the factors driving this variation are yet to be well elucidated. Here, we directly estimate contemporary crossover rates and, for the first time, noncrossover rates in rhesus macaques (Macaca mulatta) from four three-generation pedigrees comprising 32 individuals. We further compare these results with historical, demography-aware, linkage disequilibrium-based recombination rate estimates. From paternal meioses in the pedigrees, 165 crossover events with a median resolution of 22.3 kb were observed, corresponding to a male autosomal map length of 2,357 cM, approximately 15% longer than an existing linkage map based on human microsatellite loci. In addition, 85 noncrossover events with a mean tract length of 155 bp were identified, similar to the tract lengths observed in the only other two primates in which noncrossovers have been studied to date, humans and baboons. Consistent with observations in other placental mammals with PRDM9-directed recombination, crossover (and to a lesser extent noncrossover) events in rhesus macaques clustered in intergenic regions and toward the chromosomal ends in males, a pattern in broad agreement with the historical, sex-averaged recombination rate estimates, and evidence of GC-biased gene conversion was observed at noncrossover sites.
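The link between observed crossover counts and a map length in centimorgans follows from the definition of the unit: 1 cM corresponds to an expected 0.01 crossovers per gamete. The small sketch below encodes only that definitional relation; the function names are hypothetical, and the number of gametes scored is an input the abstract does not state.

```python
def map_length_cm(crossovers, gametes):
    """Genetic map length in centimorgans: 100 x mean crossovers per gamete."""
    if gametes <= 0:
        raise ValueError("gametes must be positive")
    return 100.0 * crossovers / gametes


def crossovers_per_gamete(map_length):
    """Inverse relation: expected crossovers per gamete for a map length in cM."""
    return map_length / 100.0
```

Under this relation, a male autosomal map of 2,357 cM corresponds to roughly 23.6 crossovers per paternal gamete.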
Affiliation(s)
- Cyril J Versoza
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
| | - Sarah Weiss
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Ravneet Johal
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Bruno La Rosa
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
| | - Susanne P Pfeifer
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
|
21
|
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Aigerim Rymbekova
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Olga Dolgova
- Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
| | - Oscar Lao
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
|
22
|
Long H, Johri P, Gout JF, Ni J, Hao Y, Licknack T, Wang Y, Pan J, Jiménez-Marín B, Lynch M. Paramecium Genetics, Genomics, and Evolution. Annu Rev Genet 2023; 57:391-410. [PMID: 38012024 DOI: 10.1146/annurev-genet-071819-104035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The ciliate genus Paramecium served as one of the first model systems in microbial eukaryotic genetics, contributing much to the early understanding of phenomena as diverse as genome rearrangement, cryptic speciation, cytoplasmic inheritance, and endosymbiosis, as well as more recently to the evolution of mating types, introns, and roles of small RNAs in DNA processing. Substantial progress has recently been made in the area of comparative and population genomics. Paramecium species combine some of the lowest known mutation rates with some of the largest known effective populations, along with likely very high recombination rates, thereby harboring a population-genetic environment that promotes an exceptionally efficient capacity for selection. As a consequence, the genomes are extraordinarily streamlined, with very small intergenic regions combined with small numbers of tiny introns. The subject of the bulk of Paramecium research, the ancient Paramecium aurelia species complex, is descended from two whole-genome duplication events that retain high degrees of synteny, thereby providing an exceptional platform for studying the fates of duplicate genes. Despite having a common ancestor dating to several hundred million years ago, the known descendant species are morphologically indistinguishable, raising significant questions about the common view that gene duplications lead to the origins of evolutionary novelties.
Affiliation(s)
- Hongan Long
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
- Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao, Shandong Province, China
| | - Parul Johri
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Jean-Francois Gout
- Department of Biological Sciences, Mississippi State University, Starkville, Mississippi, USA
| | - Jiahao Ni
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
| | - Yue Hao
- Cancer and Cell Biology Division, Translational Genomics Research Institute, Phoenix, Arizona, USA
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
| | - Timothy Licknack
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
| | - Yaohai Wang
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
| | - Jiao Pan
- Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
| | - Berenice Jiménez-Marín
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
| | - Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
|
23
|
Soni V, Pfeifer SP, Jensen JD. The effects of mutation and recombination rate heterogeneity on the inference of demography and the distribution of fitness effects. bioRxiv 2023:2023.11.11.566703. [PMID: 38014252 PMCID: PMC10680612 DOI: 10.1101/2023.11.11.566703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavour; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted-for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modelled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in misleading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination prior to utilizing population genomic data to quantify the effects of genetic drift (i.e., as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modelled in downstream inference.
Affiliation(s)
- Vivak Soni
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
| | - Susanne P. Pfeifer
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
| | - Jeffrey D. Jensen
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine
|
24
|
Ye Z, Wei W, Pfrender ME, Lynch M. Evolutionary Insights from a Large-Scale Survey of Population-Genomic Variation. Mol Biol Evol 2023; 40:msad233. [PMID: 37863047 PMCID: PMC10630549 DOI: 10.1093/molbev/msad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/11/2023] [Accepted: 10/03/2023] [Indexed: 10/22/2023]
Abstract
The field of genomics has ushered in new methods for studying molecular-genetic variation in natural populations. However, most population-genomic studies still rely on small sample sizes (typically, <100 individuals) from single time points, leaving considerable uncertainties with respect to the behavior of relatively young (and rare) alleles and, owing to the large sampling variance of measures of variation, to the specific gene targets of unusually strong selection. Genomic sequences of ∼1,700 haplotypes distributed over a 10-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including previously hidden information on the behavior of rare alleles predicted by recent theory. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Temporally fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, classes of genes that are under strong positive selection can now be confidently identified in this key model organism. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
- Hubei Key Laboratory of Genetic Regulation & Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
| | - Wen Wei
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
| | - Michael E Pfrender
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
|
25
|
Terbot JW, Cooper BS, Good JM, Jensen JD. A Simulation Framework for Modeling the Within-Patient Evolutionary Dynamics of SARS-CoV-2. Genome Biol Evol 2023; 15:evad204. [PMID: 37950882 PMCID: PMC10664409 DOI: 10.1093/gbe/evad204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/31/2023] [Accepted: 11/07/2023] [Indexed: 11/13/2023]
Abstract
The global impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for rarely acting positive selection are best performed via comparison of empirical data with simulated data wherein commonly acting evolutionary factors, including mutation and recombination, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. Although there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intrahost evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them with existing empirical data. Of these, 592 models (∼5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intrahost SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed toward strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Affiliation(s)
- John W Terbot
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, Arizona, USA
- Division of Biological Sciences, University of Montana, Missoula, Montana, USA
| | - Brandon S Cooper
- Division of Biological Sciences, University of Montana, Missoula, Montana, USA
| | - Jeffrey M Good
- Division of Biological Sciences, University of Montana, Missoula, Montana, USA
| | - Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, Arizona, USA
|
26
|
Mo Z, Siepel A. Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data. PLoS Genet 2023; 19:e1011032. [PMID: 37934781 PMCID: PMC10655966 DOI: 10.1371/journal.pgen.1011032] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/17/2023] [Revised: 11/17/2023] [Accepted: 10/23/2023] [Indexed: 11/09/2023] Open
Abstract
Investigators have recently introduced powerful methods for population genetic inference that rely on supervised machine learning from simulated data. Despite their performance advantages, these methods can fail when the simulated training data does not adequately resemble data from the real world. Here, we show that this "simulation mis-specification" problem can be framed as a "domain adaptation" problem, where a model learned from one data distribution is applied to a dataset drawn from a different distribution. By applying an established domain-adaptation technique based on a gradient reversal layer (GRL), originally introduced for image classification, we show that the effects of simulation mis-specification can be substantially mitigated. We focus our analysis on two state-of-the-art deep-learning population genetic methods: SIA, which infers positive selection from features of the ancestral recombination graph (ARG), and ReLERNN, which infers recombination rates from genotype matrices. In the case of SIA, the domain adaptive framework also compensates for ARG inference error. Using the domain-adaptive SIA (dadaSIA) model, we estimate improved selection coefficients at selected loci in the 1000 Genomes CEU population. We anticipate that domain adaptation will prove to be widely applicable in the growing use of supervised machine learning in population genetics.
Affiliation(s)
- Ziyi Mo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
|
27
|
Mathur S, Mason AJ, Bradburd GS, Gibbs HL. Functional genomic diversity is correlated with neutral genomic diversity in populations of an endangered rattlesnake. Proc Natl Acad Sci U S A 2023; 120:e2303043120. [PMID: 37844221 PMCID: PMC10614936 DOI: 10.1073/pnas.2303043120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 09/19/2023] [Indexed: 10/18/2023] Open
Abstract
Theory predicts that genetic erosion in small, isolated populations of endangered species can be assessed using estimates of neutral genetic variation, yet this widely used approach has recently been questioned in the genomics era. Here, we leverage a chromosome-level genome assembly of an endangered rattlesnake (Sistrurus catenatus) combined with whole genome resequencing data (N = 110 individuals) to evaluate the relationship between levels of genome-wide neutral and functional diversity over historical and future timescales. As predicted, we found positive correlations between genome-wide estimates of neutral genetic diversity (π) and inferred levels of adaptive variation and an estimate of inbreeding mutation load, and a negative relationship between neutral diversity and an estimate of drift mutation load. However, these correlations were half as strong for projected future levels of neutral diversity based on contemporary effective population sizes. Broadly, our results confirm that estimates of neutral genetic diversity provide an accurate measure of genetic erosion in populations of a threatened vertebrate. They also provide nuance to the neutral-functional diversity controversy by suggesting that while these correlations exist, anthropogenic impacts may have weakened these associations in the recent past and into the future.
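For orientation, the neutral diversity statistic π mentioned in this abstract is conventionally defined as the mean number of pairwise differences per site among sampled sequences. The following is a minimal sketch of that textbook definition, assuming aligned haplotypes of equal length with no missing data. The function name `nucleotide_diversity` and the toy sequences are hypothetical and are not taken from the study, whose actual estimation pipeline is not described here.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site (pi) for aligned haplotypes.

    Assumes equal-length sequences and no missing data (an assumption
    of this sketch, not a statement about the cited study's method).
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatched sites for every unordered pair of haplotypes.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


toy = ["AAAA", "AAAT", "AATT"]  # hypothetical sequences for illustration
pi = nucleotide_diversity(toy)
print(round(pi, 4))
```

Real resequencing data would typically also require handling of missing genotypes and of the number of callable sites, which this sketch deliberately omits.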
Affiliation(s)
- Samarth Mathur
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 48824
- Ohio Biodiversity Conservation Partnership, The Ohio State University, Columbus, OH 43210
| | - Andrew J. Mason
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 48824
- Ohio Biodiversity Conservation Partnership, The Ohio State University, Columbus, OH 43210
| | - Gideon S. Bradburd
- Evolution and Behavior Program, Department of Integrative Biology, Ecology, Michigan State University, East Lansing, MI 48824
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
| | - H. Lisle Gibbs
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 48824
- Ohio Biodiversity Conservation Partnership, The Ohio State University, Columbus, OH 43210
|
28
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
|
29
|
Laetsch DR, Bisschop G, Martin SH, Aeschbacher S, Setter D, Lohse K. Demographically explicit scans for barriers to gene flow using gIMble. PLoS Genet 2023; 19:e1010999. [PMID: 37816069 PMCID: PMC10610087 DOI: 10.1371/journal.pgen.1010999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/27/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023] Open
Abstract
Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles i.e. genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for the quantification of barriers, that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.
Affiliation(s)
- Dominik R. Laetsch
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
| | - Gertjan Bisschop
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
| | - Simon H. Martin
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
| | - Simon Aeschbacher
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - Derek Setter
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
| | - Konrad Lohse
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
|
30
|
Andersson BA, Zhao W, Haller BC, Brännström Å, Wang XR. Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure. Mol Ecol Resour 2023; 23:1589-1603. [PMID: 37340611 DOI: 10.1111/1755-0998.13825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/22/2023]
Abstract
The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods (downsampling, imputation and subsampling) with sample sizes of 4-100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000, the sum of 0- and 4-fold SNPs); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies should consider downsampling for small data sets, and use samples larger than 4 (ideally larger than 8) individuals, with more than 5000 SNPs in order to improve the robustness of DFE inference and enable comparative analyses.
Affiliation(s)
| | - Wei Zhao
- Department of Ecology and Environmental Sciences, Umeå University, Umeå, Sweden
| | - Benjamin C Haller
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| | - Åke Brännström
- Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden
- Advancing Systems Analysis Program, International Institute for Applied Systems Analysis, Laxenburg, Austria
- Complexity Science and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Kunigami, Japan
| | - Xiao-Ru Wang
- Department of Ecology and Environmental Sciences, Umeå University, Umeå, Sweden
|
31
|
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.29.555322. [PMID: 37693550 PMCID: PMC10491248 DOI: 10.1101/2023.08.29.555322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in non-model species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to non-model genomes. We apply ABC-MK to the human proteome and a set of known Virus Interacting Proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
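For orientation, the classical McDonald-Kreitman framework that ABC-MK extends compares counts of nonsynonymous and synonymous fixed differences (Dn, Ds) with counts of nonsynonymous and synonymous polymorphisms (Pn, Ps). A common summary is the estimated fraction of adaptive substitutions, α = 1 - (Ds × Pn)/(Dn × Ps). The sketch below computes only that classical point estimate and is not the ABC procedure described in the abstract. The function name `mk_alpha` and the counts are illustrative assumptions.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical MK point estimate of the fraction of adaptive substitutions.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to illustrate the arithmetic.
alpha = mk_alpha(dn=7, ds=17, pn=2, ps=42)
print(round(alpha, 3))
```

Under strict neutrality the two ratios match and the estimate is zero. Slightly deleterious polymorphisms tend to bias this simple estimate downward, which is one motivation for model-based extensions.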
Affiliation(s)
- Jesús Murga-Moreno
- University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
| | - Sònia Casillas
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | - Antonio Barbadilla
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | | | - David Enard
- University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
|
32
|
Scheben A, Mendivil Ramos O, Kramer M, Goodwin S, Oppenheim S, Becker DJ, Schatz MC, Simmons NB, Siepel A, McCombie WR. Long-Read Sequencing Reveals Rapid Evolution of Immunity- and Cancer-Related Genes in Bats. Genome Biol Evol 2023; 15:evad148. [PMID: 37728212 PMCID: PMC10510315 DOI: 10.1093/gbe/evad148] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Accepted: 08/03/2023] [Indexed: 09/21/2023] Open
Abstract
Bats are exceptional among mammals for their powered flight, extended lifespans, and robust immune systems and therefore have been of particular interest in comparative genomics. Using the Oxford Nanopore Technologies long-read platform, we sequenced the genomes of two bat species with key phylogenetic positions, the Jamaican fruit bat (Artibeus jamaicensis) and the Mesoamerican mustached bat (Pteronotus mesoamericanus), and carried out a comprehensive comparative genomic analysis with a diverse collection of bats and other mammals. The high-quality, long-read genome assemblies revealed a contraction of interferon (IFN)-α at the immunity-related type I IFN locus in bats, resulting in a shift in relative IFN-ω and IFN-α copy numbers. Contradicting previous hypotheses of constitutive expression of IFN-α being a feature of the bat immune system, three bat species lost all IFN-α genes. This shift to IFN-ω could contribute to the increased viral tolerance that has made bats a common reservoir for viruses that can be transmitted to humans. Antiviral genes stimulated by type I IFNs also showed evidence of rapid evolution, including a lineage-specific duplication of IFN-induced transmembrane genes and positive selection in IFIT2. In addition, 33 tumor suppressors and 6 DNA-repair genes showed signs of positive selection, perhaps contributing to increased longevity and reduced cancer rates in bats. The robust immune systems of bats rely on both bat-wide and lineage-specific evolution in the immune gene repertoire, suggesting diverse immune strategies. Our study provides new genomic resources for bats and sheds new light on the extraordinary molecular evolution in this critically important group of mammals.
Affiliation(s)
- Armin Scheben
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | | | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Sara Oppenheim
- American Museum of Natural History, Institute for Comparative Genomics, New York, New York, USA
| | - Daniel J Becker
- School of Biological Sciences, University of Oklahoma, Norman, Oklahoma, USA
| | - Michael C Schatz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Nancy B Simmons
- Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, New York, New York, USA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
|
33
|
Hoelzel AR, Lynch M. The raw material of evolution. Science 2023; 381:942-943. [PMID: 37651506 DOI: 10.1126/science.adk0121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Estimates of whale mutation rates contribute to understanding evolutionary processes.
Affiliation(s)
- A Rus Hoelzel
- Department of Biosciences, Durham University, Durham, UK
| | - Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, USA
|
34
|
Terbot JW, Cooper BS, Good JM, Jensen JD. A simulation framework for modeling the within-patient evolutionary dynamics of SARS-CoV-2. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.13.548462. [PMID: 37503016 PMCID: PMC10370031 DOI: 10.1101/2023.07.13.548462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The global impact of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for positive selection are best performed via comparison of empirical data to simulated data wherein evolutionary factors, including mutation and recombination rates, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. While there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intra-host evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them to existing empirical data. Of these, 592 models (~5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intra-host SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed towards strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Affiliation(s)
- John W Terbot
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
| | - Brandon S. Cooper
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
| | - Jeffrey M. Good
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
| | - Jeffrey D. Jensen
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
|
35
|
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023] Open
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
|
36
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text
Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
|
37
|
Ye Z, Wei W, Pfrender M, Lynch M. Evolutionary Insights from a Large-scale Survey of Population-genomic Variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.03.539276. [PMID: 37205430 PMCID: PMC10187179 DOI: 10.1101/2023.05.03.539276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Results from data on > 1000 haplotypes distributed over a nine-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including key population-genetic properties that are obscured in studies with smaller sample sizes. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, regions of gene structure that are under strong purifying selection and classes of genes that are under strong positive selection in this key species can be confidently identified. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Wen Wei
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Michael Pfrender
- Department of Biological Sciences, Notre Dame University, Notre Dame, IN 46556
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
|
38
|
Lynch M, Wei W, Ye Z, Pfrender M. The Genome-wide Signature of Short-term Temporal Selection. bioRxiv 2023:2023.04.28.538790. [PMID: 37162919 PMCID: PMC10168312 DOI: 10.1101/2023.04.28.538790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Despite evolutionary biology's obsession with natural selection, few studies have evaluated multi-generational series of patterns of selection on a genome-wide scale in natural populations. Here, we report on a nine-year population-genomic survey of the microcrustacean Daphnia pulex. The genome-sequences of > 800 isolates provide insights into patterns of selection that cannot be obtained from long-term molecular-evolution studies, including the pervasiveness of near quasi-neutrality across the genome (mean net selection coefficients near zero, but with significant temporal variance about the mean, and little evidence of positive covariance of selection across time intervals), the preponderance of weak negative selection operating on minor alleles, and a genome-wide distribution of numerous small linkage islands of observable selection influencing levels of nucleotide diversity. These results suggest that fluctuating selection is a major determinant of standing levels of variation in natural populations, challenge the conventional paradigm for interpreting patterns of nucleotide diversity and divergence, and motivate the need for the development of new theoretical expressions for the interpretation of population-genomic data.
Affiliation(s)
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Wen Wei
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Zhiqiang Ye
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Michael Pfrender
- Department of Biological Sciences, Notre Dame University, Notre Dame, IN 46556
|
39
|
Johri P, Pfeifer SP, Jensen JD. Developing an evolutionary baseline model for humans: jointly inferring purifying selection with population history. bioRxiv 2023:2023.04.11.536488. [PMID: 37090533 PMCID: PMC10120674 DOI: 10.1101/2023.04.11.536488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Building evolutionarily appropriate baseline models for natural populations is not only important for answering fundamental questions in population genetics, including quantifying the relative contributions of adaptive vs. non-adaptive processes, but it is also essential for identifying candidate loci experiencing relatively rare and episodic forms of selection (e.g., positive or balancing selection). Here, a baseline model was developed for a human population of West African ancestry, the Yoruba, comprising processes constantly operating on the genome (i.e., purifying and background selection, population size changes, recombination rate heterogeneity, and gene conversion). Specifically, to perform joint inference of selective effects with demography, an approximate Bayesian approach was employed that utilizes the decay of background selection effects around functional elements, taking into account genomic architecture. This approach inferred a recent 6-fold population growth together with a distribution of fitness effects that is skewed towards effectively neutral mutations. Importantly, these results further suggest that, while strong and/or frequent recurrent positive selection is inconsistent with observed data, weak to moderate positive selection is consistent but unidentifiable if rare.
|
40
|
Terbot JW, Johri P, Liphardt SW, Soni V, Pfeifer SP, Cooper BS, Good JM, Jensen JD. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLoS Pathog 2023; 19:e1011265. [PMID: 37018331 PMCID: PMC10075409 DOI: 10.1371/journal.ppat.1011265] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 04/06/2023] Open
Abstract
Over the past 3 years, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spread through human populations in several waves, resulting in a global health crisis. In response, genomic surveillance efforts have proliferated in the hopes of tracking and anticipating the evolution of this virus, resulting in millions of patient isolates now being available in public databases. Yet, while there is a tremendous focus on identifying newly emerging adaptive viral variants, this quantification is far from trivial. Specifically, multiple co-occurring and interacting evolutionary processes are constantly in operation and must be jointly considered and modeled in order to perform accurate inference. We here outline critical individual components of such an evolutionary baseline model (mutation rates, recombination rates, the distribution of fitness effects, infection dynamics, and compartmentalization) and describe the current state of knowledge pertaining to the related parameters of each in SARS-CoV-2. We close with a series of recommendations for future clinical sampling, model construction, and statistical analysis.
Affiliation(s)
- John W Terbot
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Parul Johri
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Schuyler W Liphardt
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Vivak Soni
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Susanne P Pfeifer
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Brandon S Cooper
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey M Good
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey D Jensen
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
|
41
|
Ghafoor S, Santos J, Versoza CJ, Jensen JD, Pfeifer SP. The Impact of Sample Size and Population History on Observed Mutational Spectra: A Case Study in Human and Chimpanzee Populations. Genome Biol Evol 2023; 15:7039701. [PMID: 36790107 PMCID: PMC9989333 DOI: 10.1093/gbe/evad019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2021] [Revised: 01/20/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023] Open
Abstract
Recent studies have highlighted variation in the mutational spectra among human populations as well as closely related hominoids, yet little is known about the genetic and nongenetic factors driving these rate changes across the genome. Pinpointing the root causes of these differences is an important endeavor that requires careful comparative analyses of population-specific mutational landscapes at both broad and fine genomic scales. However, several factors can confound such analyses. Although previous studies have shown that technical artifacts, such as sequencing errors and batch effects, can contribute to observed mutational shifts, other potentially confounding parameters have received less attention thus far. Using population genetic simulations of human and chimpanzee populations as an illustrative example, we here show that the sample size required for robust inference of mutational spectra depends on the population-specific demographic history. As a consequence, the power to detect rate changes is high in certain hominoid populations while, for others, currently available sample sizes preclude analyses at fine genomic scales.
Affiliation(s)
- Suhail Ghafoor
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- João Santos
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Cyril J Versoza
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
- School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
|
42
|
Jensen JD. Population genetic concerns related to the interpretation of empirical outliers and the neglect of common evolutionary processes. Heredity (Edinb) 2023; 130:109-110. [PMID: 36829044 PMCID: PMC9981695 DOI: 10.1038/s41437-022-00575-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 02/26/2023] Open
Affiliation(s)
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
43
|
Charlesworth B, Jensen JD. Population Genetic Considerations Regarding Evidence for Biased Mutation Rates in Arabidopsis thaliana. Mol Biol Evol 2023; 40:6961073. [PMID: 36572441 PMCID: PMC9907473 DOI: 10.1093/molbev/msac275] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022] Open
Abstract
It has recently been proposed that lower mutation rates in gene bodies compared with upstream and downstream sequences in Arabidopsis thaliana are the result of an "adaptive" modification of the rate of beneficial and deleterious mutations in these functional regions. This claim was based both on analyses of mutation accumulation lines and on population genomics data. Here, we show that several questionable assumptions were used in the population genomics analyses. In particular, we demonstrate that the difference between gene bodies and less selectively constrained sequences in the magnitude of Tajima's D can in principle be explained by the presence of sites subject to purifying selection and does not require lower mutation rates in regions experiencing selective constraints.
Affiliation(s)
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, 85281 AZ
|
44
|
Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022] Open
Abstract
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design; yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
|
45
|
Steiner MC, Novembre J. Population genetic models for the spatial spread of adaptive variants: A review in light of SARS-CoV-2 evolution. PLoS Genet 2022; 18:e1010391. [PMID: 36137003 PMCID: PMC9498967 DOI: 10.1371/journal.pgen.1010391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Theoretical population genetics has long studied the arrival and geographic spread of adaptive variants through the analysis of mathematical models of dispersal and natural selection. These models take on a renewed interest in the context of the COVID-19 pandemic, especially given the consequences that novel adaptive variants have had on the course of the pandemic as they have spread through global populations. Here, we review theoretical models for the spatial spread of adaptive variants and identify areas to be improved in future work, toward a better understanding of variants of concern in Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) evolution and other contemporary applications. As we describe, characteristics of pandemics such as COVID-19, including the impact of long-distance travel patterns and the overdispersion of lineages due to superspreading events, suggest new directions for improving upon existing population genetic models.
Affiliation(s)
- Margaret C. Steiner
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Ecology & Evolution, University of Chicago, Chicago, Illinois, United States of America
|
46
|
Abstract
We discuss the genetic, demographic, and selective forces that are likely to be at play in restricting observed levels of DNA sequence variation in natural populations to a much smaller range of values than would be expected from the distribution of census population sizes alone, a pattern known as Lewontin's Paradox. While several processes that have previously been strongly emphasized must be involved, including the effects of direct selection and genetic hitchhiking, it seems unlikely that they are sufficient to explain this observation without contributions from other factors. We highlight a potentially important role for the less-appreciated contribution of population size change; specifically, the likelihood that many species and populations may be quite far from reaching the relatively high equilibrium diversity values that would be expected given their current census sizes.
Affiliation(s)
- Brian Charlesworth
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
47
|
Johri P, Eyre-Walker A, Gutenkunst RN, Lohmueller KE, Jensen JD. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol Evol 2022; 14:6604401. [PMID: 35675379 PMCID: PMC9254643 DOI: 10.1093/gbe/evac088] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Accepted: 06/02/2022] [Indexed: 11/15/2022] Open
Abstract
As both natural selection and population history can affect genome-wide patterns of variation, disentangling the contributions of each has remained as a major challenge in population genetics. We here discuss historical and recent progress towards this goal—highlighting theoretical and computational challenges that remain to be addressed, as well as inherent difficulties in dealing with model complexity and model violations—and offer thoughts on potentially fruitful next steps.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA; Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|