1
Bendall EE, Zhu Y, Fitzsimmons WJ, Rolfes M, Mellis A, Halasa N, Martin ET, Grijalva CG, Talbot HK, Lauring AS. Influenza A virus within-host evolution and positive selection in a densely sampled household cohort over three seasons. Virus Evol 2024; 10:veae084. [PMID: 39444487 PMCID: PMC11498174 DOI: 10.1093/ve/veae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/24/2024] [Accepted: 09/30/2024] [Indexed: 10/25/2024]
Abstract
While influenza A virus (IAV) antigenic drift has been documented globally, in experimental animal infections, and in immunocompromised hosts, positive selection has generally not been detected in acute infections. This is likely due to challenges in distinguishing selected rare mutations from sequencing error, a reliance on cross-sectional sampling, and/or the lack of formal tests of selection for individual sites. Here, we sequenced IAV populations from 346 serial, daily nasal swabs from 143 individuals collected over three influenza seasons in a household cohort. Viruses were sequenced in duplicate, and intrahost single nucleotide variants (iSNVs) were identified at a 0.5% frequency threshold. Within-host populations exhibited low diversity, with >75% mutations present at <2% frequency. Children (0-5 years) had marginally higher within-host evolutionary rates than adolescents (6-18 years) and adults (>18 years, 4.4 × 10⁻⁶ vs. 9.42 × 10⁻⁷ and 3.45 × 10⁻⁶, P < .001). Forty-five iSNVs had evidence of parallel evolution but were not over-represented in HA and NA. Several increased from minority to consensus level, with strong linkage among iSNVs across segments. A Wright-Fisher approximate Bayesian computational model identified positive selection at 23/256 loci (9%) in A(H3N2) specimens and 19/176 loci (11%) in A(H1N1)pdm09 specimens, and these were infrequently found in circulation. Overall, we found that within-host IAV populations were subject to genetic drift and purifying selection, with only subtle differences across seasons, subtypes, and age strata. Positive selection was rare and inconsistently detected.
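The abstract names a Wright-Fisher approximate Bayesian computational model without giving its details. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below runs rejection ABC for a single selection coefficient at one haploid locus. The function names, the uniform prior on s, the population size, the distance measure, and the tolerance are all assumptions made for this example.

```python
import random


def wright_fisher_trajectory(p0, s, n_eff, generations, rng):
    """Simulate haploid Wright-Fisher allele frequencies for a variant with fitness 1 + s."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Selection step: expected frequency after weighting by relative fitness.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift step: binomial resampling of n_eff genomes.
        count = sum(1 for _ in range(n_eff) if rng.random() < p_sel)
        p = count / n_eff
        freqs.append(p)
    return freqs


def abc_rejection(observed, n_eff, n_draws, tolerance, seed=0):
    """Return selection coefficients whose simulated trajectories stay close to the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        s = rng.uniform(-0.5, 0.5)  # assumed prior, chosen only for illustration
        sim = wright_fisher_trajectory(observed[0], s, n_eff, len(observed) - 1, rng)
        # Assumed distance: largest absolute frequency difference over time points.
        distance = max(abs(a - b) for a, b in zip(sim, observed))
        if distance <= tolerance:
            accepted.append(s)
    return accepted


if __name__ == "__main__":
    # Hypothetical daily iSNV frequencies, not data from the study.
    observed = [0.02, 0.05, 0.12, 0.25]
    posterior = abc_rejection(observed, n_eff=200, n_draws=2000, tolerance=0.1)
    print(len(posterior), "accepted draws")
```

The accepted values approximate a posterior for s under these assumptions; a real analysis would also need to model sequencing error and choose the effective population size and generation time with care.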
Affiliation(s)
- Emily E Bendall
  - Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI 48109, United States
- Yuwei Zhu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- William J Fitzsimmons
  - Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, United States
- Melissa Rolfes
  - Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA 30333, United States
- Alexandra Mellis
  - Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA 30333, United States
- Natasha Halasa
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Emily T Martin
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109, United States
- Carlos G Grijalva
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- H Keipp Talbot
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN 37203, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Adam S Lauring
  - Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI 48109, United States
  - Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, United States
2
Bendall EE, Zhu Y, Fitzsimmons WJ, Rolfes M, Mellis A, Halasa N, Martin ET, Grijalva CG, Talbot HK, Lauring AS. Influenza A virus within-host evolution and positive selection in a densely sampled household cohort over three seasons. bioRxiv 2024:2024.08.15.608152. [PMID: 39229225 PMCID: PMC11370358 DOI: 10.1101/2024.08.15.608152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
While influenza A virus (IAV) antigenic drift has been documented globally, in experimental animal infections, and in immunocompromised hosts, positive selection has generally not been detected in acute infections. This is likely due to challenges in distinguishing selected rare mutations from sequencing error, a reliance on cross-sectional sampling, and/or the lack of formal tests of selection for individual sites. Here, we sequenced IAV populations from 346 serial, daily nasal swabs from 143 individuals collected over three influenza seasons in a household cohort. Viruses were sequenced in duplicate, and intrahost single nucleotide variants (iSNV) were identified at a 0.5% frequency threshold. Within-host populations were subject to purifying selection with >75% mutations present at <2% frequency. Children (0-5 years) had marginally higher within-host evolutionary rates than adolescents (6-18 years) and adults (>18 years, 4.4 × 10⁻⁶ vs. 9.42 × 10⁻⁷ and 3.45 × 10⁻⁶, p < 0.001). Forty-five iSNV had evidence of parallel evolution, but were not overrepresented in HA and NA. Several increased from minority to consensus level, with strong linkage among iSNV across segments. A Wright-Fisher approximate Bayesian computational model identified positive selection at 23/256 loci (9%) in A(H3N2) specimens and 19/176 loci (11%) in A(H1N1)pdm09 specimens, and these were infrequently found in circulation. Overall, we found that within-host IAV populations were subject to purifying selection and genetic drift, with only subtle differences across seasons, subtypes, and age strata. Positive selection was rare and inconsistently detected.
Affiliation(s)
- Emily E. Bendall
  - Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, USA
- Yuwei Zhu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Melissa Rolfes
  - Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Alexandra Mellis
  - Influenza Division, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Natasha Halasa
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- Emily T. Martin
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Carlos G. Grijalva
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
- H. Keipp Talbot
  - Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Adam S. Lauring
  - Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, USA
  - Division of Infectious Diseases, University of Michigan, Ann Arbor, MI, USA
3
Di C, Lohmueller KE. Revisiting Dominance in Population Genetics. Genome Biol Evol 2024; 16:evae147. [PMID: 39114967 PMCID: PMC11306932 DOI: 10.1093/gbe/evae147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/24/2024] [Indexed: 08/11/2024]
Abstract
Dominance refers to the effect of a heterozygous genotype relative to that of the two homozygous genotypes. The degree of dominance of mutations for fitness can have a profound impact on how deleterious and beneficial mutations change in frequency over time as well as on the patterns of linked neutral genetic variation surrounding such selected alleles. Since dominance is such a fundamental concept, it has received immense attention throughout the history of population genetics. Early work from Fisher, Wright, and Haldane focused on understanding the conceptual basis for why dominance exists. More recent work has attempted to test these theories and conceptual models by estimating dominance effects of mutations. However, estimating dominance coefficients has been notoriously challenging and has only been done in a few species in a limited number of studies. In this review, we first describe some of the early theoretical and conceptual models for understanding the mechanisms for the existence of dominance. Second, we discuss several approaches used to estimate dominance coefficients and summarize estimates of dominance coefficients. We note trends that have been observed across species, types of mutations, and functional categories of genes. By comparing estimates of dominance coefficients for different types of genes, we test several hypotheses for the existence of dominance. Lastly, we discuss how dominance influences the dynamics of beneficial and deleterious mutations in populations and how the degree of dominance of deleterious mutations influences the impact of inbreeding on fitness.
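The abstract describes dominance in words only. As a hedged illustration, the sketch below uses one common textbook parameterization, which is an assumption here and is not taken from the review: genotype fitnesses 1 + s, 1 + hs, and 1 for the mutant homozygote, the heterozygote, and the wild-type homozygote, with h as the dominance coefficient. The function name `next_frequency` is hypothetical. It computes the deterministic change in mutant allele frequency over one generation of random mating.

```python
def next_frequency(p, s, h):
    """One generation of deterministic selection at a diploid biallelic locus.

    p: current frequency of the mutant allele
    s: selection coefficient of the mutant homozygote (assumed parameterization)
    h: dominance coefficient (0 = recessive, 0.5 = additive, 1 = dominant)
    """
    q = 1.0 - p
    w_mut_hom = 1.0 + s      # fitness of the mutant homozygote
    w_het = 1.0 + h * s      # fitness of the heterozygote
    w_wt_hom = 1.0           # fitness of the wild-type homozygote
    # Mean fitness under Hardy-Weinberg genotype proportions.
    mean_w = p * p * w_mut_hom + 2.0 * p * q * w_het + q * q * w_wt_hom
    # Mutant alleles come from all mutant homozygotes and half of the heterozygotes.
    return (p * p * w_mut_hom + p * q * w_het) / mean_w


if __name__ == "__main__":
    for h in (0.0, 0.5, 1.0):
        print(h, next_frequency(0.01, 0.1, h))
```

Under this parameterization a rare beneficial allele rises faster when it is dominant than when it is recessive, because at low frequency it occurs almost entirely in heterozygotes.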
Affiliation(s)
- Chenlu Di
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, Los Angeles, CA, USA
4
Fine AG, Steinrücken M. A novel expectation-maximization approach to infer general diploid selection from time-series genetic data. bioRxiv 2024:2024.05.10.593575. [PMID: 38798346 PMCID: PMC11118272 DOI: 10.1101/2024.05.10.593575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Detecting and quantifying the strength of selection is a main objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time series genetic data is commonly analyzed using Hidden Markov Models, but in most cases, under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios. Thus, we extend a previously introduced expectation-maximization algorithm for the inference of additive selection coefficients to the case of general diploid selection, in which heterozygote and homozygote fitnesses are parameterized independently. We furthermore introduce a framework to identify bespoke modes of diploid selection from given data, as well as a procedure for aggregating data across linked loci to increase power and robustness. Using extensive simulation studies, we find that our method accurately and efficiently estimates selection coefficients for different modes of diploid selection across a wide range of scenarios; however, power to classify the mode of selection is low unless selection is very strong. We apply our method to ancient DNA samples from Great Britain in the last 4,450 years, and detect evidence for selection in six genomic regions, including the well-characterized LCT locus. Our work is the first genome-wide scan characterizing signals of general diploid selection.
Affiliation(s)
- Adam G Fine
  - Department of Ecology and Evolution, University of Chicago
  - Graduate Program in Biophysical Sciences, University of Chicago
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago
  - Department of Human Genetics, University of Chicago
5
Saubin M, Tellier A, Stoeckel S, Andrieux A, Halkett F. Approximate Bayesian Computation applied to time series of population genetic data disentangles rapid genetic changes and demographic variations in a pathogen population. Mol Ecol 2024; 33:e16965. [PMID: 37150947 DOI: 10.1111/mec.16965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 05/09/2023]
Abstract
Adaptation can occur at remarkably short timescales in natural populations, leading to drastic changes in phenotypes and genotype frequencies over a few generations only. The inference of demographic parameters can allow understanding how evolutionary forces interact and shape the genetic trajectories of populations during rapid adaptation. Here we propose a new Approximate Bayesian Computation (ABC) framework that couples a forward and individual-based model with temporal genetic data to disentangle genetic changes and demographic variations in a case of rapid adaptation. We test the accuracy of our inferential framework and evaluate the benefit of considering a dense versus sparse sampling. Theoretical investigations demonstrate high accuracy in both model and parameter estimations, even if a strong thinning is applied to time series data. Then, we apply our ABC inferential framework to empirical data describing the population genetic changes of the poplar rust pathogen following a major event of resistance overcoming. We successfully estimate key demographic and genetic parameters, including the proportion of resistant hosts deployed in the landscape and the level of standing genetic variation from which selection occurred. Inferred values are in accordance with our empirical knowledge of this biological system. This new inferential framework, which contrasts with coalescent-based ABC analyses, is promising for a better understanding of evolutionary trajectories of populations subjected to rapid adaptation.
Affiliation(s)
- Méline Saubin
  - Université de Lorraine, INRAE, IAM, Nancy, France
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Aurélien Tellier
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Solenn Stoeckel
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
6
Vellnow N, Gossmann TI, Waxman D. The pseudoentropy of allele frequency trajectories, the persistence of variation, and the effective population size. Biosystems 2024; 238:105176. [PMID: 38479654 DOI: 10.1016/j.biosystems.2024.105176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 03/24/2024]
Abstract
To concisely describe how genetic variation, at individual loci or across whole genomes, changes over time, and to follow transitory allelic changes, we introduce a quantity related to entropy, that we term pseudoentropy. This quantity emerges in a diffusion analysis of the mean time a mutation segregates in a population. For a neutral locus with an arbitrary number of alleles, the mean time of segregation is generally proportional to the pseudoentropy of initial allele frequencies. After the initial time point, pseudoentropy generally decreases, but other behaviours are possible, depending on the genetic diversity and selective forces present. For a biallelic locus, pseudoentropy and entropy coincide, but they are distinct quantities with more than two alleles. Thus for populations with multiple biallelic loci, the language of entropy suffices. Then entropy, combined across loci, serves as a concise description of genetic variation. We used individual based simulations to explore how this entropy behaves under different evolutionary scenarios. In agreement with predictions, the entropy associated with unlinked neutral loci decreases over time. However, deviations from free recombination and neutrality have clear and informative effects on the entropy's behaviour over time. Analysis of publicly available data of a natural D. melanogaster population, that had been sampled over seven years, using a sliding-window approach, yielded considerable variation in entropy trajectories of different genomic regions. These mostly follow a pattern that suggests a substantial effective population size and a limited effect of positive selection on genome-wide diversity over short time scales.
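The abstract states that entropy suffices for biallelic loci and that entropy combined across loci summarizes genetic variation. The sketch below is a minimal illustration under stated assumptions, not the authors' definition: it uses the Shannon entropy with natural logarithms for each biallelic locus and combines loci by simple summation. The function names are hypothetical.

```python
import math


def biallelic_entropy(x):
    """Shannon entropy (natural log) of a biallelic locus with allele frequency x."""
    if x <= 0.0 or x >= 1.0:
        # A fixed or lost allele carries no variation.
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def combined_entropy(freqs):
    """Sum per-locus entropies; summation is an assumed way of combining loci."""
    return sum(biallelic_entropy(x) for x in freqs)


if __name__ == "__main__":
    # Hypothetical allele frequencies at four loci, sampled at two time points.
    early = [0.5, 0.4, 0.3, 0.2]
    late = [0.7, 0.2, 0.1, 0.0]
    print(combined_entropy(early), combined_entropy(late))
```

In this made-up example the combined entropy falls between the two time points, which is the qualitative behaviour the abstract reports for unlinked neutral loci.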
Affiliation(s)
- Nikolas Vellnow
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany.
- Toni I Gossmann
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany.
- David Waxman
  - Fudan University, Centre for Computational Systems Biology, ISTBI, 220 Handan Road, Shanghai 200433, People's Republic of China.
7
Bernatchez L, Ferchaud AL, Berger CS, Venney CJ, Xuereb A. Genomics for monitoring and understanding species responses to global climate change. Nat Rev Genet 2024; 25:165-183. [PMID: 37863940 DOI: 10.1038/s41576-023-00657-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/29/2023] [Indexed: 10/22/2023]
Abstract
All life forms across the globe are experiencing drastic changes in environmental conditions as a result of global climate change. These environmental changes are happening rapidly, incur substantial socioeconomic costs, pose threats to biodiversity and diminish a species' potential to adapt to future environments. Understanding and monitoring how organisms respond to human-driven climate change is therefore a major priority for the conservation of biodiversity in a rapidly changing environment. Recent developments in genomic, transcriptomic and epigenomic technologies are enabling unprecedented insights into the evolutionary processes and molecular bases of adaptation. This Review summarizes methods that apply and integrate omics tools to experimentally investigate, monitor and predict how species and communities in the wild cope with global climate change, namely by genetically adapting to new environmental conditions, through range shifts or through phenotypic plasticity. We identify advantages and limitations of each method and discuss future research avenues that would improve our understanding of species' evolutionary responses to global climate change, highlighting the need for holistic, multi-omics approaches to ecosystem monitoring during global climate change.
Affiliation(s)
- Louis Bernatchez
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Anne-Laure Ferchaud
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada.
  - Parks Canada, Office of the Chief Ecosystem Scientist, Protected Areas Establishment, Quebec City, Quebec, Canada.
- Chloé Suzanne Berger
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Clare J Venney
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
- Amanda Xuereb
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, Quebec, Canada
8
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
9
Terbot JW, Johri P, Liphardt SW, Soni V, Pfeifer SP, Cooper BS, Good JM, Jensen JD. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLoS Pathog 2023; 19:e1011265. [PMID: 37018331 PMCID: PMC10075409 DOI: 10.1371/journal.ppat.1011265] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 04/06/2023]
Abstract
Over the past 3 years, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spread through human populations in several waves, resulting in a global health crisis. In response, genomic surveillance efforts have proliferated in the hopes of tracking and anticipating the evolution of this virus, resulting in millions of patient isolates now being available in public databases. Yet, while there is a tremendous focus on identifying newly emerging adaptive viral variants, this quantification is far from trivial. Specifically, multiple co-occurring and interacting evolutionary processes are constantly in operation and must be jointly considered and modeled in order to perform accurate inference. We here outline critical individual components of such an evolutionary baseline model (mutation rates, recombination rates, the distribution of fitness effects, infection dynamics, and compartmentalization) and describe the current state of knowledge pertaining to the related parameters of each in SARS-CoV-2. We close with a series of recommendations for future clinical sampling, model construction, and statistical analysis.
Affiliation(s)
- John W Terbot
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Parul Johri
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Schuyler W Liphardt
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Vivak Soni
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Susanne P Pfeifer
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Brandon S Cooper
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey M Good
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey D Jensen
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
10
Shimagaki K, Barton JP. Bézier interpolation improves the inference of dynamical models from data. Phys Rev E 2023; 107:024116. [PMID: 36932614 PMCID: PMC10027371 DOI: 10.1103/physreve.107.024116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 01/23/2023] [Indexed: 06/18/2023]
Abstract
Many dynamical systems, from quantum many-body systems to evolving populations to financial markets, are described by stochastic processes. Parameters characterizing such processes can often be inferred using information integrated over stochastic paths. However, estimating time-integrated quantities from real data with limited time resolution is challenging. Here, we propose a framework for accurately estimating time-integrated quantities using Bézier interpolation. We applied our approach to two dynamical inference problems: Determining fitness parameters for evolving populations and inferring forces driving Ornstein-Uhlenbeck processes. We found that Bézier interpolation reduces the estimation bias for both dynamical inference problems. This improvement was especially noticeable for data sets with limited time resolution. Our method could be broadly applied to improve accuracy for other dynamical inference problems using finitely sampled data.
Affiliation(s)
- Kai Shimagaki
  - Department of Physics and Astronomy, University of California, Riverside, California 92521, USA
- John P. Barton
  - Department of Physics and Astronomy, University of California, Riverside, California 92521, USA
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213, USA
11
Barata C, Borges R, Kosiol C. Bait-ER: A Bayesian method to detect targets of selection in Evolve-and-Resequence experiments. J Evol Biol 2023; 36:29-44. [PMID: 36544394 PMCID: PMC10108205 DOI: 10.1111/jeb.14134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 11/09/2022] [Accepted: 11/11/2022] [Indexed: 12/24/2022]
Abstract
For over a decade, experimental evolution has been combined with high-throughput sequencing techniques. In so-called Evolve-and-Resequence (E&R) experiments, populations are kept in the laboratory under controlled experimental conditions where their genomes are sampled and allele frequencies monitored. However, identifying signatures of adaptation in E&R datasets is far from trivial, and it is still necessary to develop more efficient and statistically sound methods for detecting selection in genome-wide data. Here, we present Bait-ER, a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that describes several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. Nevertheless, some care must be taken when analysing trajectories where drift largely dominates and starting frequencies are low. We compare our method with other available software and report that ours has generally high accuracy even for trajectories whose complexity goes beyond a classical sweep model. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://doi.org/10.5281/zenodo.7351736.
Affiliation(s)
- Carolina Barata
  - Centre for Biological Diversity, University of St Andrews, St Andrews, UK
- Rui Borges
  - Institute of Population Genetics, Wien, Austria
- Carolin Kosiol
  - Centre for Biological Diversity, University of St Andrews, St Andrews, UK
  - Institute of Population Genetics, Wien, Austria
12
Sohail MS, Louie RHY, Hong Z, Barton JP, McKay MR. Inferring Epistasis from Genetic Time-series Data. Mol Biol Evol 2022; 39:6710201. [PMID: 36130322 PMCID: PMC9558069 DOI: 10.1093/molbev/msac199] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 02/01/2023]
Abstract
Epistasis refers to fitness or functional effects of mutations that depend on the sequence background in which these mutations arise. Epistasis is prevalent in nature, including populations of viruses, bacteria, and cancers, and can contribute to the evolution of drug resistance and immune escape. However, it is difficult to directly estimate epistatic effects from sampled observations of a population. At present, there are very few methods that can disentangle the effects of selection (including epistasis), mutation, recombination, genetic drift, and genetic linkage in evolving populations. Here we develop a method to infer epistasis, along with the fitness effects of individual mutations, from observed evolutionary histories. Simulations show that we can accurately infer pairwise epistatic interactions provided that there is sufficient genetic diversity in the data. Our method also allows us to identify which fitness parameters can be reliably inferred from a particular data set and which ones are unidentifiable. Our approach therefore allows for the inference of more complex models of selection from time-series genetic data, while also quantifying uncertainty in the inferred parameters.
Affiliation(s)
- Muhammad Saqib Sohail
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, People’s Republic of China
- Raymond H Y Louie
  - The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
- Zhenchen Hong
  - Department of Physics and Astronomy, University of California, Riverside, CA, USA
13
|
Stern DB, Anderson NW, Diaz JA, Lee CE. Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod. Nat Commun 2022; 13:4024. [PMID: 35821220 PMCID: PMC9276764 DOI: 10.1038/s41467-022-31622-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/25/2021] [Accepted: 06/27/2022] [Indexed: 01/01/2023] Open
Abstract
The role of epistasis in driving adaptation has remained an unresolved problem dating back to the Evolutionary Synthesis. In particular, whether epistatic interactions among genes could promote parallel evolution remains unexplored. To address this problem, we employ an Evolve and Resequence (E&R) experiment, using the copepod Eurytemora affinis, to elucidate the evolutionary genomic response to rapid salinity decline. Rapid declines in coastal salinity at high latitudes are a predicted consequence of global climate change. Based on time-resolved pooled whole-genome sequencing, we uncover a remarkably parallel, polygenic response across ten replicate selection lines, with 79.4% of selected alleles shared between lines by the tenth generation of natural selection. Using extensive computer simulations of our experiment conditions, we find that this polygenic parallelism is consistent with positive synergistic epistasis among alleles, far more so than other mechanisms tested. Our study provides experimental and theoretical support for a novel mechanism promoting repeatable polygenic adaptation, a phenomenon that may be common for selection on complex physiological traits.
Affiliation(s)
- David B Stern
- Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Birge Hall, Madison, WI, 53706, USA
- National Biodefense Analysis and Countermeasures Center (NBACC), Operated by Battelle National Biodefense Institute (BNBI) for the U.S. Department of Homeland Security Science and Technology Directorate, Fort Detrick, MD, 21702, USA
- Nathan W Anderson
- Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Birge Hall, Madison, WI, 53706, USA
- Juanita A Diaz
- Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Birge Hall, Madison, WI, 53706, USA
- Carol Eunmi Lee
- Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Birge Hall, Madison, WI, 53706, USA
|
14
|
Cornille A, Ebert D, Stukenbrock E, Rodríguez de la Vega RC, Tiffin P, Croll D, Tellier A. Unraveling coevolutionary dynamics using ecological genomics. Trends Genet 2022; 38:1003-1012. [PMID: 35715278 DOI: 10.1016/j.tig.2022.05.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/31/2021] [Revised: 05/08/2022] [Accepted: 05/10/2022] [Indexed: 11/27/2022]
Abstract
Coevolutionary interactions, from the delicate co-dependency in mutualistic interactions to the antagonistic relationship of hosts and parasites, are a ubiquitous driver of adaptation. Surprisingly, little is known about the genomic processes underlying coevolution in an ecological context. However, species comprise genetically differentiated populations that interact with temporally variable abiotic and biotic environments. We discuss the recent advances in coevolutionary theory and genomics as well as shortcomings, to identify coevolving genes that take into account this spatial and temporal variability of coevolution, and propose a practical guide to understand the dynamic of coevolution using an ecological genomics lens.
Affiliation(s)
- Amandine Cornille
- Université Paris Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Dieter Ebert
- Department of Environmental Sciences, Zoology, University of Basel, Vesalgasse 1, 4051 Basel, Switzerland
- Eva Stukenbrock
- Max Planck Institute for Terrestrial Microbiology, Max Planck Research Group, Fungal Biodiversity, Marburg, Germany
- Peter Tiffin
- Department of Plant and Microbial Biology, 250 Biological Sciences, 1445 Gortner Ave., University of Minnesota, Saint Paul, MN 55108, USA
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
- Aurélien Tellier
- Population Genetics, Department of Life Science Systems, Technical University of Munich, Liesel-Beckman-Str. 2, 85354 Freising, Germany
|
15
|
Shim H. Investigating the Genomic Background of CRISPR-Cas Genomes for CRISPR-Based Antimicrobials. Evol Bioinform Online 2022; 18:11769343221103887. [PMID: 35692726 PMCID: PMC9185011 DOI: 10.1177/11769343221103887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2022] [Accepted: 05/05/2022] [Indexed: 12/01/2022] Open
Abstract
CRISPR-Cas systems are an adaptive immunity that protects prokaryotes against foreign genetic elements. Genetic templates acquired during past infection events enable DNA-interacting enzymes to recognize foreign DNA for destruction. Due to the programmability and specificity of these genetic templates, CRISPR-Cas systems are potential alternative antibiotics that can be engineered to self-target antimicrobial resistance genes on the chromosome or plasmid. However, several fundamental questions remain to repurpose these tools against drug-resistant bacteria. For endogenous CRISPR-Cas self-targeting, antimicrobial resistance genes and functional CRISPR-Cas systems have to co-occur in the target cell. Furthermore, these tools have to outplay DNA repair pathways that respond to the nuclease activities of Cas proteins, even for exogenous CRISPR-Cas delivery. Here, we conduct a comprehensive survey of CRISPR-Cas genomes. First, we address the co-occurrence of CRISPR-Cas systems and antimicrobial resistance genes in the CRISPR-Cas genomes. We show that the average number of these genes varies greatly by the CRISPR-Cas type, and some CRISPR-Cas types (IE and IIIA) have over 20 genes per genome. Next, we investigate the DNA repair pathways of these CRISPR-Cas genomes, revealing that the diversity and frequency of these pathways differ by the CRISPR-Cas type. The interplay between CRISPR-Cas systems and DNA repair pathways is essential for the acquisition of new spacers in CRISPR arrays. We conduct simulation studies to demonstrate that the efficiency of these DNA repair pathways may be inferred from the time-series patterns in the RNA structure of CRISPR repeats. This bioinformatic survey of CRISPR-Cas genomes elucidates the necessity to consider multifaceted interactions between different genes and systems, to design effective CRISPR-based antimicrobials that can specifically target drug-resistant bacteria in natural microbial communities.
Affiliation(s)
- Hyunjin Shim
- Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea
|
16
|
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD. Recommendations for improving statistical inference in population genomics. PLoS Biol 2022; 20:e3001669. [PMID: 35639797 PMCID: PMC9154105 DOI: 10.1371/journal.pbio.3001669] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Indexed: 01/29/2023] Open
Abstract
The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Charles F. Aquadro
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Peter D. Keightley
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Michael Lynch
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Susanne P. Pfeifer
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
|
17
|
Avecilla G, Chuong JN, Li F, Sherlock G, Gresham D, Ram Y. Neural networks enable efficient and accurate simulation-based inference of evolutionary parameters from adaptation dynamics. PLoS Biol 2022; 20:e3001633. [PMID: 35622868 PMCID: PMC9140244 DOI: 10.1371/journal.pbio.3001633] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2021] [Accepted: 04/14/2022] [Indexed: 11/24/2022] Open
Abstract
The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects is often difficult to empirically quantify. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright-Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright-Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^-4.7 to 10^-4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats. We experimentally validated our inference-based estimates using 2 distinct experimental methods, barcode lineage tracking and pairwise fitness assays, which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data with possible applications ranging from tumor to viral evolution.
Affiliation(s)
- Grace Avecilla
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Julie N. Chuong
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Fangfei Li
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Gavin Sherlock
- Department of Genetics, Stanford University, Stanford, California, United States of America
- David Gresham
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Yoav Ram
- School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
|
18
|
Gossmann TI, Waxman D. Correcting Bias in Allele Frequency Estimates Due to an Observation Threshold: A Markov Chain Analysis. Genome Biol Evol 2022; 14:evac047. [PMID: 35349695 PMCID: PMC9016752 DOI: 10.1093/gbe/evac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2022] [Indexed: 11/30/2022] Open
Abstract
There are many problems in biology and related disciplines involving stochasticity, where a signal can only be detected when it lies above a threshold level, while signals lying below threshold are simply not detected. A consequence is that the detected signal is conditioned to lie above threshold, and is not representative of the actual signal. In this work, we present some general results for the conditioning that occurs due to the existence of such an observational threshold. We show that this conditioning is relevant, for example, to gene-frequency trajectories, where many loci in the genome are simultaneously measured in a given generation. Such a threshold can lead to severe biases of allele frequency estimates under purifying selection. In the analysis presented, within the context of Markov chains such as the Wright-Fisher model, we address two key questions: (1) "What is a natural measure of the strength of the conditioning associated with an observation threshold?" (2) "What is a principled way to correct for the effects of the conditioning?". We answer the first question in terms of a proportion. Starting with a large number of trajectories, the relevant quantity is the proportion of these trajectories that are above threshold at a later time and hence are detected. The smaller the value of this proportion, the stronger the effects of conditioning. We provide an approximate analytical answer to the second question, that corrects the bias produced by an observation threshold, and performs to reasonable accuracy in the Wright-Fisher model for biologically plausible parameter values.
Affiliation(s)
- Toni I. Gossmann
- Department of Evolutionary Genetics, Bielefeld University, Konsequenz 45, 33501 Bielefeld, Germany
- Berlin Institute for Advanced Study, Wallotstrasse 19, 14193 Berlin, Germany
- David Waxman
- Centre for Computational Systems Biology, ISTBI, Fudan University, 220 Handan Road, Shanghai 20433, People’s Republic of China
|
19
|
Deffner D, Kandler A, Fogarty L. Effective population size for culturally evolving traits. PLoS Comput Biol 2022; 18:e1009430. [PMID: 35395004 PMCID: PMC9020689 DOI: 10.1371/journal.pcbi.1009430] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/08/2021] [Revised: 04/20/2022] [Accepted: 02/21/2022] [Indexed: 11/18/2022] Open
Abstract
Population size has long been considered an important driver of cultural diversity and complexity. Results from population genetics, however, demonstrate that in populations with complex demographic structure or mode of inheritance, it is not the census population size, N, but the effective size of a population, Ne, that determines important evolutionary parameters. Here, we examine the concept of effective population size for traits that evolve culturally, through processes of innovation and social learning. We use mathematical and computational modeling approaches to investigate how cultural Ne and levels of diversity depend on (1) the way traits are learned, (2) population connectedness, and (3) social network structure. We show that one-to-many and frequency-dependent transmission can temporally or permanently lower effective population size compared to census numbers. We caution that migration and cultural exchange can have counter-intuitive effects on Ne. Network density in random networks leaves Ne unchanged, scale-free networks tend to decrease and small-world networks tend to increase Ne compared to census numbers. For one-to-many transmission and different network structures, larger effective sizes are closely associated with higher cultural diversity. For connectedness, however, even small amounts of migration and cultural exchange result in high diversity independently of Ne. Extending previous work, our results highlight the importance of carefully defining effective population size for cultural systems and show that inferring Ne requires detailed knowledge about underlying cultural and demographic processes.
Affiliation(s)
- Dominik Deffner
- Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Science of Intelligence Excellence Cluster, Technical University Berlin, Berlin, Germany
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Anne Kandler
- Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Laurel Fogarty
- Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
20
|
Morales-Arce AY, Johri P, Jensen JD. Inferring the distribution of fitness effects in patient-sampled and experimental virus populations: two case studies. Heredity (Edinb) 2022; 128:79-87. [PMID: 34987185 PMCID: PMC8728706 DOI: 10.1038/s41437-021-00493-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/21/2021] [Revised: 12/12/2021] [Accepted: 12/13/2021] [Indexed: 11/19/2022] Open
Abstract
We here propose an analysis pipeline for inferring the distribution of fitness effects (DFE) from either patient-sampled or experimentally-evolved viral populations, that explicitly accounts for non-Wright-Fisher and non-equilibrium population dynamics inherent to pathogens. We examine the performance of this approach via extensive power and performance analyses, and highlight two illustrative applications - one from an experimentally-passaged RNA virus, and the other from a clinically-sampled DNA virus. Finally, we discuss how such DFE inference may shed light on major research questions in virus evolution, ranging from a quantification of the population genetic processes governing genome size, to the role of Hill-Robertson interference in dictating adaptive outcomes, to the potential design of novel therapeutic approaches to eradicate within-patient viral populations via induced mutational meltdown.
Affiliation(s)
- Ana Y Morales-Arce
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Parul Johri
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
21
|
Okada T, Hallatschek O. Dynamic sampling bias and overdispersion induced by skewed offspring distributions. Genetics 2021; 219:6363801. [PMID: 34718557 DOI: 10.1093/genetics/iyab135] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Accepted: 08/06/2021] [Indexed: 11/14/2022] Open
Abstract
Natural populations often show enhanced genetic drift consistent with a strong skew in their offspring number distribution. The skew arises because the variability of family sizes is either inherently strong or amplified by population expansions. The resulting allele-frequency fluctuations are large and, therefore, challenge standard models of population genetics, which assume sufficiently narrow offspring distributions. While the neutral dynamics backward in time can be readily analyzed using coalescent approaches, we still know little about the effect of broad offspring distributions on the forward-in-time dynamics, especially with selection. Here, we employ an asymptotic analysis combined with a scaling hypothesis to demonstrate that over-dispersed frequency trajectories emerge from the competition of conventional forces, such as selection or mutations, with an emerging time-dependent sampling bias against the minor allele. The sampling bias arises from the characteristic time-dependence of the largest sampled family size within each allelic type. Using this insight, we establish simple scaling relations for allele-frequency fluctuations, fixation probabilities, extinction times, and the site frequency spectra that arise when offspring numbers are distributed according to a power law.
Affiliation(s)
- Takashi Okada
- Departments of Physics and Integrative Biology, University of California, Berkeley, CA 94720, USA; RIKEN iTHEMS, Wako, Saitama 351-0198, Japan
- Oskar Hallatschek
- Departments of Physics and Integrative Biology, University of California, Berkeley, CA 94720, USA
|
22
|
Kidner J, Theodorou P, Engler JO, Taubert M, Husemann M. A brief history and popularity of methods and tools used to estimate micro-evolutionary forces. Ecol Evol 2021; 11:13723-13743. [PMID: 34707813 PMCID: PMC8525119 DOI: 10.1002/ece3.8076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2020] [Revised: 07/12/2021] [Accepted: 08/12/2021] [Indexed: 11/30/2022] Open
Abstract
Population genetics is a field of research that predates the current generations of sequencing technology. Those approaches, that were established before massively parallel sequencing methods, have been adapted to these new marker systems (in some cases involving the development of new methods) that allow genome-wide estimates of the four major micro-evolutionary forces: mutation, gene flow, genetic drift, and selection. Nevertheless, classic population genetic markers are still commonly used and a plethora of analysis methods and programs is available for these and high-throughput sequencing (HTS) data. These methods employ various and diverse theoretical and statistical frameworks, to varying degrees of success, to estimate similar evolutionary parameters making it difficult to get a concise overview across the available approaches. Presently, reviews on this topic generally focus on a particular class of methods to estimate one or two evolutionary parameters. Here, we provide a brief history of methods and a comprehensive list of available programs for estimating micro-evolutionary forces. We furthermore analyzed their usage within the research community based on popularity (citation bias) and discuss the implications of this bias for the software community. We found that a few programs received the majority of citations, with program success being independent of both the parameters estimated and the computing platform. The only deviation from a model of exponential growth in the number of citations was found for the presence of a graphical user interface (GUI). Interestingly, no relationship was found for the impact factor of the journals, when the tools were published, suggesting accessibility might be more important than visibility.
Affiliation(s)
- Jonathan Kidner
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Panagiotis Theodorou
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Jan O Engler
- Terrestrial Ecology Unit, Department of Biology, Ghent University, Ghent, Belgium
- Martin Taubert
- Aquatic Geomicrobiology, Institute for Biodiversity, Friedrich Schiller University Jena, Jena, Germany
- Martin Husemann
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Centrum für Naturkunde, University of Hamburg, Hamburg, Germany
|
23
|
Kepler L, Hamins-Puertolas M, Rasmussen DA. Decomposing the sources of SARS-CoV-2 fitness variation in the United States. Virus Evol 2021; 7:veab073. [PMID: 34642604 PMCID: PMC8499931 DOI: 10.1093/ve/veab073] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/18/2021] [Revised: 08/13/2021] [Accepted: 08/20/2021] [Indexed: 12/14/2022] Open
Abstract
The fitness of a pathogen is a composite phenotype determined by many different factors influencing growth rates both within and between hosts. Determining what factors shape fitness at the host population-level is especially challenging because both intrinsic factors like pathogen genetics and extrinsic factors such as host behavior influence between-host transmission potential. This challenge has been highlighted by controversy surrounding the population-level fitness effects of mutations in the SARS-CoV-2 genome and their relative importance when compared against non-genetic factors shaping transmission dynamics. Building upon phylodynamic birth-death models, we develop a new framework to learn how hundreds of genetic and non-genetic factors have shaped the fitness of SARS-CoV-2. We estimate the fitness effects of all amino acid variants and several structural variants that have circulated in the United States between February 2020 and March 2021 from viral phylogenies. We also estimate how much fitness variation among pathogen lineages is attributable to genetic versus non-genetic factors such as spatial heterogeneity in transmission rates. Before September 2020, most fitness variation between lineages can be explained by background spatial heterogeneity in transmission rates across geographic regions. Starting in late 2020, genetic variation in fitness increased dramatically with the emergence of several new lineages including B.1.1.7, B.1.427, B.1.429 and B.1.526. Our analysis also indicates that genetic variants in less well-explored genomic regions outside of Spike may be contributing significantly to overall fitness variation in the viral population.
Affiliation(s)
- Lenora Kepler
- Bioinformatics Research Center, North Carolina State University, 1 Lampe Drive, Raleigh, NC 27607, USA
- Marco Hamins-Puertolas
- Biomathematics Graduate Program, North Carolina State University, Campus Box 8213, Raleigh, NC 27695, USA
- David A Rasmussen
- Bioinformatics Research Center, North Carolina State University, 1 Lampe Drive, Raleigh, NC 27607, USA
- Department of Entomology and Plant Pathology, North Carolina State University, Campus Box 7613, Raleigh, NC 27695, USA
|
24
|
Gompert Z, Springer A, Brady M, Chaturvedi S, Lucas LK. Genomic time-series data show that gene flow maintains high genetic diversity despite substantial genetic drift in a butterfly species. Mol Ecol 2021; 30:4991-5008. [PMID: 34379852 DOI: 10.1111/mec.16111] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Revised: 07/09/2021] [Accepted: 07/19/2021] [Indexed: 11/29/2022]
Abstract
Effective population size affects the efficacy of selection, rate of evolution by drift, and neutral diversity levels. When species are subdivided into multiple populations connected by gene flow, evolutionary processes can depend on global or local effective population sizes. Theory predicts that high levels of diversity might be maintained by gene flow, even very low levels of gene flow, consistent with species long-term effective population size, but tests of this idea are mostly lacking. Here, we show that Lycaeides butterfly populations maintain low contemporary (variance) effective population sizes (e.g., ~200 individuals) and thus evolve rapidly by genetic drift. In contrast, populations harbored high levels of genetic diversity consistent with an effective population size several orders of magnitude larger. We hypothesized that the differences in the magnitude and variability of contemporary versus long-term effective population sizes were caused by gene flow of sufficient magnitude to maintain diversity but only subtly affect evolution on generational time scales. Consistent with this hypothesis, we detected low but non-trivial gene flow among populations. Furthermore, using short-term population-genomic time-series data, we documented patterns consistent with predictions from this hypothesis, including a weak but detectable excess of evolutionary change in the direction of the mean (migrant gene pool) allele frequencies across populations, and consistency in the direction of allele frequency change over time. The documented decoupling of diversity levels and short-term change by drift in Lycaeides has implications for our understanding of contemporary evolution and the maintenance of genetic variation in the wild.
Affiliation(s)
- Zachariah Gompert
  - Department of Biology, Utah State University, Logan, UT, 84322, USA
  - Ecology Center, Utah State University, Logan, UT, 84322, USA
- Amy Springer
  - Department of Biology, Utah State University, Logan, UT, 84322, USA
- Megan Brady
  - Department of Biology, Utah State University, Logan, UT, 84322, USA
- Samridhi Chaturvedi
  - Department of Biology, Utah State University, Logan, UT, 84322, USA
  - Department of Organismic & Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Lauren K Lucas
  - Department of Biology, Utah State University, Logan, UT, 84322, USA
|
25
|
Baltzegar J, Vella M, Gunning C, Vasquez G, Astete H, Stell F, Fisher M, Scott TW, Lenhart A, Lloyd AL, Morrison A, Gould F. Rapid evolution of knockdown resistance haplotypes in response to pyrethroid selection in Aedes aegypti. Evol Appl 2021; 14:2098-2113. [PMID: 34429751 PMCID: PMC8372076 DOI: 10.1111/eva.13269] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/08/2021] [Revised: 05/10/2021] [Accepted: 06/03/2021] [Indexed: 11/29/2022]
Abstract
This study describes the evolution of knockdown resistance (kdr) haplotypes in Aedes aegypti in response to pyrethroid insecticide use over the course of 18 years in Iquitos, Peru. Based on the duration and intensiveness of sampling (~10,000 samples), this is the most thorough study of kdr population genetics in Ae. aegypti to date within a city. We provide evidence for the direct connection between programmatic citywide pyrethroid spraying and the increase in frequency of specific kdr haplotypes by identifying two evolutionary events in the population. The relatively high selection coefficients, even under infrequent insecticide pressure, emphasize how quickly Ae. aegypti populations can evolve. In our examination of the literature on mosquitoes and other insect pests, we could find no cases where a pest evolved so quickly to so few exposures to low or nonresidual insecticide applications. The observed rapid increase in frequency of resistance alleles might have been aided by the incomplete dominance of resistance-conferring alleles over corresponding susceptibility alleles. In addition to dramatic temporal shifts, spatial suppression experiments reveal that genetic heterogeneity existed not only at the citywide scale, but also on a very fine scale within the city.
Affiliation(s)
- Jennifer Baltzegar
  - Graduate Program in Genetics, College of Sciences, North Carolina State University, Raleigh, NC, USA
  - Genetic Engineering and Society Center, North Carolina State University, Raleigh, NC, USA
- Michael Vella
  - Genetic Engineering and Society Center, North Carolina State University, Raleigh, NC, USA
  - Biomathematics Graduate Program and Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Gissella Vasquez
  - Department of Entomology, U.S. Naval Medical Research Unit No. 6, Bellavista, Peru
- Helvio Astete
  - Department of Entomology, U.S. Naval Medical Research Unit No. 6, Bellavista, Peru
- Fred Stell
  - Department of Entomology, U.S. Naval Medical Research Unit No. 6, Bellavista, Peru
- Michael Fisher
  - Department of Entomology, U.S. Naval Medical Research Unit No. 6, Bellavista, Peru
- Thomas W. Scott
  - Department of Entomology and Nematology, University of California, Davis, CA, USA
- Audrey Lenhart
  - Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Alun L. Lloyd
  - Genetic Engineering and Society Center, North Carolina State University, Raleigh, NC, USA
  - Biomathematics Graduate Program and Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Amy Morrison
  - Department of Entomology, U.S. Naval Medical Research Unit No. 6, Bellavista, Peru
  - Department of Entomology and Nematology, University of California, Davis, CA, USA
- Fred Gould
  - Genetic Engineering and Society Center, North Carolina State University, Raleigh, NC, USA
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
|
26
|
Lynch M, Ho WC. The Limits to Estimating Population-Genetic Parameters with Temporal Data. Genome Biol Evol 2021; 12:443-455. [PMID: 32181820 PMCID: PMC7197491 DOI: 10.1093/gbe/evaa056] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 03/13/2020] [Indexed: 11/14/2022]
Abstract
The ability to obtain genome-wide sequences of very large numbers of individuals from natural populations raises questions about optimal sampling designs and the limits to extracting information on key population-genetic parameters from temporal-survey data. Methods are introduced for evaluating whether observed temporal fluctuations in allele frequencies are consistent with the hypothesis of random genetic drift, and expressions for the expected sampling variances for the relevant statistics are given in terms of sample sizes and numbers. Estimation methods and aspects of statistical reliability are also presented for the mean and temporal variance of selection coefficients. For nucleotide sites that pass the test of neutrality, the current effective population size can be estimated by a method of moments, and expressions for its sampling variance provide insight into the degree to which such methodology can yield meaningful results under alternative sampling schemes. Finally, some caveats are raised regarding the use of the temporal covariance of allele-frequency change to infer selection. Taken together, these results provide a statistical view of the limits to population-genetic inference in even the simplest case of a closed population.
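As a rough illustration of the moment-based idea mentioned in this abstract, the sketch below applies the classical temporal estimator (the Nei-Tajima standardized variance F_c, with the usual correction for sampling diploid individuals). It is a textbook stand-in, not the expressions derived by Lynch and Ho, and the function names and arguments are hypothetical.

```python
def standardized_variance(x, y):
    """Nei-Tajima F_c for one biallelic locus.

    x and y are the frequencies of the same allele in the first and
    second temporal sample. The locus must be polymorphic in at least
    one sample, otherwise the denominator is zero.
    """
    return (x - y) ** 2 / ((x + y) / 2 - x * y)


def temporal_ne(freqs_t0, freqs_t1, generations, n0, n1):
    """Moment estimate of effective population size from two samples.

    n0 and n1 are the numbers of diploid individuals sampled at each
    time point; generations is the time elapsed between the samples.
    """
    fc_values = [standardized_variance(x, y) for x, y in zip(freqs_t0, freqs_t1)]
    fc = sum(fc_values) / len(fc_values)
    # Remove the part of F_c expected from finite sampling alone.
    drift = fc - 1 / (2 * n0) - 1 / (2 * n1)
    if drift <= 0:
        # Observed change is no larger than sampling noise:
        # no finite estimate can be obtained.
        return float("inf")
    return generations / (2 * drift)
```

A call such as `temporal_ne([0.5], [0.6], generations=10, n0=100, n1=100)` uses a single locus only for illustration; in practice the average would be taken over many loci that pass a neutrality test, as the abstract describes.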
Affiliation(s)
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University
- Wei-Chin Ho
  - Biodesign Center for Mechanisms of Evolution, Arizona State University
|
27
|
Rowan TN, Durbin HJ, Seabury CM, Schnabel RD, Decker JE. Powerful detection of polygenic selection and evidence of environmental adaptation in US beef cattle. PLoS Genet 2021; 17:e1009652. [PMID: 34292938 PMCID: PMC8297814 DOI: 10.1371/journal.pgen.1009652] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/22/2020] [Accepted: 06/09/2021] [Indexed: 12/19/2022]
Abstract
Selection on complex traits can rapidly drive evolution, especially in stressful environments. This polygenic selection does not leave intense sweep signatures on the genome, rather many loci experience small allele frequency shifts, resulting in large cumulative phenotypic changes. Directional selection and local adaptation are changing populations; but, identifying loci underlying polygenic or environmental selection has been difficult. We use genomic data on tens of thousands of cattle from three populations, distributed over time and landscapes, in linear mixed models with novel dependent variables to map signatures of selection on complex traits and local adaptation. We identify 207 genomic loci associated with an animal's birth date, representing ongoing selection for monogenic and polygenic traits. Additionally, hundreds of additional loci are associated with continuous and discrete environments, providing evidence for historical local adaptation. These candidate loci highlight the nervous system's possible role in local adaptation. While advanced technologies have increased the rate of directional selection in cattle, it has likely been at the expense of historically generated local adaptation, which is especially problematic in changing climates. When applied to large, diverse cattle datasets, these selection mapping methods provide an insight into how selection on complex traits continually shapes the genome. Further, understanding the genomic loci implicated in adaptation may help us breed more adapted and efficient cattle, and begin to understand the basis for mammalian adaptation, especially in changing climates. These selection mapping approaches help clarify selective forces and loci in evolutionary, model, and agricultural contexts.
Affiliation(s)
- Troy N. Rowan
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Department of Animal Science, University of Tennessee, Knoxville, Tennessee, United States of America
  - College of Veterinary Medicine, Large Animal Clinical Science, University of Tennessee, Knoxville, Tennessee, United States of America
- Harly J. Durbin
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
- Christopher M. Seabury
  - Department of Veterinary Pathobiology, Texas A&M University, College Station, Texas, United States of America
- Robert D. Schnabel
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, United States of America
- Jared E. Decker
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, United States of America
|
28
|
Zhu H, Allman BE, Koelle K. Fitness Estimation for Viral Variants in the Context of Cellular Coinfection. Viruses 2021; 13:v13071216. [PMID: 34201862 PMCID: PMC8310006 DOI: 10.3390/v13071216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Revised: 06/16/2021] [Accepted: 06/18/2021] [Indexed: 11/16/2022]
Abstract
Animal models are frequently used to characterize the within-host dynamics of emerging zoonotic viruses. More recent studies have also deep-sequenced longitudinal viral samples originating from experimental challenges to gain a better understanding of how these viruses may evolve in vivo and between transmission events. These studies have often identified nucleotide variants that can replicate more efficiently within hosts and also transmit more effectively between hosts. Quantifying the degree to which a mutation impacts viral fitness within a host can improve identification of variants that are of particular epidemiological concern and our ability to anticipate viral adaptation at the population level. While methods have been developed to quantify the fitness effects of mutations using observed changes in allele frequencies over the course of a host’s infection, none of the existing methods account for the possibility of cellular coinfection. Here, we develop mathematical models to project variant allele frequency changes in the context of cellular coinfection and, further, integrate these models with statistical inference approaches to demonstrate how variant fitness can be estimated alongside cellular multiplicity of infection. We apply our approaches to empirical longitudinally sampled H5N1 sequence data from ferrets. Our results indicate that previous studies may have significantly underestimated the within-host fitness advantage of viral variants. These findings underscore the importance of considering the process of cellular coinfection when studying within-host viral evolutionary dynamics.
Affiliation(s)
- Huisheng Zhu
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
- Brent E. Allman
  - Graduate Program in Population Biology, Ecology, and Evolution, Emory University, Atlanta, GA 30322, USA
- Katia Koelle
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
  - Emory-UGA Center of Excellence for Influenza Research and Surveillance (CEIRS), Atlanta, GA 30322, USA
|
29
|
Blanckaert A, Payseur BA. Finding hybrid incompatibilities using genome sequences from hybrid populations. Mol Biol Evol 2021; 38:4616-4627. [PMID: 34097068 PMCID: PMC8476132 DOI: 10.1093/molbev/msab168] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/23/2022]
Abstract
Natural hybrid zones offer a powerful framework for understanding the genetic basis of speciation in progress because ongoing hybridization continually creates unfavorable gene combinations. Evidence indicates that postzygotic reproductive isolation is often caused by epistatic interactions between mutations in different genes that evolved independently of one another (hybrid incompatibilities). We examined the potential to detect epistatic selection against incompatibilities from genome sequence data using the site frequency spectrum (SFS) of polymorphisms by conducting individual-based simulations in SLiM. We found that the genome-wide SFS in hybrid populations assumes a diagnostic shape, with the continual input of fixed differences between source populations via migration inducing a mass at intermediate allele frequency. Epistatic selection locally distorts the SFS as non-incompatibility alleles rise in frequency in a manner analogous to a selective sweep. Building on these results, we present a statistical method to identify genomic regions containing incompatibility loci that locates departures in the local SFS compared with the genome-wide SFS. Cross-validation studies demonstrate that our method detects recessive and codominant incompatibilities across a range of scenarios varying in the strength of epistatic selection, migration rate, and hybrid zone age. Our approach takes advantage of whole genome sequence data, does not require knowledge of demographic history, and can be applied to any pair of nascent species that forms a hybrid zone.
Affiliation(s)
- Alexandre Blanckaert
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Bret A Payseur
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
|
30
|
van Spelde AM, Schroeder H, Kjellström A, Lidén K. Approaches to osteoporosis in paleopathology: How did methodology shape bone loss research? International Journal of Paleopathology 2021; 33:245-257. [PMID: 34044198 DOI: 10.1016/j.ijpp.2021.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/20/2021] [Revised: 05/03/2021] [Accepted: 05/03/2021] [Indexed: 06/12/2023]
Abstract
OBJECTIVE: This paper will review how different methods employed to study bone loss in the past were used to explore different questions and aspects of bone loss, how methodology has changed over time, and how these different approaches have informed our understanding of bone loss in the past. MATERIALS AND METHODS: A review and discussion is conducted on research protocols and results of 84 paleopathology publications on bone loss in archaeological skeletal collections published between 1969 and 2021. CONCLUSIONS: The variety in research protocols confounds accurate meta-analysis of previously published research; however, more recent publications incorporate a combination of bone mass and bone quality based methods. Biased sample selection has resulted in a predominance of European and Medieval publications, limiting more general observations on bone loss in the past. Collection of dietary or paleopathological covariables is underemployed in the effort to interpret bone loss patterns. SIGNIFICANCE: Paleopathology publications have demonstrated differences in bone loss between distinct archaeological populations, between sex and age groups, and have suggested factors underlying observed differences. However, a lack of a gold standard has encouraged the use of a wide range of methods. Understanding how this array of methods affects results is crucial in contextualizing our knowledge of bone loss in the past. LIMITATIONS: The development of a research protocol is also influenced by available expertise, available equipment, restrictions imposed by the curator, and site-specific taphonomic aspects. These factors will likely continue to cause (minor) biases even if a best practice can be established. SUGGESTIONS FOR FUTURE RESEARCH: Greater effort to develop uniform terminology and operational definitions of osteoporosis in skeletal remains, as well as the expansion of time scale and geographical areas studied. The Next-Generation Sequencing revolution has also opened up the possibility of ancient DNA analyses to study genetic predisposition to bone loss in the past.
Affiliation(s)
- Anne-Marijn van Spelde
  - Archaeological Research Laboratory, Department of Archaeology and Classical Studies, Stockholm University, Lilla Frescativägen 7, 114 18 Stockholm, Sweden
  - The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark
- Hannes Schroeder
  - The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark
- Anna Kjellström
  - Osteological Research Laboratory, Department of Archaeology and Classical Studies, Stockholm University, Lilla Frescativägen 7, 114 18 Stockholm, Sweden
- Kerstin Lidén
  - Archaeological Research Laboratory, Department of Archaeology and Classical Studies, Stockholm University, Lilla Frescativägen 7, 114 18 Stockholm, Sweden
|
31
|
Sohail MS, Louie RHY, McKay MR, Barton JP. MPL resolves genetic linkage in fitness inference from complex evolutionary histories. Nat Biotechnol 2021; 39:472-479. [PMID: 33257862 PMCID: PMC8044047 DOI: 10.1038/s41587-020-0737-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/16/2019] [Accepted: 10/14/2020] [Indexed: 12/13/2022]
Abstract
Genetic linkage causes the fate of new mutations in a population to be contingent on the genetic background on which they appear. This makes it challenging to identify how individual mutations affect fitness. To overcome this challenge, we developed marginal path likelihood (MPL), a method to infer selection from evolutionary histories that resolves genetic linkage. Validation on real and simulated data sets shows that MPL is fast and accurate, outperforming existing inference approaches. We found that resolving linkage is crucial for accurately quantifying selection in complex evolving populations, which we demonstrate through a quantitative analysis of intrahost HIV-1 evolution using multiple patient data sets. Linkage effects generated by variants that sweep rapidly through the population are particularly strong, extending far across the genome. Taken together, our results argue for the importance of resolving linkage in studies of natural selection.
Affiliation(s)
- Muhammad Saqib Sohail
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
- Raymond H Y Louie
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute for Advanced Study, Hong Kong University of Science and Technology, Hong Kong, China
  - The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
  - School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Matthew R McKay
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
- John P Barton
  - Department of Physics and Astronomy, University of California, Riverside, Riverside, CA, USA
|
32
|
Fortes-Lima CA, Laurent R, Thouzeau V, Toupance B, Verdu P. Complex genetic admixture histories reconstructed with Approximate Bayesian Computation. Mol Ecol Resour 2021; 21:1098-1117. [PMID: 33452723 PMCID: PMC8247995 DOI: 10.1111/1755-0998.13325] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/04/2019] [Revised: 12/11/2020] [Accepted: 01/07/2021] [Indexed: 01/19/2023]
Abstract
Admixture is a fundamental evolutionary process that has influenced genetic patterns in numerous species. Maximum‐likelihood approaches based on allele frequencies and linkage‐disequilibrium have been extensively used to infer admixture processes from genome‐wide data sets, mostly in human populations. Nevertheless, complex admixture histories, beyond one or two pulses of admixture, remain methodologically challenging to reconstruct. We developed an Approximate Bayesian Computation (ABC) framework to reconstruct highly complex admixture histories from independent genetic markers. We built the software package methis to simulate independent SNPs or microsatellites in a two‐way admixed population for scenarios with multiple admixture pulses, monotonically decreasing or increasing recurring admixture, or combinations of these scenarios. methis allows users to draw model‐parameter values from prior distributions set by the user, and, for each simulation, methis can calculate numerous summary statistics describing genetic diversity patterns and moments of the distribution of individual admixture fractions. We coupled methis with existing machine‐learning ABC algorithms and investigated the admixture history of admixed populations. Results showed that random forest ABC scenario‐choice could accurately distinguish among most complex admixture scenarios, and errors were mainly found in regions of the parameter space where scenarios were highly nested, and, thus, biologically similar. We focused on African American and Barbadian populations as two study‐cases. We found that neural network ABC posterior parameter estimation was accurate and reasonably conservative under complex admixture scenarios. For both admixed populations, we found that monotonically decreasing contributions over time, from Europe and Africa, explained the observed data more accurately than multiple admixture pulses. 
This approach will allow for reconstructing detailed admixture histories when maximum‐likelihood methods are intractable.
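To make the general ABC idea in this abstract concrete, here is a minimal rejection-ABC sketch for a single admixture fraction under a one-pulse model. It is a toy under stated assumptions, not the methis software and not the random forest or neural network ABC the authors used: the function names, the uniform prior, the choice of summary statistic (mean allele frequency) and the example source frequencies are all hypothetical.

```python
import random


def simulate_mean_freq(alpha, src_a, src_b, n_chrom, rng):
    """Simulate the mean sampled allele frequency in an admixed population.

    alpha is the fraction of ancestry from source A; src_a and src_b hold
    per-locus allele frequencies in the two source populations.
    """
    total = 0.0
    for pa, pb in zip(src_a, src_b):
        p_admixed = alpha * pa + (1 - alpha) * pb  # one admixture pulse
        # Binomial sampling of n_chrom chromosomes at this locus.
        count = sum(rng.random() < p_admixed for _ in range(n_chrom))
        total += count / n_chrom
    return total / len(src_a)


def abc_rejection(observed, src_a, src_b, n_chrom, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary lies within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        alpha = rng.random()  # uniform prior on [0, 1)
        simulated = simulate_mean_freq(alpha, src_a, src_b, n_chrom, rng)
        if abs(simulated - observed) < tolerance:
            accepted.append(alpha)
    return accepted


# Hypothetical example: 20 loci, strongly differentiated sources,
# and an "observed" summary generated with a true alpha of 0.3.
src_a = [0.9] * 20
src_b = [0.1] * 20
observed = 0.3 * 0.9 + 0.7 * 0.1
posterior = abc_rejection(observed, src_a, src_b,
                          n_chrom=100, n_draws=1000, tolerance=0.02)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior of the admixture fraction. Plain rejection sampling becomes inefficient as the number of parameters and summary statistics grows, which is one reason machine-learning ABC variants like those named in the abstract are used for complex scenarios.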
Affiliation(s)
- Cesar A Fortes-Lima
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
  - Sub-department of Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Romain Laurent
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
- Valentin Thouzeau
  - UMR7534 Centre de Recherche en Mathématiques de la Décision, CNRS, Université Paris-Dauphine, PSL University, Paris, France
  - Laboratoire de Sciences Cognitives et Psycholinguistique, Département d'Etudes Cognitives, ENS, PSL University, EHESS, CNRS, Paris, France
- Bruno Toupance
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
- Paul Verdu
  - UMR7206 Eco-anthropologie, CNRS, Muséum National d'Histoire Naturelle, Université de Paris, Paris, France
|
33
|
He Z, Dai X, Beaumont M, Yu F. Detecting and Quantifying Natural Selection at Two Linked Loci from Time Series Data of Allele Frequencies with Forward-in-Time Simulations. Genetics 2020; 216:521-541. [PMID: 32826299 PMCID: PMC7536848 DOI: 10.1534/genetics.120.303463] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/13/2019] [Accepted: 08/15/2020] [Indexed: 12/16/2022]
Abstract
Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such time series genomic data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modeling the sampled chromosomes that contain unknown alleles. Our approach is built on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for selection coefficients is computed by applying the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our approach can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We also illustrate the utility of our method on real data with an application to ancient DNA data associated with white spotting patterns in horses.
Affiliation(s)
- Zhangyi He
  - School of Mathematics, University of Bristol, BS8 1UG, United Kingdom
- Xiaoyang Dai
  - School of Biological Sciences, University of Bristol, BS8 1TQ, United Kingdom
- Mark Beaumont
  - School of Biological Sciences, University of Bristol, BS8 1TQ, United Kingdom
- Feng Yu
  - School of Mathematics, University of Bristol, BS8 1UG, United Kingdom
|
34
|
The Effects of Quantitative Trait Architecture on Detection Power in Short-Term Artificial Selection Experiments. G3: Genes, Genomes, Genetics 2020; 10:3213-3227. [PMID: 32646912 PMCID: PMC7466968 DOI: 10.1534/g3.120.401287] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
Abstract
Evolve and resequence (E&R) experiments, in which artificial selection is imposed on organisms in a controlled environment, are becoming an increasingly accessible tool for studying the genetic basis of adaptation. Previous work has assessed how different experimental design parameters affect the power to detect the quantitative trait loci (QTL) that underlie adaptive responses in such experiments, but so far there has been little exploration of how this power varies with the genetic architecture of the evolving traits. In this study, we use forward simulation to build a more realistic model of an E&R experiment in which a quantitative polygenic trait experiences a short, but strong, episode of truncation selection. We study the expected power for QTL detection in such an experiment and how this power is influenced by different aspects of trait architecture, including the number of QTL affecting the trait, their starting frequencies, effect sizes, clustering along a chromosome, dominance, and epistasis patterns. We show that all of these parameters can affect allele frequency dynamics at the QTL and linked loci in complex and often unintuitive ways, and thus influence our power to detect them. One consequence of this is that existing detection methods based on models of independent selective sweeps at individual QTL often have lower detection power than a simple measurement of allele frequency differences before and after selection. Our findings highlight the importance of taking trait architecture into account when designing and interpreting studies of molecular adaptation with temporal data. We provide a customizable modeling framework that will enable researchers to easily simulate E&R experiments with different trait architectures and parameters tuned to their specific study system, allowing for assessment of expected detection power and optimization of experimental design.
|
35
|
Abstract
The evolutionary dynamics of a virus can differ within hosts and across populations. Studies of within-host evolution provide an important link between experimental studies of virus evolution and large-scale phylodynamic analyses. They can determine the extent to which global processes are recapitulated on local scales and how accurately experimental infections model natural ones. They may also inform epidemiologic models of disease spread and reveal how host-level dynamics contribute to a virus's evolution at a larger scale. Over the last decade, advances in viral sequencing have enabled detailed studies of viral genetic diversity within hosts. I review how within-host diversity is sampled, measured, and expressed, and how comparative studies of viral diversity can be leveraged to elucidate a virus's evolutionary dynamics. These concepts are illustrated with detailed reviews of recent research on the within-host evolution of influenza virus, dengue virus, and cytomegalovirus.
Affiliation(s)
- Adam S Lauring
  - Division of Infectious Diseases, Department of Internal Medicine, and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
36
|
Abstract
Neutral models of evolution assume the absence of natural selection. Formerly confined to ecology and evolutionary biology, neutral models are spreading. In recent years they've been applied to explaining the diversity of baby names, scientific citations, cryptocurrencies, pot decorations, literary lexica, tumour variants and much more besides. Here, we survey important neutral models and highlight their similarities. We investigate the most widely used tests of neutrality, show that they are weak and suggest more powerful methods. We conclude by discussing the role of neutral models in the explanation of diversity. We suggest that the ability of neutral models to fit low-information distributions should not be taken as evidence for the absence of selection. Nevertheless, many studies, in increasingly diverse fields, make just such claims. We call this tendency 'neutral syndrome'.
|
37
|
Morales-Arce AY, Harris RB, Stone AC, Jensen JD. Evaluating the contributions of purifying selection and progeny-skew in dictating within-host Mycobacterium tuberculosis evolution. Evolution 2020; 74:992-1001. [PMID: 32233086 DOI: 10.1111/evo.13954] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/03/2019] [Accepted: 03/08/2020] [Indexed: 12/28/2022]
Abstract
The within-host evolutionary dynamics of tuberculosis (TB) remain unclear, and underlying biological characteristics render standard population genetic approaches based upon the Wright-Fisher model largely inappropriate. In addition, the compact genome combined with an absence of recombination is expected to result in strong purifying selection effects. Thus, it is imperative to establish a biologically relevant evolutionary framework incorporating these factors in order to enable an accurate study of this important human pathogen. Further, such a model is critical for inferring fundamental evolutionary parameters related to patient treatment, including mutation rates and the severity of infection bottlenecks. We here implement such a model and infer the underlying evolutionary parameters governing within-patient evolutionary dynamics. Results demonstrate that the progeny skew associated with the clonal nature of TB severely reduces genetic diversity and that the neglect of this parameter in previous studies has led to significant mis-inference of mutation rates. As such, our results suggest an underlying de novo mutation rate that is considerably faster than previously inferred, and a progeny distribution differing significantly from Wright-Fisher assumptions. This inference represents a more appropriate evolutionary null model, against which the periodic effects of positive selection, associated with drug-resistance for example, may be better assessed.
Affiliation(s)
- Ana Y Morales-Arce: Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
- Rebecca B Harris: Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA
- Anne C Stone: Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA; School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
- Jeffrey D Jensen: Center for Evolution and Medicine, Arizona State University, Tempe, Arizona, USA; School of Life Sciences, Arizona State University, Tempe, Arizona, USA
|
38
|
Dehasque M, Ávila‐Arcos MC, Díez‐del‐Molino D, Fumagalli M, Guschanski K, Lorenzen ED, Malaspinas A, Marques‐Bonet T, Martin MD, Murray GGR, Papadopulos AST, Therkildsen NO, Wegmann D, Dalén L, Foote AD. Inference of natural selection from ancient DNA. Evol Lett 2020; 4:94-108. [PMID: 32313686 PMCID: PMC7156104 DOI: 10.1002/evl3.165] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 10/21/2019] [Revised: 01/13/2020] [Accepted: 02/02/2020] [Indexed: 01/01/2023] Open
Abstract
Evolutionary processes, including selection, can be indirectly inferred based on patterns of genomic variation among contemporary populations or species. However, this often requires unrealistic assumptions of ancestral demography and selective regimes. Sequencing ancient DNA from temporally spaced samples can inform about past selection processes, as time series data allow direct quantification of population parameters collected before, during, and after genetic changes driven by selection. In this Comment and Opinion, we advocate for the inclusion of temporal sampling and the generation of paleogenomic datasets in evolutionary biology, and highlight some of the recent advances that have yet to be broadly applied by evolutionary biologists. In doing so, we consider the expected signatures of balancing, purifying, and positive selection in time series data, and detail how this can advance our understanding of the chronology and tempo of genomic change driven by selection. However, we also recognize the limitations of such data, which can suffer from postmortem damage, fragmentation, low coverage, and typically low sample size. We therefore highlight the many assumptions and considerations associated with analyzing paleogenomic data and the assumptions associated with analytical methods.
Affiliation(s)
- Marianne Dehasque: Centre for Palaeogenetics, 10691 Stockholm, Sweden; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden; Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- María C. Ávila‐Arcos: International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro 76230, Mexico
- David Díez‐del‐Molino: Centre for Palaeogenetics, 10691 Stockholm, Sweden; Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Matteo Fumagalli: Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot SL5 7PY, United Kingdom
- Katerina Guschanski: Animal Ecology, Department of Ecology and Genetics, Science for Life Laboratory, Uppsala University, 75236 Uppsala, Sweden
- Anna‐Sapfo Malaspinas: Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tomas Marques‐Bonet: Institut de Biologia Evolutiva (CSIC‐Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain; National Centre for Genomic Analysis–Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08028 Barcelona, Spain; Institucio Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain; Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Michael D. Martin: Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Gemma G. R. Murray: Department of Veterinary Medicine, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Alexander S. T. Papadopulos: Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor LL57 2UW, United Kingdom
- Daniel Wegmann: Department of Biology, Université de Fribourg, 1700 Fribourg, Switzerland; Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Love Dalén: Centre for Palaeogenetics, 10691 Stockholm, Sweden; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden
- Andrew D. Foote: Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor LL57 2UW, United Kingdom
|
39
|
Spitzer K, Pelizzola M, Futschik A. Modifying the Chi-square and the CMH test for population genetic inference: Adapting to overdispersion. Ann Appl Stat 2020. [DOI: 10.1214/19-aoas1301] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022]
|
40
|
Kojima Y, Matsumoto H, Kiryu H. Estimation of population genetic parameters using an EM algorithm and sequence data from experimental evolution populations. Bioinformatics 2020; 36:221-231. [PMID: 31218366 DOI: 10.1093/bioinformatics/btz498] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/26/2017] [Revised: 05/14/2019] [Accepted: 06/12/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Evolve and resequence (E&R) experiments show promise in capturing real-time evolution at genome-wide scales, enabling the assessment of allele frequency changes of SNPs in evolving populations and thus the estimation of population genetic parameters in the Wright-Fisher model (WF) that quantify the selection on SNPs. Currently, these analyses face two key difficulties: the numerous SNPs in E&R data and the frequent unreliability of estimates. Hence, a methodology for efficiently estimating WF parameters is needed to understand the evolutionary processes that shape genomes. RESULTS: We developed a novel method for estimating WF parameters (EMWER), by applying an expectation maximization algorithm to the Kolmogorov forward equation associated with the WF model diffusion approximation. EMWER was used to infer the effective population size, selection coefficients and dominance parameters from E&R data. Of the methods examined, EMWER was the most efficient method for selection strength estimation in multi-core computing environments, estimating both selection and dominance with accurate confidence intervals. We applied EMWER to E&R data from experimental Drosophila populations adapting to thermally fluctuating environments and found a common selection affecting allele frequency of many SNPs within the cosmopolitan In(3R)P inversion. Furthermore, this application indicated that many of the beneficial alleles in this experiment are dominant. AVAILABILITY AND IMPLEMENTATION: Our C++ implementation of 'EMWER' is available at https://github.com/kojikoji/EMWER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yasuhiro Kojima: Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 1277-8561, Japan
- Hirotaka Matsumoto: Bioinformatics Research Unit, Advanced Center for Computing and Communication, RIKEN, Wako, Saitama 351-0198, Japan
- Hisanori Kiryu: Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 1277-8561, Japan
|
41
|
Qian C, Yan X, Shi Y, Yin H, Chang Y, Chen J, Ingvarsson PK, Nevo E, Ma XF. Adaptive signals of flowering time pathways in wild barley from Israel over 28 generations. Heredity (Edinb) 2020; 124:62-76. [PMID: 31527784 PMCID: PMC6906298 DOI: 10.1038/s41437-019-0264-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/16/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 01/06/2023] Open
Abstract
Flowering time is one of the most critical traits in plants' life cycles and is influenced by various environmental changes, such as global warming. Previous studies have suggested that, to guarantee reproductive success, plants have shifted flowering times to adapt to global warming. Although many studies have focused on the molecular mechanisms of early flowering, few were supported by repeated sampling at different time points through the changing climate. To fully dissect the temporal and spatial evolutionary genetics of flowering time, we investigated nucleotide variation in ten flowering time candidate genes and nine reference genes for the same ten wild-barley populations sampled 28 years apart (1980-2008). The overall genetic differentiation was significantly greater in the descendant populations (2008) compared with the ancestral populations (1980); however, local adaptation tests failed to detect any single-nucleotide polymorphism (SNP)/indel under spatial-diversifying selection at either time point. By contrast, WFABC (a Wright-Fisher ABC-based approach) detected 54 SNPs/indels under strong selection during the past 28 generations. Moreover, all these 54 alleles were segregating in the ancestral populations, but fixed in the descendant populations. Among the top ten SNPs/indels, seven were located in the genes FT1 (FLOWERING TIME LOCUS T 1), CO1 (CONSTANS-LIKE PROTEIN 1), and VRN-H2 (VERNALIZATION-H2), which have been documented to be associated with flowering time regulation in barley cultivars. This study might suggest that all ten populations have undergone parallel evolution over the past few decades in response to global warming, and even an overwhelming local adaptation and ecological differentiation.
Affiliation(s)
- Chaoju Qian: Department of Ecology and Agriculture Research, Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, 730000, Gansu, China
- Xia Yan: School of Life Sciences, Nantong University, Nantong, 226019, Jiangsu, China
- Yong Shi: Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Hengxia Yin: State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, 810016, Qinghai, China
- Yuxiao Chang: Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 450002, China
- Jun Chen: College of Life Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Pär K Ingvarsson: Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala BioCenter, SE-750 07, Uppsala, Sweden
- Eviatar Nevo: Institute of Evolution, University of Haifa, Haifa, 3498838, Israel
- Xiao-Fei Ma: Department of Ecology and Agriculture Research, Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, 730000, Gansu, China
|
42
|
Inference of Selection from Genetic Time Series Using Various Parametric Approximations to the Wright-Fisher Model. G3-GENES GENOMES GENETICS 2019; 9:4073-4086. [PMID: 31597676 PMCID: PMC6893182 DOI: 10.1534/g3.119.400778] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
Detecting genomic regions under selection is an important objective of population genetics. Typical analyses for this goal are based on exploiting genetic diversity patterns in present time data but rapid advances in DNA sequencing have increased the availability of time series genomic data. A common approach to analyze such data is to model the temporal evolution of an allele frequency as a Markov chain. Based on this principle, several methods have been proposed to infer selection intensity. One of their differences lies in how they model the transition probabilities of the Markov chain. Using the Wright-Fisher model is a natural choice but its computational cost is prohibitive for large population sizes so approximations to this model based on parametric distributions have been proposed. Here, we compared the performance of some of these approximations with respect to their power to detect selection and their estimation of the selection coefficient. We developed a new generic Hidden Markov Model likelihood calculator and applied it to genetic time series simulated under various evolutionary scenarios. The Beta with spikes approximation, which combines discrete fixation probabilities with a continuous Beta distribution, was found to perform consistently better than the others. This distribution provides an almost perfect fit to the Wright-Fisher model in terms of selection inference, for a computational cost that does not increase with population size. We further evaluated this model for population sizes not accessible to the Wright-Fisher model and illustrated its performance on a dataset of two divergently selected chicken populations.
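The Markov-chain view described in this abstract can be illustrated with a minimal sketch. It is not the authors' Hidden Markov Model calculator: it assumes a haploid Wright-Fisher population, exact binomial transitions, and a fully observed allele-count trajectory with one generation between observations (no sampling noise). The function names `wf_transition_matrix` and `path_likelihood` are hypothetical, chosen for illustration only.

```python
from math import comb


def wf_transition_matrix(n, s=0.0):
    """Return the (n+1) x (n+1) Wright-Fisher transition matrix.

    Row i holds P(next allele count = j | current count = i) for a haploid
    population of size n and selection coefficient s (an assumed,
    simplified selection scheme).
    """
    matrix = []
    for i in range(n + 1):
        p = i / n
        # Expected allele frequency after selection, before resampling.
        p_sel = p * (1 + s) / (1 + s * p)
        # Binomial resampling of n individuals at frequency p_sel.
        row = [
            comb(n, j) * p_sel**j * (1 - p_sel) ** (n - j)
            for j in range(n + 1)
        ]
        matrix.append(row)
    return matrix


def path_likelihood(counts, n, s):
    """Likelihood of an observed count trajectory under the chain above."""
    m = wf_transition_matrix(n, s)
    like = 1.0
    for current, nxt in zip(counts, counts[1:]):
        like *= m[current][nxt]
    return like


if __name__ == "__main__":
    trajectory = [5, 7, 10, 12, 15]  # made-up allele counts, n = 20
    for s in (0.0, 0.2, 0.5):
        print(s, path_likelihood(trajectory, 20, s))
```

Building the matrix takes on the order of n² operations, which gives a feel for why the exact model becomes costly at large population sizes and why parametric approximations to the transition distribution are attractive.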
|
43
|
Vlachos C, Burny C, Pelizzola M, Borges R, Futschik A, Kofler R, Schlötterer C. Benchmarking software tools for detecting and quantifying selection in evolve and resequencing studies. Genome Biol 2019; 20:169. [PMID: 31416462 PMCID: PMC6694636 DOI: 10.1186/s13059-019-1770-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/17/2019] [Accepted: 07/22/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: The combination of experimental evolution with whole-genome resequencing of pooled individuals, also called evolve and resequence (E&R), is a powerful approach to study the selection processes and to infer the architecture of adaptive variation. Given the large potential of this method, a range of software tools were developed to identify selected SNPs and to measure their selection coefficients. RESULTS: In this benchmarking study, we compare 15 test statistics implemented in 10 software tools using three different scenarios. We demonstrate that the power of the methods differs among the scenarios, but some consistently outperform others. LRT-1, CLEAR, and the CMH test perform best despite LRT-1 and the CMH test not requiring time series data. CLEAR provides the most accurate estimates of selection coefficients. CONCLUSION: This benchmark study will not only facilitate the analysis of already existing data, but also affect the design of future data collections.
Affiliation(s)
- Christos Vlachos: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria; Vienna Graduate School of Population Genetics, Vienna, Austria
- Claire Burny: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria; Vienna Graduate School of Population Genetics, Vienna, Austria
- Marta Pelizzola: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria; Vienna Graduate School of Population Genetics, Vienna, Austria
- Rui Borges: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria
- Andreas Futschik: Institute of Applied Statistics, Johannes Kepler University, Linz, 4040, Austria; Plattform Bioinformatik und Biostatistik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria
- Robert Kofler: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria
- Christian Schlötterer: Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Wien, 1210, Austria
|
44
|
Robledo‐Arnuncio JJ, Unger GM. Measuring viability selection from prospective cohort mortality studies: A case study in maritime pine. Evol Appl 2019; 12:863-877. [PMID: 31080501 PMCID: PMC6503825 DOI: 10.1111/eva.12729] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/15/2018] [Revised: 10/05/2018] [Accepted: 10/15/2018] [Indexed: 11/27/2022] Open
Abstract
By changing the genetic background available for selection at subsequent life stages, stage-specific selection can define adaptive potential across the life cycle. We propose and evaluate here a neutrality test and a Bayesian method to infer stage-specific viability selection coefficients using sequential random genotypic samples drawn from a longitudinal cohort mortality study, within a generation. The approach is suitable for investigating selective mortality in large natural or experimental cohorts of any organism in which individual tagging and tracking are unfeasible. Numerical simulation results indicate that the method can discriminate loci under strong viability selection, and provided samples are large, yield accurate estimates of the corresponding selection coefficients. Genotypic frequency changes are largely driven by sampling noise under weak selection, however, compromising inference in that case. We apply the proposed methods to analyze viability selection operating at early recruitment stages in a natural maritime pine (Pinus pinaster Ait.) population. We measured temporal genotypic frequency changes at 384 candidate-gene SNP loci among seedlings sampled from the time of emergence in autumn until the summer of the following year, a period with high elimination rates. We detected five loci undergoing allele frequency changes larger than expected from stochastic mortality and sampling, with putative functions that could influence survival at early seedling stages. Our results illustrate how new statistical and sampling schemes can be used to conduct genomic scans of contemporary selection on specific life stages.
Affiliation(s)
- Gregor M. Unger: Department of Forest Ecology & Genetics, INIA‐CIFOR, Madrid, Spain; Escuela Internacional de Doctorado, Universidad Rey Juan Carlos, Móstoles, Spain; Present address: Department of Forest Genetics, Federal Research and Training Centre for Forests, Natural Hazards and Landscape, Vienna, Austria
|
45
|
Rêgo A, Messina FJ, Gompert Z. Dynamics of genomic change during evolutionary rescue in the seed beetle Callosobruchus maculatus. Mol Ecol 2019; 28:2136-2154. [DOI: 10.1111/mec.15085] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/10/2018] [Revised: 03/18/2019] [Accepted: 03/18/2019] [Indexed: 12/21/2022]
Affiliation(s)
- Alexandre Rêgo: Department of Biology, Utah State University, Logan, Utah; Ecology Center, Utah State University, Logan, Utah
- Frank J. Messina: Department of Biology, Utah State University, Logan, Utah; Ecology Center, Utah State University, Logan, Utah
- Zachariah Gompert: Department of Biology, Utah State University, Logan, Utah; Ecology Center, Utah State University, Logan, Utah
|
46
|
|
47
|
Maximum Likelihood Estimation of Fitness Components in Experimental Evolution. Genetics 2019; 211:1005-1017. [PMID: 30679262 PMCID: PMC6404243 DOI: 10.1534/genetics.118.301893] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/20/2018] [Accepted: 01/15/2019] [Indexed: 12/30/2022] Open
Abstract
Estimating fitness differences between allelic variants is a central goal of experimental evolution. Current methods for inferring such differences from allele frequency time series typically assume that the effects of selection can be described by a fixed selection coefficient. However, fitness is an aggregate of several components including mating success, fecundity, and viability. Distinguishing between these components could be critical in many scenarios. Here, we develop a flexible maximum likelihood framework that can disentangle different components of fitness from genotype frequency data, and estimate them individually in males and females. As a proof-of-principle, we apply our method to experimentally evolved cage populations of Drosophila melanogaster, in which we tracked the relative frequencies of a loss-of-function and wild-type allele of yellow. This X-linked gene produces a recessive yellow phenotype when disrupted and is involved in male courtship ability. We find that the fitness costs of the yellow phenotype take the form of substantially reduced mating preference of wild-type females for yellow males, together with a modest reduction in the viability of yellow males and females. Our framework should be generally applicable to situations where it is important to quantify fitness components of specific genetic variants, including quantitative characterization of the population dynamics of CRISPR gene drives.
|
48
|
Inferring Demography and Selection in Organisms Characterized by Skewed Offspring Distributions. Genetics 2019; 211:1019-1028. [PMID: 30651284 DOI: 10.1534/genetics.118.301684] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/11/2018] [Accepted: 01/15/2019] [Indexed: 01/01/2023] Open
Abstract
The recent increase in time-series population genomic data from experimental, natural, and ancient populations has been accompanied by a promising growth in methodologies for inferring demographic and selective parameters from such data. However, these methods have largely presumed that the populations of interest are well-described by the Kingman coalescent. In reality, many groups of organisms, including viruses, marine organisms, and some plants, protists, and fungi, typified by high variance in progeny number, may be best characterized by multiple-merger coalescent models. Estimation of population genetic parameters under Wright-Fisher assumptions for these organisms may thus be prone to serious mis-inference. We propose a novel method for the joint inference of demography and selection under the Ψ-coalescent model, termed Multiple-Merger Coalescent Approximate Bayesian Computation, or MMC-ABC. We first demonstrate mis-inference under the Kingman, and then exhibit the superior performance of MMC-ABC under conditions of skewed offspring distributions. In order to highlight the utility of this approach, we reanalyzed previously published drug-selection lines of influenza A virus. We jointly inferred the extent of progeny-skew inherent to viral replication and identified putative drug-resistance mutations.
|
49
|
Shim H. Feature Learning of Virus Genome Evolution With the Nucleotide Skip-Gram Neural Network. Evol Bioinform Online 2019; 15:1176934318821072. [PMID: 30692845 PMCID: PMC6335656 DOI: 10.1177/1176934318821072] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/29/2018] [Accepted: 11/15/2018] [Indexed: 12/14/2022] Open
Abstract
Recent studies reveal that even the smallest genomes such as viruses evolve through complex and stochastic processes, and the assumption of independent alleles is not valid in most applications. Advances in sequencing technologies produce multiple time-point whole-genome data, which enable potential interactions between these alleles to be investigated empirically. To investigate these interactions, we represent alleles as distributed vectors that encode for relationships with other alleles in the course of evolution and apply artificial neural networks to time-sampled whole-genome datasets for feature learning. We build this platform using methods and algorithms derived from natural language processing (NLP), and we denote it as the nucleotide skip-gram neural network. We learn distributed vectors of alleles using the changes in allele frequency of echovirus 11 in the presence or absence of the disinfectant (ClO2) from the experimental evolution data. Results from the training using a new open-source software TensorFlow show that the learned distributed vectors can be clustered using principal component analysis and hierarchical clustering to reveal a list of non-synonymous mutations that arise on the structural protein VP1 in connection to the candidate mutation for ClO2 adaptation. Furthermore, this method can account for recombination rates by setting the extent of interactions as a biological hyper-parameter, and the results show that the most realistic scenario of mid-range interactions across the genome is most consistent with the previous studies.
Affiliation(s)
- Hyunjin Shim: Artificial Intelligence Laboratory, Stanford University, Stanford, CA, USA; School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
50
|
Mutations in Influenza A Virus Neuraminidase and Hemagglutinin Confer Resistance against a Broadly Neutralizing Hemagglutinin Stem Antibody. J Virol 2019; 93:JVI.01639-18. [PMID: 30381484 DOI: 10.1128/jvi.01639-18] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/20/2018] [Accepted: 10/22/2018] [Indexed: 11/20/2022] Open
Abstract
Influenza A virus (IAV), a major cause of human morbidity and mortality, continuously evolves in response to selective pressures. Stem-directed, broadly neutralizing antibodies (sBnAbs) targeting the influenza virus hemagglutinin (HA) are a promising therapeutic strategy, but neutralization escape mutants can develop. We used an integrated approach combining viral passaging, deep sequencing, and protein structural analyses to define escape mutations and mechanisms of neutralization escape in vitro for the F10 sBnAb. IAV was propagated with escalating concentrations of F10 over serial passages in cultured cells to select for escape mutations. Viral sequence analysis revealed three mutations in HA and one in neuraminidase (NA). Introduction of these specific mutations into IAV through reverse genetics confirmed their roles in resistance to F10. Structural analyses revealed that the selected HA mutations (S123G, N460S, and N203V) are away from the F10 epitope but may indirectly impact influenza virus receptor binding, endosomal fusion, or budding. The NA mutation E329K, which was previously identified to be associated with antibody escape, affects the active site of NA, highlighting the importance of the balance between HA and NA function for viral survival. Thus, whole-genome population sequencing enables the identification of viral resistance mutations responding to antibody-induced selective pressure. IMPORTANCE: Influenza A virus is a public health threat for which currently available vaccines are not always effective. Broadly neutralizing antibodies that bind to the highly conserved stem region of the influenza virus hemagglutinin (HA) can neutralize many influenza virus strains. To understand how influenza virus can become resistant or escape such antibodies, we propagated influenza A virus in vitro with escalating concentrations of antibody and analyzed viral populations by whole-genome sequencing. We identified HA mutations near and distal to the antibody binding epitope that conferred resistance to antibody neutralization. Additionally, we identified a neuraminidase (NA) mutation that allowed the virus to grow in the presence of high concentrations of the antibody. Virus carrying dual mutations in HA and NA also grew under high antibody concentrations. We show that NA mutations mediate the escape of neutralization by antibodies against HA, highlighting the importance of a balance between HA and NA for optimal virus function.
|