1
Liao K, Carlson J, Zöllner S. The effect of mutation subtypes on the allele frequency spectrum and population genetics inference. G3 (Bethesda) 2023; 13:jkad035. [PMID: 36759699 PMCID: PMC10085755 DOI: 10.1093/g3journal/jkad035] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum, defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly treats mutation types (e.g. A→C vs C→T) as interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourage us to consider the unique evolutionary mechanisms of analyzed polymorphisms.
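The two definitions in this abstract (the allele frequency spectrum as the distribution of allele frequencies in a sample, and a mutation subtype as a base change plus its flanking nucleotides) can be illustrated with a minimal Python sketch. The function names and the `A>B` string notation are hypothetical, and the folding of the central reference base to A or C by reverse complement is an assumed convention that yields 96 subtypes; the abstract does not state how the authors arrive at that number.

```python
from itertools import product


def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Count sites by the number of sampled chromosomes carrying the derived allele.

    afs[k] is the number of sites whose derived allele appears on exactly
    k of the n_chromosomes sampled chromosomes.
    """
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


def three_mer_subtypes():
    """Enumerate 3-mer mutation subtypes such as 'AAA>ATA'.

    Assumption: the central reference base is folded to A or C, giving
    2 reference bases x 3 alternate bases x 16 flanking pairs = 96.
    """
    bases = "ACGT"
    subtypes = []
    for ref in "AC":
        for alt in bases:
            if alt == ref:
                continue
            for left, right in product(bases, repeat=2):
                subtypes.append(f"{left}{ref}{right}>{left}{alt}{right}")
    return subtypes


def subtype_specific_afs(sites, n_chromosomes):
    """Build one spectrum per subtype from (subtype, derived_count) pairs."""
    grouped = {}
    for subtype, count in sites:
        grouped.setdefault(subtype, []).append(count)
    return {s: allele_frequency_spectrum(c, n_chromosomes) for s, c in grouped.items()}


if __name__ == "__main__":
    toy_sites = [("AAA>ATA", 1), ("AAA>ATA", 1), ("ACG>ATG", 3)]  # made-up data
    print(len(three_mer_subtypes()))
    print(subtype_specific_afs(toy_sites, n_chromosomes=4))
```

Pooling all sites into one call to `allele_frequency_spectrum` corresponds to the usual practice of treating mutation types as interchangeable, while `subtype_specific_afs` keeps the spectra separate so they can be compared.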
Affiliation(s)
- Kevin Liao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jedidiah Carlson
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
- Department of Population Health, University of Texas at Austin, Austin, TX 78712, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
2
Abstract
Testing among competing demographic models of divergence has become an important component of evolutionary research in model and non-model organisms. However, the effect of unaccounted demographic events on model choice and parameter estimation remains largely unexplored. Using extensive simulations, we demonstrate that under realistic divergence scenarios, failure to account for population size (Ne) changes in daughter and ancestral populations leads to strong biases in divergence time estimates as well as model choice. We illustrate these issues by reconstructing the recent demographic history of North Sea and Baltic Sea turbots (Scophthalmus maximus), testing 16 isolation with migration (IM) and 16 secondary contact (SC) scenarios that model changes in Ne as well as the effects of linked selection and barrier loci. Failure to account for changes in Ne resulted in selecting SC models with long periods of strict isolation and divergence times preceding the formation of the Baltic Sea. In contrast, models accounting for Ne changes suggest recent (<6 kya) divergence with constant gene flow. We further show how interpreting genomic landscapes of differentiation can help discern among competing models. For example, in the turbot data, islands of differentiation show signatures of recent selective sweeps, rather than old divergence resisting secondary introgression. The results have broad implications for the study of population divergence by highlighting the potential effects of unmodeled changes in Ne on demographic inference. Tested models should aim at representing realistic divergence scenarios for the target taxa, and extreme caution should always be exercised when interpreting results of demographic modeling.
Affiliation(s)
- Paolo Momigliano
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
- Ann-Britt Florin
- Department of Aquatic Resources, Institute of Coastal Research, Swedish University of Agricultural Sciences, Öregrund, Sweden
- Juha Merilä
- Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland; Division of Ecology and Biodiversity, Faculty of Science, The University of Hong Kong, Hong Kong SAR
3
Arnoux S, Fraïsse C, Sauvage C. Genomic inference of complex domestication histories in three Solanaceae species. J Evol Biol 2020; 34:270-283. [PMID: 33107098 DOI: 10.1111/jeb.13723] [Received: 03/05/2020] [Accepted: 10/15/2020] [Indexed: 12/30/2022]
Abstract
Domestication is a human-induced selection process that imprints the genomes of domesticated populations over a short evolutionary time scale and that occurs in a given demographic context. Reconstructing historical gene flow, effective population size changes and their timing is therefore of fundamental interest to understand how plant demography and human selection jointly shape genomic divergence during domestication. Yet, the comparison under a single statistical framework of independent domestication histories across different crop species has been little evaluated so far. Thus, it is unclear whether domestication leads to convergent demographic changes that similarly affect crop genomes. To address this question, we used existing and new transcriptome data on three crop species of Solanaceae (eggplant, pepper and tomato), together with their close wild relatives. We fitted twelve demographic models of increasing complexity on the unfolded joint allele frequency spectrum for each wild/crop pair, and we found evidence for both shared and species-specific demographic processes between species. A convergent history of domestication with gene flow was inferred for all three species, along with evidence of strong reduction in the effective population size during the cultivation stage of tomato and pepper. The absence of any reduction in size of the crop in eggplant stands out from the classical view of the domestication process; as does the existence of a "protracted period" of management before cultivation. Our results also suggest divergent management strategies of modern cultivars among species as their current demography substantially differs. Finally, the timing of domestication is species-specific and supported by the few historical records available.
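As a rough illustration of the data structure these models are fitted to, the following Python sketch tabulates an unfolded joint allele frequency spectrum for one wild/crop pair. The function name and the toy counts are hypothetical, and the sketch assumes the derived allele has already been identified at every site (for example, from an outgroup); it does not reproduce the authors' pipeline.

```python
def joint_afs(derived_wild, derived_crop, n_wild, n_crop):
    """Tabulate an unfolded joint allele frequency spectrum for two populations.

    derived_wild[s] and derived_crop[s] are the derived-allele counts at
    site s in the wild and crop samples. Entry [i][j] of the result is the
    number of sites with i derived copies in the wild sample and j derived
    copies in the crop sample.
    """
    if len(derived_wild) != len(derived_crop):
        raise ValueError("both populations must cover the same sites")
    spectrum = [[0] * (n_crop + 1) for _ in range(n_wild + 1)]
    for i, j in zip(derived_wild, derived_crop):
        if not (0 <= i <= n_wild and 0 <= j <= n_crop):
            raise ValueError(f"counts ({i}, {j}) outside the sample sizes")
        spectrum[i][j] += 1
    return spectrum


if __name__ == "__main__":
    # Made-up counts for three sites, two chromosomes sampled per population.
    for row in joint_afs([0, 1, 2], [1, 1, 0], n_wild=2, n_crop=2):
        print(row)
```

Sites that fall far from the diagonal of this matrix (common in one population, rare in the other) are the ones most informative about divergence, gene flow and size changes between the two populations.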
Affiliation(s)
- Stéphanie Arnoux
- INRA UR1052 GAFL, Centre de Recherche INRA PACA, Avignon Cedex 9, France; Vilmorin SA, Lédenon, France
- Christopher Sauvage
- INRA UR1052 GAFL, Centre de Recherche INRA PACA, Avignon Cedex 9, France; Syngenta SAS France, Saint Sauveur, France
4
Noskova E, Ulyantsev V, Koepfli KP, O’Brien SJ, Dobrynin P. GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data. Gigascience 2020; 9:giaa005. [PMID: 32112099 PMCID: PMC7049072 DOI: 10.1093/gigascience/giaa005] [Received: 04/27/2019] [Revised: 09/16/2019] [Accepted: 01/13/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND: The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS. RESULTS: Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA). CONCLUSIONS: We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.
Affiliation(s)
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Vladimir Ulyantsev
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Klaus-Peter Koepfli
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA
- Stephen J O’Brien
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Guy Harvey Oceanographic Center, Nova Southeastern University Ft. Lauderdale, 8000 North Ocean Drive, Ft. Lauderdale, Florida 33004, USA
- Pavel Dobrynin
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA
5
Pyhäjärvi T, Kujala ST, Savolainen O. 275 years of forestry meets genomics in Pinus sylvestris. Evol Appl 2020; 13:11-30. [PMID: 31988655 PMCID: PMC6966708 DOI: 10.1111/eva.12809] [Received: 11/16/2018] [Revised: 04/05/2019] [Accepted: 04/24/2019] [Indexed: 12/12/2022]
Abstract
Pinus sylvestris has a long history of basic and applied research that is relevant for both forestry and evolutionary studies. Its patterns of adaptive variation and role in forest economic and ecological systems have been studied extensively for nearly 275 years, its detailed demography for 100 years, and its mating system for more than 50 years. However, its reference genome sequence is not yet available and genomic studies have been lagging compared to, for example, Pinus taeda and Picea abies, two other economically important conifers. Despite the lack of a reference genome, many modern genomic methods are applicable for a more detailed look at its biological characteristics. For example, RNA-seq has revealed a complex transcriptional landscape, and targeted DNA sequencing displays an excess of rare variants and geographically homogeneously distributed molecular genetic diversity. Current DNA and RNA resources can be used as a reference for gene expression studies, SNP discovery, and further targeted sequencing. In the future, specific consequences of the large genome size, such as functional effects of regulatory open chromatin regions and transposable elements, should be investigated more carefully. For forest breeding and long-term management purposes, genomic data can help in assessing the genetic basis of inbreeding depression and in applying genomic tools for genomic prediction and relatedness estimates. Given the challenges of breeding (long generation time, no easy vegetative propagation) and the economic importance of the species, the application of genomic tools has the potential to have a considerable impact. Here, we explore how genomic characteristics of P. sylvestris, such as rare alleles and the low extent of linkage disequilibrium, impact the applicability and power of the tools.
Affiliation(s)
- Tanja Pyhäjärvi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- Outi Savolainen
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
6
Nikolic N, Liu S, Jacobsen MW, Jónsson B, Bernatchez L, Gagnaire PA, Hansen MM. Speciation history of European (Anguilla anguilla) and American eel (A. rostrata), analysed using genomic data. Mol Ecol 2019; 29:565-577. [PMID: 31863605 DOI: 10.1111/mec.15342] [Received: 09/26/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 02/01/2023]
Abstract
Speciation in the ocean could differ from terrestrial environments due to fewer barriers to gene flow. Hence, sympatric speciation might be common, with American and European eel being candidates for exemplifying this. They show disjunct continental distributions on both sides of the Atlantic, but spawn in overlapping regions of the Sargasso Sea from where juveniles are advected to North American, European and North African coasts. Hybridization and introgression are known to occur, with hybrids almost exclusively observed in Iceland. Different speciation scenarios have been suggested, involving either vicariance or sympatric ecological speciation. Using RAD sequencing and whole-genome sequencing data from parental species and F1 hybrids, we analysed speciation history based on the joint allele frequency spectrum (JAFS) and pairwise sequentially Markovian coalescent (PSMC) plots. JAFS supported a model involving a split without gene flow 150,000-160,000 generations ago, followed by secondary contact 87,000-92,000 generations ago, with 64% of the genome experiencing restricted gene flow. This supports vicariance rather than sympatric speciation, likely associated with Pleistocene glaciation cycles and ocean current changes. Whole-genome PSMC analysis of F1 hybrids from Iceland suggested divergence 200,000 generations ago and indicated subsequent gene flow rather than strict isolation. Finally, simulations showed that results from both approaches (JAFS and PSMC) were congruent. Hence, there is strong evidence against sympatric speciation in North Atlantic eels. These results reiterate the need for careful consideration of cases of possible sympatric speciation, as even in seemingly barrier-free oceanic environments palaeoceanographic factors may have promoted vicariance and allopatric speciation.
Affiliation(s)
- Natacha Nikolic
- Agence de Recherche pour la Biodiversité à la Réunion, ARBRE, Saint-Leu, Réunion
- Shenglin Liu
- Department of Bioscience, Aarhus University, Aarhus C, Denmark
- Louis Bernatchez
- IBIS (Institut de Biologie Intégrative et des Systèmes), Université Laval, Québec, QC, Canada
7
Titus BM, Blischak PD, Daly M. Genomic signatures of sympatric speciation with historical and contemporary gene flow in a tropical anthozoan (Hexacorallia: Actiniaria). Mol Ecol 2019; 28:3572-3586. [PMID: 31233641 DOI: 10.1111/mec.15157] [Received: 06/20/2018] [Revised: 05/21/2019] [Accepted: 06/04/2019] [Indexed: 12/23/2022]
Abstract
Sympatric diversification is recognized to have played an important role in the evolution of biodiversity. However, an in situ sympatric origin for codistributed taxa is difficult to demonstrate because different evolutionary processes can lead to similar biogeographic outcomes, especially in ecosystems that can readily facilitate secondary contact due to a lack of hard barriers to dispersal. Here we use a genomic (ddRADseq), model-based approach to delimit a species complex of tropical sea anemones that are codistributed on coral reefs throughout the Tropical Western Atlantic. We use coalescent simulations in fastsimcoal2 and ordinary differential equations in Moments to test competing diversification scenarios that span the allopatric-sympatric continuum. Our results suggest that the corkscrew sea anemone Bartholomea annulata is a cryptic species complex whose members are codistributed throughout their range. Simulation and model selection analyses from both approaches suggest these lineages experienced historical and contemporary gene flow, supporting a sympatric origin, but an alternative secondary contact model receives appreciable model support in fastsimcoal2. Leveraging the genome of the closely related Exaiptasia diaphana, we identify five loci under divergent selection between cryptic B. annulata lineages that fall within mRNA transcripts or CDS regions. Our study provides a rare empirical, genomic example of sympatric speciation in a tropical anthozoan and the first range-wide molecular study of a tropical sea anemone, underscoring that anemone diversity is under-described in the tropics, and highlighting the need for additional systematic studies into these ecologically and economically important species.
Affiliation(s)
- Benjamin M Titus
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Paul D Blischak
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA; Department of Ecology and Evolutionary Biology, The University of Arizona, Tucson, AZ, USA
- Marymegan Daly
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA