1
Delord C, Arnaud‐Haond S, Leone A, Rolland J, Nikolic N. Unraveling the Complexity of the Ne/Nc Ratio for Conservation of Large and Widespread Pelagic Fish Species: Current Status and Challenges. Evol Appl 2024; 17:e70020. [PMID: 39391864 PMCID: PMC11464753 DOI: 10.1111/eva.70020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 09/09/2024] [Accepted: 09/13/2024] [Indexed: 10/12/2024] Open
Abstract
Estimating and understanding the ratio between effective population size (Ne) and census population size (Nc) are pivotal in the conservation of large marine pelagic fish species, including bony fish such as tunas and cartilaginous fish such as sharks, given the challenges associated with obtaining accurate estimates of their abundance. The difficulties inherent in capturing and monitoring these species in vast and dynamic marine environments often make direct estimation of their population size challenging. By focusing on Ne, it is conceivable in certain cases to approximate census size once the Ne/Nc ratio is known, although this ratio can vary and does not always increase linearly, as it is influenced by various ecological and evolutionary factors. Thus, this ratio presents challenges and complexities in the context of pelagic species conservation. To delve deeper into these challenges, we first recall the diverse types of effective population sizes, including contemporary and historical sizes, and their implications in conservation biology. Second, we outline current knowledge about the influence of life history traits on the Ne/Nc ratio in the light of examples drawn from large and abundant pelagic fish species. Despite efforts to document an increasing number of marine species using recent technologies and statistical methods, establishing general rules to predict Ne/Nc remains elusive, necessitating further research and investment. Finally, we recall statistical challenges in relating Ne and Nc, emphasizing the necessity of aligning temporal and spatial scales. This last part discusses the roles of generation and reproductive cycle effective population sizes in predicting genetic erosion and guiding management strategies. Collectively, these sections underscore the multifaceted nature of effective population size estimation, crucial for preserving genetic diversity and ensuring the long-term viability of populations. By navigating statistical and theoretical complexities, and addressing methodological challenges, scientists should be able to advance our understanding of the Ne/Nc ratio.
Affiliation(s)
- Chrystelle Delord
- UMR248 MARBEC, Univ. Montpellier, Ifremer, IRD, CNRS, La Réunion, France
- UMR248 MARBEC, Univ. Montpellier, Ifremer, IRD, CNRS, Sète, France
- Agostino Leone
- UMR248 MARBEC, Univ. Montpellier, Ifremer, IRD, CNRS, Sète, France
- Department of Earth and Marine Sciences (DiSTeM), University of Palermo, Palermo, Italy
- National Biodiversity Future Center, Palermo, Italy
- Jonathan Rolland
- Centre de Recherche Sur la Biodiversité et l'Environnement (CRBE), Université de Toulouse, CNRS, IRD, Toulouse INP, Université Toulouse 3 – Paul Sabatier (UT3), Toulouse, France
- Natacha Nikolic
- Centre de Recherche Sur la Biodiversité et l'Environnement (CRBE), Université de Toulouse, CNRS, IRD, Toulouse INP, Université Toulouse 3 – Paul Sabatier (UT3), Toulouse, France
- Université de Pau et des Pays de l’Adour, INRAE, AQUA, ECOBIOP, Saint‐Pée‐sur‐Nivelle, France
- ARBRE – Agence de Recherche Pour la Biodiversité à La Réunion, Saint‐Gilles, France
2
Sellinger T, Johannes F, Tellier A. Improved inference of population histories by integrating genomic and epigenomic data. eLife 2024; 12:RP89470. [PMID: 39264367 PMCID: PMC11392530 DOI: 10.7554/elife.89470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2024] Open
Abstract
With the availability of high-quality full genome polymorphism (SNPs) data, it becomes feasible to study the past demographic and selective history of populations in exquisite detail. However, such inferences still suffer from a lack of statistical resolution for recent events (for example, bottlenecks) and/or for populations with small nucleotide diversity. Additional heritable (epi)genetic markers, such as indels, transposable elements, microsatellites, or cytosine methylation, may provide further, yet untapped, information on the recent past population history. We extend the Sequential Markovian Coalescent (SMC) framework to jointly use SNPs and other hyper-mutable markers. We are able to (1) improve the accuracy of demographic inference in recent times, (2) uncover past demographic events hidden to SNP-based inference methods, and (3) infer the hyper-mutable marker mutation rates under a finite site model. As a proof of principle, we focus on demographic inference in Arabidopsis thaliana using DNA methylation diversity data from 10 European natural accessions. We demonstrate that segregating single methylated polymorphisms (SMPs) satisfy the modeling assumptions of the SMC framework, while differentially methylated regions (DMRs) are not suitable as their length exceeds that of the genomic distance between two recombination events. Combining SNPs and SMPs while accounting for site- and region-level epimutation processes, we provide new estimates of the glacial age bottleneck and post-glacial population expansion of the European A. thaliana population. Our SMC framework readily accounts for a wide range of heritable genomic markers, thus paving the way for next-generation inference of evolutionary history by combining information from several genetic and epigenetic markers.
Affiliation(s)
- Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
- Department of Environment and Biodiversity, Paris Lodron University of Salzburg, Salzburg, Austria
- Frank Johannes
- Professorship for Plant Epigenomics, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Aurélien Tellier
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
3
Patil AB, Vijay N. Repetitive genomic regions and the inference of demographic history. Heredity (Edinb) 2021; 127:151-166. [PMID: 34002046 PMCID: PMC8322061 DOI: 10.1038/s41437-021-00443-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/20/2021] [Revised: 04/16/2021] [Accepted: 04/17/2021] [Indexed: 02/03/2023] Open
Abstract
Inference of demographic histories using whole-genome datasets has provided insights into diversification, adaptation, hybridization, and plant-pathogen interactions, and stimulated debate on the impact of anthropogenic interventions and past climate on species demography. However, the impact of repetitive genomic regions on these inferences has mostly been ignored by masking of repeats. We use the Populus trichocarpa genome (Pop_tri_v3) to show that masking of repeat regions leads to lower estimates of effective population size (Ne) in the distant past in contrast to an increase in Ne estimates in recent times. However, in human datasets, masking of repeats resulted in lower estimates of Ne at all time points. We demonstrate that repeats affect demographic inferences using diverse methods like PSMC, MSMC, SMC++, and the Stairway plot. Our genomic analysis revealed that the biases in Ne estimates were dependent on the repeat class type and its abundance in each atomic interval. Notably, we observed a weak, yet consistently significant negative correlation between the repeat abundance of an atomic interval and the Ne estimates for that interval, which potentially reflects the recombination rate variation within the genome. The rationale for the masking of repeats has been that variants identified within these regions are erroneous. We find that polymorphisms in some repeat classes occur in callable regions and reflect reliable coalescence histories (e.g., LTR Gypsy, LTR Copia). The current demography inference methods do not handle repeats explicitly, and hence the effect of individual repeat classes needs careful consideration in comparative analysis. Deciphering the repeat demographic histories might provide a clear understanding of the processes involved in repeat accumulation.
Affiliation(s)
- Ajinkya Bharatraj Patil
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
4
Cappello L, Palacios JA. Sequential Importance Sampling for Multiresolution Kingman-Tajima Coalescent Counting. Ann Appl Stat 2021; 14:727-751. [PMID: 33995755 DOI: 10.1214/19-aoas1313] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
Abstract
Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state-space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given dataset. To our knowledge, there is neither an exact nor an approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked sequences of DNA at a non-recombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 Genomes Project and a sample from a Melanesian population at the β-globin locus.
5
Sjödin P, McKenna J, Jakobsson M. Estimating divergence times from DNA sequences. Genetics 2021; 217:iyab008. [PMID: 33769498 PMCID: PMC8049563 DOI: 10.1093/genetics/iyab008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/15/2020] [Accepted: 12/11/2020] [Indexed: 11/23/2022] Open
Abstract
The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed in order to infer evolutionary history. Here, we present the "Two-Two (TT)" and the "Two-Two-outgroup (TTo)" methods, two closely related approaches for estimating divergence time based on coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split times, directly from the sequence data. This transparent and computationally efficient approach to inferring population divergence time makes it possible to estimate time scaled in generations (assuming a mutation rate), and not as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo method can alleviate biases that can appear from drastic ancestral population size changes. We illustrate the utility of the approaches with some examples, including estimating split times for pairs of human populations as well as providing further evidence for the complex relationship among Neandertals and Denisovans and their ancestors.
Affiliation(s)
- Per Sjödin
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18 A, Uppsala 752 36, Sweden
- James McKenna
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18 A, Uppsala 752 36, Sweden
- Mattias Jakobsson
- Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18 A, Uppsala 752 36, Sweden
- Science for Life Laboratory, Uppsala University, Norbyvägen 18 A, Uppsala 752 36, Sweden
6
García NC, Robinson WD. Current and Forthcoming Approaches for Benchmarking Genetic and Genomic Diversity. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.622603] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
The current attrition of biodiversity extends beyond loss of species and unique populations to steady loss of a vast genomic diversity that remains largely undescribed. Yet the accelerating development of new techniques allows us to survey entire genomes ever faster and cheaper, to obtain robust samples from a diversity of sources including degraded DNA and residual DNA in the environment, and to address conservation efforts in new and innovative ways. Here we review recent studies that highlight the importance of carefully considering where to prioritize collection of genetic samples (e.g., organisms in rapidly changing landscapes or along edges of geographic ranges) and what samples to collect and archive (e.g., from individuals of little-known subspecies or populations, even of species not currently considered endangered). Those decisions will provide the sample infrastructure to detect the disappearance of certain genotypes or gene complexes, increases in inbreeding levels, and loss of genomic diversity as environmental conditions change. Obtaining samples from currently endangered, protected, and rare species can be particularly difficult, thus we also focus on studies that use new, non-invasive ways of obtaining genomic samples and analyzing them in these cases where other sampling options are highly constrained. Finally, biological collections archiving such samples face an inherent contradiction: their main goal is to preserve biological material in good shape so it can be used for scientific research for centuries to come, yet the technologies that can make use of such materials are advancing faster than collections can change their standardized practices. Thus, we also discuss current and potential new practices in biological collections that might bolster their usefulness for future biodiversity conservation research.
7
Klingler KB, Jahner JP, Parchman TL, Ray C, Peacock MM. Genomic variation in the American pika: signatures of geographic isolation and implications for conservation. BMC Ecol Evol 2021; 21:2. [PMID: 33514306 PMCID: PMC7853312 DOI: 10.1186/s12862-020-01739-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/11/2020] [Accepted: 12/17/2020] [Indexed: 01/12/2023] Open
Abstract
Background: Distributional responses by alpine taxa to repeated, glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada.
Results: Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide.
Conclusions: Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.
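The three summary statistics quoted in the Results (π, θW and Tajima's D) can be illustrated with a minimal Python sketch. It assumes a toy 0/1 haplotype matrix (rows are sampled sequences, columns are sites) and uses the textbook estimators; the function name `diversity_stats` and the toy data are illustrative assumptions, not the authors' ddRADseq pipeline. The values returned are totals over the locus, so they would need dividing by the number of sequenced sites to be comparable with the per-site figures in the abstract.

```python
from math import sqrt


def diversity_stats(haplotypes):
    """Return (pi, theta_w, tajimas_d) for a 0/1 haplotype matrix.

    Rows are sampled sequences, columns are sites; 1 marks the derived allele.
    Values are totals over the locus, not per-site rates.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = [sum(row[s] for row in haplotypes) for s in range(n_sites)]
    segregating = [c for c in counts if 0 < c < n]
    S = len(segregating)

    # Nucleotide diversity: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(c * (n - c) for c in segregating) / n_pairs

    # Watterson's estimator: segregating sites scaled by the harmonic number.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1

    # Tajima's (1989) variance constants.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * S + e2 * S * (S - 1)

    d = (pi - theta_w) / sqrt(variance) if variance > 0 else float("nan")
    return pi, theta_w, d


# Toy data (hypothetical): intermediate-frequency variants versus singletons.
INTERMEDIATE = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
SINGLETONS = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

In this sketch an excess of intermediate-frequency variants pushes π above θW and gives a positive D, the pattern the abstract interprets as consistent with recent contraction, whereas an excess of singletons gives a negative D.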
Affiliation(s)
- Joshua P Jahner
- Department of Biology, University of Nevada, Reno, 89557, USA
- Thomas L Parchman
- Department of Biology, University of Nevada, Reno, 89557, USA
- Program in Ecology, Evolution, and Conservation Biology, University of Nevada, Reno, NV, 89557, USA
- Chris Ray
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, 80309-0334, USA
- Mary M Peacock
- Department of Biology, University of Nevada, Reno, 89557, USA
- Program in Ecology, Evolution, and Conservation Biology, University of Nevada, Reno, NV, 89557, USA
8
Parag KV, du Plessis L, Pybus OG. Jointly Inferring the Dynamics of Population Size and Sampling Intensity from Molecular Sequences. Mol Biol Evol 2020; 37:2414-2429. [PMID: 32003829 PMCID: PMC7403618 DOI: 10.1093/molbev/msaa016] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 01/24/2023] Open
Abstract
Estimating past population dynamics from molecular sequences that have been sampled longitudinally through time is an important problem in infectious disease epidemiology, molecular ecology, and macroevolution. Popular solutions, such as the skyline and skygrid methods, infer past effective population sizes from the coalescent event times of phylogenies reconstructed from sampled sequences but assume that sequence sampling times are uninformative about population size changes. Recent work has started to question this assumption by exploring how sampling time information can aid coalescent inference. Here, we develop, investigate, and implement a new skyline method, termed the epoch sampling skyline plot (ESP), to jointly estimate the dynamics of population size and sampling rate through time. The ESP is inspired by real-world data collection practices and comprises a flexible model in which the sequence sampling rate is proportional to the population size within an epoch but can change discontinuously between epochs. We show that the ESP is accurate under several realistic sampling protocols and we prove analytically that it can at least double the best precision achievable by standard approaches. We generalize the ESP to incorporate phylogenetic uncertainty in a new Bayesian package (BESP) in BEAST2. We re-examine two well-studied empirical data sets from virus epidemiology and molecular evolution and find that the BESP improves upon previous coalescent estimators and generates new, biologically useful insights into the sampling protocols underpinning these data sets. Sequence sampling times provide a rich source of information for coalescent inference that will become increasingly important as sequence collection intensifies and becomes more formalized.
Affiliation(s)
- Kris V Parag
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Department of Infectious Disease Epidemiology, MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
- Louis du Plessis
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford, United Kingdom
9
Deep-Time Demographic Inference Suggests Ecological Release as Driver of Neoavian Adaptive Radiation. DIVERSITY-BASEL 2020. [DOI: 10.3390/d12040164] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Assessing the applicability of theory to major adaptive radiations in deep time represents an extremely difficult problem in evolutionary biology. Neoaves, which includes 95% of living birds, is believed to have undergone a period of rapid diversification roughly coincident with the Cretaceous–Paleogene (K-Pg) boundary. We investigate whether basal neoavian lineages experienced an ecological release in response to ecological opportunity, as evidenced by density compensation. We estimated effective population sizes (Ne) of basal neoavian lineages by combining coalescent branch lengths (CBLs) and the numbers of generations between successive divergences. We used a modified version of Accurate Species TRee Algorithm (ASTRAL) to estimate CBLs directly from insertion–deletion (indel) data, as well as from gene trees using DNA sequence and/or indel data. We found that some divergences near the K-Pg boundary involved unexpectedly high gene tree discordance relative to the estimated number of generations between speciation events. The simplest explanation for this result is an increase in Ne, despite the caveats discussed herein. It appears that at least some early neoavian lineages, similar to the ancestor of the clade comprising doves, mesites, and sandgrouse, experienced ecological release near the time of the K-Pg mass extinction.
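How a coalescent branch length and a number of generations combine into an Ne estimate can be sketched in a few lines of Python. The sketch assumes the common convention that one coalescent unit equals 2Ne generations for a diploid population (Ne generations for a haploid one); the function name `ne_from_cbl` and the numbers used are hypothetical, and the abstract does not specify the authors' exact procedure.

```python
def ne_from_cbl(generations: float, cbl: float, ploidy: int = 2) -> float:
    """Estimate Ne from a branch spanning `generations` generations.

    `cbl` is the branch length in coalescent units. Assumes
    cbl = generations / (ploidy * Ne), i.e. 2*Ne generations per
    coalescent unit in a diploid population.
    """
    if generations <= 0 or cbl <= 0:
        raise ValueError("generations and cbl must both be positive")
    return generations / (ploidy * cbl)


# Hypothetical branch of one million generations:
# a shorter coalescent branch (more gene tree discordance) implies a larger Ne.
short_branch_ne = ne_from_cbl(1_000_000, 0.1)
long_branch_ne = ne_from_cbl(1_000_000, 0.5)
```

Under this convention, high gene tree discordance over a fixed number of generations translates into a short coalescent branch and therefore a large Ne, which is the reading the abstract gives of the divergences near the K-Pg boundary.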
10
Inference of Historical Population-Size Changes with Allele-Frequency Data. G3-GENES GENOMES GENETICS 2020; 10:211-223. [PMID: 31699776 PMCID: PMC6945023 DOI: 10.1534/g3.119.400854] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/27/2023]
Abstract
With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
11
Parag KV, Pybus OG. Robust Design for Coalescent Model Inference. Syst Biol 2019; 68:730-743. [PMID: 30726979 DOI: 10.1093/sysbio/syz008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/22/2018] [Revised: 01/28/2019] [Accepted: 02/04/2019] [Indexed: 11/08/2023] Open
Abstract
The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. "Robust" means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
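One way to see why working in log-parameters is robust is the following sketch, which is an illustration under stated assumptions rather than the paper's proof. Assume a piecewise-constant Kingman coalescent in which segment j has population size N_j and contains m_j coalescent events, with k_i lineages present during waiting time t_i; the symbols m_j, k_i and t_i are notation introduced here.

```latex
\ell(N_j) = -m_j \log N_j - \frac{1}{N_j} \sum_{i} \binom{k_i}{2} t_i + \text{const},
\qquad
\mathcal{I}(N_j) = \frac{m_j}{N_j^{2}},
\qquad
\mathcal{I}(\log N_j) = N_j^{2}\, \mathcal{I}(N_j) = m_j .
```

The Fisher information about N_j depends on its unknown true value, whereas the information about log N_j depends only on the event count m_j. The asymptotic variance of each log-parameter is then roughly 1/m_j, and for a fixed total number of events the sum of these variances is smallest when the m_j are equal, which matches the abstract's statement about distributing coalescent events uniformly among the log-parameters.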
Affiliation(s)
- Kris V Parag
- Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
- Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
12
Khatri BS, Goldstein RA. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift. PLoS Comput Biol 2019; 15:e1007177. [PMID: 31335870 PMCID: PMC6677325 DOI: 10.1371/journal.pcbi.1007177] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/30/2018] [Revised: 08/02/2019] [Accepted: 06/13/2019] [Indexed: 02/06/2023] Open
Abstract
Developmental system drift is a likely mechanism for the origin of hybrid incompatibilities between closely related species. We examine here the detailed mechanistic basis of hybrid incompatibilities between two allopatric lineages, for a genotype-phenotype map of developmental system drift under stabilising selection, where an organismal phenotype is conserved, but the underlying molecular phenotypes and genotype can drift. This leads to a number of emergent phenomena not obtainable by modelling genotype or phenotype alone. Our results show that: 1) speciation is more rapid at smaller population sizes, with a characteristic, Orr-like, power law, but slow at large population sizes, characterised by a sub-diffusive growth law; 2) the molecular phenotypes under weakest selection contribute to the earliest incompatibilities; and 3) pair-wise incompatibilities dominate over higher-order ones, contrary to previous predictions that the latter should dominate. The population size effect we find is consistent with previous results on allopatric divergence of transcription factor-DNA binding, where smaller populations have common ancestors with a larger drift load because genetic drift favours phenotypes which have a larger number of genotypes (higher sequence entropy) over more fit phenotypes which have far fewer genotypes; this means fewer substitutions are required in either lineage before incompatibilities arise. Overall, our results indicate that biophysics and population size provide a much stronger constraint to speciation than suggested by previous models, and point to a general mechanistic principle of how incompatibilities arise under stabilising selection for an organismal phenotype.
The process of speciation is of fundamental importance to the field of evolution as it is intimately connected to understanding the immense biodiversity of life. There is still relatively little understanding of the underlying genetic mechanisms that give rise to hybrid incompatibilities, with results suggesting that divergence in transcription factor DNA binding and gene expression plays an important role. A key finding from the field of evo-devo is that organismal phenotypes show developmental system drift, where species maintain the same phenotype, but diverge in developmental pathways; this is an important potential source of hybrid incompatibilities. Here, we explore a theoretical framework to understand how incompatibilities arise due to developmental system drift, using a tractable biophysically inspired genotype-phenotype map for spatial gene expression. Modelling the evolution of phenotypes in this way has the key advantage that it mirrors how selection works in nature, i.e. that selection acts on phenotypes, but variation (mutation) arises at the level of genotypes. This results, as we demonstrate, in a number of non-trivial and testable predictions concerning speciation due to developmental system drift, which would not be obtainable by modelling evolution of genotypes or phenotypes alone.
Affiliation(s)
- Richard A. Goldstein
- Division of Infection & Immunity, University College London, London, United Kingdom
13
Johndrow JE, Palacios JA. Exact limits of inference in coalescent models. Theor Popul Biol 2018; 125:75-93. [PMID: 30571959 PMCID: PMC6541399 DOI: 10.1016/j.tpb.2018.11.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/11/2018] [Revised: 11/12/2018] [Accepted: 11/27/2018] [Indexed: 12/13/2022]
Abstract
Recovery of population size history from molecular sequence data is an important problem in population genetics. Inference commonly relies on a coalescent model linking the population size history to genealogies. The high computational cost of estimating parameters from these models usually compels researchers to select a subset of the available data or to rely on insufficient summary statistics for statistical inference. We consider the problem of recovering the true population size history from two possible alternatives on the basis of coalescent time data previously considered by Kim et al. (2015). We improve upon previous results by giving exact expressions for the probability of correctly distinguishing between the two hypotheses as a function of the separation between the alternative size histories, the number of individuals, loci, and the sampling times. In more complicated settings we estimate the exact probability of correct recovery by Monte Carlo simulation. Our results give considerably more pessimistic inferential limits than those previously reported. We also extended our analyses to pairwise SMC and SMC’ models of recombination. This work is relevant for optimal design when the inference goal is to test scientific hypotheses about population size trajectories in coalescent models with and without recombination.
14
Waltoft BL, Hobolth A. Non-parametric estimation of population size changes from the site frequency spectrum. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0061. [PMID: 29886455 DOI: 10.1515/sagmb-2017-0061] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/15/2022]
Abstract
The change in population size over time is a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n - 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n - i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the observed SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied to unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.
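The SFS definition above can be made concrete with a short sketch. It only illustrates how an unfolded and a folded SFS are tabulated from a small 0/1 haplotype matrix (0 = ancestral, 1 = mutant); it is not part of CubSFS, and the function names `unfolded_sfs` and `folded_sfs` are hypothetical.

```python
def unfolded_sfs(haplotypes):
    """Return the unfolded SFS of n haplotypes coded 0 (ancestral) / 1 (mutant).

    Entry i - 1 of the result counts sites where the mutant base appears
    i times, for i = 1, ..., n - 1; monomorphic sites are ignored.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        mutant_count = sum(h[site] for h in haplotypes)
        if 0 < mutant_count < n:
            sfs[mutant_count - 1] += 1
    return sfs


def folded_sfs(sfs):
    """Fold an unfolded SFS: pool classes i and n - i (minor-allele counts)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


if __name__ == "__main__":
    sample = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
    spectrum = unfolded_sfs(sample)
    print(spectrum, folded_sfs(spectrum))
```

The folded form is the one to use when the ancestral base is unknown, since it depends only on minor-allele counts.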
Affiliation(s)
- Berit Lindum Waltoft
  - Bioinformatics Research Centre, Aarhus University, C.F. Møllers allé 8, 8000 Aarhus C, Denmark, Phone: +45 87165763
  - National Centre for Register-based Research, Department of Economics and Business, Aarhus University, Fuglesangs allé 26, 8210 Aarhus V, Denmark
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Asger Hobolth
  - Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
15
Abstract
Helicoverpa armigera is a major agricultural and horticultural pest that recently spread from its historical distribution throughout much of the Old World to the Americas, where it is already causing hundreds of millions of dollars in damage every year. The species is notoriously quick to generate and disseminate pesticide resistance throughout its range and has a wider host range than the native Helicoverpa zea. Hybridization between the two species increases the opportunity for novel, agriculturally problematic ecotypes to emerge and spread through the Americas. Within the mega-pest lineage of heliothine moths are a number of polyphagous, highly mobile species for which the exchange of adaptive traits through hybridization would affect their properties as pests. The recent invasion of South America by one of the most significant agricultural pests, Helicoverpa armigera, raises concerns for the formation of novel combinations of adaptive genes following hybridization with the closely related Helicoverpa zea. To investigate the propensity for hybridization within the genus Helicoverpa, we carried out whole-genome resequencing of samples from six species, focusing in particular upon H. armigera population structure and its relationship with H. zea. We show that both H. armigera subspecies have greater genetic diversity and effective population sizes than do the other species. We find no signals for gene flow among the six species, other than between H. armigera and H. zea, with nine Brazilian individuals proving to be hybrids of those two species. Eight had largely H. armigera genomes with some introgressed DNA from H. zea scattered throughout. The ninth resembled an F1 hybrid but with stretches of homozygosity for each parental species that reflect previous hybridization. Regions homozygous for H. armigera-derived DNA in this individual included one containing a gustatory receptor and esterase genes previously associated with host range, while another encoded a cytochrome P450 that confers insecticide resistance. Our data point toward the emergence of novel hybrid ecotypes and highlight the importance of monitoring H. armigera genotypes as they spread through the Americas.
16
Range Expansion Compromises Adaptive Evolution in an Outcrossing Plant. Curr Biol 2017; 27:2544-2551.e4. [DOI: 10.1016/j.cub.2017.07.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 01/26/2017] [Revised: 05/22/2017] [Accepted: 07/04/2017] [Indexed: 01/04/2023]
17
Weissman DB, Hallatschek O. Minimal-assumption inference from population-genomic data. eLife 2017; 6:e24836. [PMID: 28671549 PMCID: PMC5515583 DOI: 10.7554/elife.24836] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 01/02/2017] [Accepted: 07/01/2017] [Indexed: 01/01/2023] Open
Abstract
Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of coalescence or recombination, allowing it to analyze arbitrarily large samples without phasing while making no assumptions about ancestral structure, linked selection, or gene conversion. Using simulated data, we show that the performance of MAGIC is comparable to that of PSMC' even on single diploid samples generated with standard coalescent and recombination models. Applying MAGIC to a sample of human genomes reveals evidence of non-demographic factors driving coalescence.
Affiliation(s)
- Daniel B Weissman
  - Department of Physics, Emory University, Atlanta, United States
  - Department of Physics and Integrative Biology, University of California, Berkeley, Berkeley, United States
- Oskar Hallatschek
  - Department of Physics and Integrative Biology, University of California, Berkeley, Berkeley, United States