1
Rouzine IM. Long-range linkage effects in adapting sexual populations. Sci Rep 2023; 13:12492. [PMID: 37528175 PMCID: PMC10393966 DOI: 10.1038/s41598-023-39392-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 07/25/2023] [Indexed: 08/03/2023]
Abstract
In sexual populations, closely situated genes have linked evolutionary fates, while genes spaced far apart in the genome are commonly thought to evolve independently due to recombination. In the case where evolution depends essentially on the supply of new mutations, this assumption has been confirmed by mathematical modeling. Here I examine it in the case of pre-existing genetic variation, where mutation is not important. A haploid population with [Formula: see text] genomes, [Formula: see text] loci, a fixed selection coefficient, and a small initial frequency of beneficial alleles [Formula: see text] is simulated by a Monte-Carlo algorithm. When the number of loci, L, is larger than a critical value of [Formula: see text], simulation demonstrates a host of linkage effects that decrease neither with the distance between loci nor with the number of recombination crossovers. Due to clonal interference, the beneficial alleles become extinct at a fraction of loci [Formula: see text]. Due to a genetic background effect, the substitution rate varies broadly between loci, with the fastest value exceeding the one-locus limit by a factor of [Formula: see text]. Thus, the far-situated parts of a long genome in a sexual population do not evolve as independent blocks. A potential link between these findings and the emergence of new Variants of Concern of SARS-CoV-2 is discussed.
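The kind of simulation this abstract describes can be illustrated with a minimal Wright-Fisher sketch. It is not the author's algorithm. The function name `simulate_adaptation`, all parameter values, the multiplicative fitness form `exp(s * k)`, and the single-crossover recombination scheme are assumptions chosen for illustration, since the abstract's actual values are elided. There is no mutation, so evolution proceeds only from pre-existing variation.

```python
import numpy as np


def simulate_adaptation(n_genomes=200, n_loci=50, s=0.1, f0=0.05,
                        rec_prob=1.0, generations=100, seed=0):
    """Return allele frequencies per locus over time, shape (generations + 1, n_loci).

    Hypothetical model: haploid genomes, fixed selection coefficient s per
    beneficial allele, no mutation, at most one crossover per offspring.
    """
    rng = np.random.default_rng(seed)
    # True marks a beneficial allele; each site starts at frequency ~f0.
    pop = rng.random((n_genomes, n_loci)) < f0
    history = [pop.mean(axis=0)]
    loci = np.arange(n_loci)
    for _ in range(generations):
        k = pop.sum(axis=1)                    # beneficial alleles per genome
        w = np.exp(s * (k - k.max()))          # relative fitness, rescaled for stability
        p = w / w.sum()
        # Both parents are drawn in proportion to fitness.
        first = rng.choice(n_genomes, size=n_genomes, p=p)
        second = rng.choice(n_genomes, size=n_genomes, p=p)
        # Crossover point in 1..n_loci-1; a point equal to n_loci means
        # the offspring is a copy of the first parent.
        recombines = rng.random(n_genomes) < rec_prob
        points = np.where(recombines, rng.integers(1, n_loci, size=n_genomes), n_loci)
        from_first = loci[None, :] < points[:, None]
        pop = np.where(from_first, pop[first], pop[second])
        history.append(pop.mean(axis=0))
    return np.array(history)


if __name__ == "__main__":
    freqs = simulate_adaptation()
    lost = (freqs[-1] == 0).mean()    # fraction of loci where the beneficial allele died out
    fixed = (freqs[-1] == 1).mean()   # fraction of loci where it fixed
    print(f"lost: {lost:.2f}, fixed: {fixed:.2f}")
```

Because no new mutations arise, a beneficial allele lost at a locus never returns, so the fraction of lost loci at the end of a run is a direct readout of the extinction effect the abstract attributes to clonal interference.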
Affiliation(s)
- Igor M Rouzine
  - Sechenov Institute of Evolutionary Physiology and Biochemistry, Russian Academy of Sciences, Saint-Petersburg, Russia, 194223.
2
Rouzine IM, Rozhnova G. Evolutionary implications of SARS-CoV-2 vaccination for the future design of vaccination strategies. Communications Medicine 2023; 3:86. [PMID: 37336956 DOI: 10.1038/s43856-023-00320-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/04/2022] [Accepted: 06/07/2023] [Indexed: 06/21/2023]
Abstract
Once the first SARS-CoV-2 vaccine became available, mass vaccination was the main pillar of the public health response to the COVID-19 pandemic. It was very effective in reducing hospitalizations and deaths. Here, we discuss the possibility that mass vaccination might accelerate SARS-CoV-2 evolution in antibody-binding regions compared to natural infection at the population level. Using the evidence of strong genetic variation in antibody-binding regions and taking advantage of the similarity between the envelope proteins of SARS-CoV-2 and influenza, we assume that immune selection pressure acting on these regions of the two viruses is similar. We discuss the consequences of this assumption for SARS-CoV-2 evolution in light of mathematical models developed previously for influenza. We further outline the implications of this phenomenon, if our assumptions are confirmed, for the future design of SARS-CoV-2 vaccination strategies.
Affiliation(s)
- Igor M Rouzine
  - Immunogenetics, Sechenov Institute of Evolutionary Physiology and Biochemistry of Russian Academy of Sciences, Saint-Petersburg, Russia.
- Ganna Rozhnova
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
  - Center for Complex Systems Studies (CCSS), Utrecht University, Utrecht, The Netherlands.
3
Sohail MS, Louie RHY, Hong Z, Barton JP, McKay MR. Inferring Epistasis from Genetic Time-series Data. Mol Biol Evol 2022; 39:6710201. [PMID: 36130322 PMCID: PMC9558069 DOI: 10.1093/molbev/msac199] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/01/2023]
Abstract
Epistasis refers to fitness or functional effects of mutations that depend on the sequence background in which these mutations arise. Epistasis is prevalent in nature, including populations of viruses, bacteria, and cancers, and can contribute to the evolution of drug resistance and immune escape. However, it is difficult to directly estimate epistatic effects from sampled observations of a population. At present, there are very few methods that can disentangle the effects of selection (including epistasis), mutation, recombination, genetic drift, and genetic linkage in evolving populations. Here we develop a method to infer epistasis, along with the fitness effects of individual mutations, from observed evolutionary histories. Simulations show that we can accurately infer pairwise epistatic interactions provided that there is sufficient genetic diversity in the data. Our method also allows us to identify which fitness parameters can be reliably inferred from a particular data set and which ones are unidentifiable. Our approach therefore allows for the inference of more complex models of selection from time-series genetic data, while also quantifying uncertainty in the inferred parameters.
Affiliation(s)
- Muhammad Saqib Sohail
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, People’s Republic of China
- Raymond H Y Louie
  - The Kirby Institute, University of New South Wales, Sydney, New South Wales, Australia
- Zhenchen Hong
  - Department of Physics and Astronomy, University of California, Riverside, CA, USA
4
Stolyarova AV, Neretina TV, Zvyagina EA, Fedotova AV, Kondrashov A, Bazykin GA. Complex fitness landscape shapes variation in a hyperpolymorphic species. eLife 2022; 11:76073. [PMID: 35532122 PMCID: PMC9187340 DOI: 10.7554/elife.76073] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/03/2021] [Accepted: 05/09/2022] [Indexed: 11/13/2022]
Abstract
It is natural to assume that patterns of genetic variation in hyperpolymorphic species can reveal large-scale properties of the fitness landscape that are hard to detect by studying species with ordinary levels of genetic variation. Here, we study such patterns in the fungus Schizophyllum commune, the most polymorphic species known. Throughout the genome, short-range linkage disequilibrium (LD) caused by attraction of minor alleles is higher between pairs of nonsynonymous than of synonymous variants. This effect is especially pronounced for pairs of sites that are located within the same gene, especially if a large fraction of the gene is covered by haploblocks, genome segments where the gene pool consists of two highly divergent haplotypes, which is a signature of balancing selection. Haploblocks are usually shorter than 1000 nucleotides, and collectively cover about 10% of the S. commune genome. LD tends to be substantially higher for pairs of nonsynonymous variants encoding amino acids that interact within the protein. There is a substantial correlation between LDs at the same pairs of nonsynonymous mutations in the USA and the Russian populations. These patterns indicate that selection in S. commune involves positive epistasis due to compensatory interactions between nonsynonymous alleles. When less polymorphic species are studied, analogous patterns can be detected only through interspecific comparisons.
Changes to DNA known as mutations may alter how the proteins and other components of a cell work, and thus play an important role in allowing living things to evolve new traits and abilities over many generations. Whether a mutation is beneficial or harmful may differ depending on the genetic background of the individual – that is, depending on other mutations present in other positions within the same gene – due to a phenomenon called epistasis.
Epistasis is known to affect how various species accumulate differences in their DNA compared to each other over time. For example, a mutation that is rare in humans and known to cause disease may be widespread in other primates because its negative effect is canceled out by another mutation that is standard for these species but absent in humans. However, it remains unclear whether epistasis plays a significant part in shaping genetic differences between individuals of the same species. A type of fungus known as Schizophyllum commune lives on rotting wood and is found across the world. It is one of the most genetically diverse species currently known, so there is a higher chance of pairs of compensatory mutations occurring and persisting for a long time in S. commune than in most other species, providing a unique opportunity to study epistasis. Here, Stolyarova et al. studied two distinct populations of S. commune, one from the USA and one from Russia. The team found that – unlike in humans, flies and other less genetically diverse species – epistasis maintains combinations of mutations in S. commune that individually would be harmful to the fungus but together compensate for each other. For example, pairs of mutations affecting specific molecules known as amino acids – the building blocks of proteins – that physically interact with each other tended to be found together in the same individuals. One potential downside of having pairs of compensatory mutations in the genome is that when the organism reproduces, the process of making sex cells may split up these pairs so that harmful mutations are inherited without their partner mutations. Thus, epistasis may have helped shape the way S. commune and other genetically diverse species have evolved.
Affiliation(s)
- Tatiana V Neretina
  - Biological Faculty, Lomonosov Moscow State University, Moscow, Russian Federation
- Elena A Zvyagina
  - Biological Faculty, Lomonosov Moscow State University, Moscow, Russian Federation
- Anna V Fedotova
  - Skolkovo Institute of Science and Technology, Moscow, Russian Federation
- Alexey Kondrashov
  - Department of Ecology and Evolutionary Biology, University of Michigan-Ann Arbor, Ann Arbor, United States
- Georgii A Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russian Federation
5
Bohutínská M, Handrick V, Yant L, Schmickl R, Kolář F, Bomblies K, Paajanen P. De Novo Mutation and Rapid Protein (Co-)evolution during Meiotic Adaptation in Arabidopsis arenosa. Mol Biol Evol 2021; 38:1980-1994. [PMID: 33502506 PMCID: PMC8097281 DOI: 10.1093/molbev/msab001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/07/2023]
Abstract
A sudden shift in environment or cellular context necessitates rapid adaptation. A dramatic example is genome duplication, which leads to polyploidy. In such situations, the waiting time for new mutations might be prohibitive; theoretical and empirical studies suggest that rapid adaptation will largely rely on standing variation already present in source populations. Here, we investigate the evolution of meiosis proteins in Arabidopsis arenosa, some of which were previously implicated in adaptation to polyploidy and, in a diploid, to habitat. A striking and unexplained feature of prior results was the large number of amino acid changes in multiple interacting proteins, especially in the relatively young tetraploid. Here, we investigate whether selection on meiosis genes is found in other lineages, how the polyploid may have accumulated so many differences, and whether derived variants were selected from standing variation. We use a range-wide sample of 145 resequenced genomes of diploid and tetraploid A. arenosa, with new genome assemblies. We confirm signals of positive selection in the polyploid and diploid lineages in which they were previously reported and find additional meiosis genes with evidence of selection. We show that the polyploid lineage stands out both qualitatively and quantitatively. Compared with diploids, meiosis proteins in the polyploid have more amino acid changes and a higher proportion affecting more strongly conserved sites. We find evidence that in tetraploids, positive selection may have commonly acted on de novo mutations. Several tests provide hints that coevolution, and in some cases, multinucleotide mutations, might contribute to rapid accumulation of changes in meiotic proteins.
Affiliation(s)
- Magdalena Bohutínská
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
- Vinzenz Handrick
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
- Levi Yant
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
- Roswitha Schmickl
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
- Filip Kolář
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
  - Department of Botany, University of Innsbruck, Innsbruck, Austria
- Kirsten Bomblies
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
  - Plant Evolutionary Genetics, Department of Biology, Institute of Molecular Plant Biology, ETH Zürich, Zurich, Switzerland
- Pirita Paajanen
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
6
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022]
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
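The sign convention behind "derived alleles tend to occur on different haplotypes" can be made concrete with the standard two-locus LD coefficients. The sketch below computes D and r² from phased haplotypes coded 0 (ancestral) and 1 (derived). The function name `pairwise_ld` and the toy data are illustrative assumptions. It does not implement the paper's HR(j) statistic, which the abstract does not define.

```python
import numpy as np


def pairwise_ld(haplotypes, i, j):
    """Return (D, r2) for sites i and j of a 0/1 haplotype matrix (rows = haplotypes).

    D = P(derived at both sites) - P(derived at i) * P(derived at j).
    D < 0 means the derived alleles sit on different haplotypes more often
    than expected under independence.
    """
    a = np.asarray(haplotypes)[:, i].astype(float)
    b = np.asarray(haplotypes)[:, j].astype(float)
    p_a, p_b = a.mean(), b.mean()
    d = (a * b).mean() - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined for monomorphic sites
    return d, r2


if __name__ == "__main__":
    # Toy sample: the two derived alleles never share a haplotype (repulsion).
    repulsion = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(pairwise_ld(repulsion, 0, 1))
```

Comparing the distribution of D between matched nonsynonymous and synonymous pairs, as the abstract describes, would require the frequency and distance matching that this sketch leaves out.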
Affiliation(s)
- Jesse A. Garcia
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
7
Pedruzzi G, Rouzine IM. An evolution-based high-fidelity method of epistasis measurement: Theory and application to influenza. PLoS Pathog 2021; 17:e1009669. [PMID: 34153082 PMCID: PMC8248644 DOI: 10.1371/journal.ppat.1009669] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Revised: 07/01/2021] [Accepted: 05/25/2021] [Indexed: 12/18/2022]
Abstract
Linkage effects in a multi-locus population strongly influence its evolution. The models based on the traveling wave approach enable us to predict the average speed of evolution and the statistics of phylogeny. However, predicting statistically the evolution of specific sites and pairs of sites in the multi-locus context remains a mathematical challenge. In particular, the effects of epistasis, the interaction of gene regions contributing to phenotype, are difficult to predict theoretically and detect experimentally in sequence data. A large number of false-positive interactions arises from stochastic linkage effects and indirect interactions, which mask true epistatic interactions. Here we develop a proof-of-principle method to filter out false-positive interactions. We start by demonstrating that the averaging of haplotype frequencies over multiple independent populations is necessary but not sufficient for epistatic detection, because it still leaves high numbers of false-positive interactions. To compensate for the residual stochastic noise, we develop a three-way haplotype method isolating true interactions. The fidelity of the method is confirmed analytically and on simulated genetic sequences evolved with a known epistatic network. The method is then applied to a large sequence database of neuraminidase protein of influenza A H1N1 obtained from various geographic locations to infer the epistatic network responsible for the difference between the pre-pandemic virus and the pandemic strain of 2009. These results present a simple and reliable technique to measure epistatic interactions of any sign from sequence data.
Interactions between genomic sites create a fitness landscape. The knowledge of topology and strength of interactions is vital for predicting the escape of viruses from drugs and immune response and their passing through fitness valleys. Many efforts have been invested into measuring these interactions from DNA sequence sets. Unfortunately, reproducibility of the results remains low due partly to a very small fraction of interaction pairs and partly to stochastic linkage noise masking true interactions. Here we propose a method to separate stochastic linkage and indirect interactions from epistatic interactions and apply it to influenza virus sequence data.
Affiliation(s)
- Gabriele Pedruzzi
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationelle et Quantitative LCQB, Paris, France
- Igor M. Rouzine
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationelle et Quantitative LCQB, Paris, France
8
Barlukova A, Rouzine IM. The evolutionary origin of the universal distribution of mutation fitness effect. PLoS Comput Biol 2021; 17:e1008822. [PMID: 33684109 PMCID: PMC7971868 DOI: 10.1371/journal.pcbi.1008822] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2020] [Revised: 03/18/2021] [Accepted: 02/19/2021] [Indexed: 01/27/2023]
Abstract
An intriguing fact long defying explanation is the observation of a universal exponential distribution of beneficial mutations in fitness effect for different microorganisms. To explain this effect, we use a population model including mutation, directional selection, linkage, and genetic drift. The multiple-mutation regime of adaptation at large population sizes (traveling wave regime) is considered. We demonstrate analytically and by simulation that, regardless of the inherent distribution of mutation fitness effect across genomic sites, an exponential distribution of fitness effects emerges in the long term. This result follows from the exponential statistics of the frequency of the less-fit alleles, f, that we predict to evolve, in the long term, for both polymorphic and monomorphic sites. We map the logarithmic slope of the distribution onto the previously derived fixation probability and demonstrate that it increases linearly in time. Our results demonstrate a striking difference between the distribution of fitness effects observed experimentally for naturally occurring mutations, and the "inherent" distribution obtained in a directed-mutagenesis experiment, which can have any shape depending on the organism. Based on these results, we develop a new method to measure the fitness effect of mutations for each variable residue using DNA sequences sampled from adapting populations. This new method is not sensitive to linkage effects and does not require the one-site model assumptions.
Affiliation(s)
- Ayuna Barlukova
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Igor M. Rouzine
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
9
Rouzine IM. An Evolutionary Model of Progression to AIDS. Microorganisms 2020; 8:microorganisms8111714. [PMID: 33142907 PMCID: PMC7692852 DOI: 10.3390/microorganisms8111714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2020] [Revised: 10/30/2020] [Accepted: 10/30/2020] [Indexed: 11/16/2022]
Abstract
The time to the onset of AIDS symptoms in an HIV-infected individual is known to correlate inversely with viremia and the level of immune activation. The correlation exists against the background of strong individual fluctuations demonstrating the existence of hidden variables depending on patient and virus parameters. At the moment, prognosis of the time to AIDS based on patient parameters is not possible. In addition, it is of paramount importance to understand the reason for progression to AIDS in untreated patients to be able to learn to control it by means other than anti-retroviral therapy. Here we develop a mechanistic mathematical model to predict the speed of progression to AIDS in individual untreated patients and patients treated with suboptimal therapy, based on a single-time measurement of several virological and immunological parameters. We show that the gradual increase in virus fitness during a chronic infection causes slow gradual depletion of CD4 T cells. Using the existing evolution models of HIV, we obtain general expressions predicting the time to the onset of AIDS symptoms in terms of the patient parameters, for low-viremia and high-viremia patients separately. We show that the evolution model of AIDS fits the existing data on virus-time correlations better than the alternative model of the deregulation of homeostatic response.
Affiliation(s)
- Igor M Rouzine
  - Laboratory of Computational and Quantitative Biology, 7238 CNRS-UPMC, Institut Biologie Paris-Seine, Sorbonne Université, Campus Pierre et Marie Curie, 75005 Paris, France
10
Aris-Brosou S, Parent L, Ibeh N. Viral Long-Term Evolutionary Strategies Favor Stability over Proliferation. Viruses 2019; 11:v11080677. [PMID: 31344814 PMCID: PMC6722887 DOI: 10.3390/v11080677] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/30/2019] [Revised: 07/12/2019] [Accepted: 07/20/2019] [Indexed: 02/01/2023]
Abstract
Viruses are known to have some of the highest and most diverse mutation rates found in any biological replicator, with single-stranded (ss) RNA viruses evolving the fastest, and double-stranded (ds) DNA viruses having rates approaching those of bacteria. As mutation rates are tightly and negatively correlated with genome size, selection is a clear driver of viral evolution. However, the role of intragenomic interactions as drivers of viral evolution is still unclear. To understand how these two processes affect the long-term evolution of viruses infecting humans, we comprehensively analyzed ssRNA, ssDNA, dsRNA, and dsDNA viruses, to find which virus types and which functions show evidence for episodic diversifying selection and correlated evolution. We show that selection mostly affects single stranded viruses, that correlated evolution is more prevalent in DNA viruses, and that both processes, taken independently, mostly affect viral replication. However, the genes that are jointly affected by both processes are involved in key aspects of their life cycle, favoring viral stability over proliferation. We further show that both evolutionary processes are intimately linked at the amino acid level, which suggests that it is the joint action of selection and correlated evolution, and not just selection, that shapes the evolutionary trajectories of viruses—and possibly of their epidemiological potential.
Affiliation(s)
- Stéphane Aris-Brosou
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada.
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N 6N5, Canada.
- Louis Parent
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Neke Ibeh
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
11
Epistasis detectably alters correlations between genomic sites in a narrow parameter window. PLoS One 2019; 14:e0214036. [PMID: 31150393 PMCID: PMC6544209 DOI: 10.1371/journal.pone.0214036] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/05/2019] [Accepted: 05/18/2019] [Indexed: 01/12/2023]
Abstract
Different genomic sites evolve inter-dependently due to the combined action of epistasis, defined as a non-multiplicative contribution of alleles at different loci to genome fitness, and the physical linkage of different loci in the genome. Both epistasis and linkage, partially compensated by recombination, cause correlations between allele frequencies at the loci (linkage disequilibrium, LD). The interaction and competition between epistasis and linkage are not fully understood, nor is their relative sensitivity to recombination. Modeling an adapting population in the presence of random mutation, natural selection, pairwise epistasis, and random genetic drift, we compare the contributions of epistasis and linkage. To this end, we use a panel of haplotype-based measures of LD and their various combinations calculated for epistatic and non-epistatic pairs separately. We compute the optimal percentages of detected and false positive pairs in a one-time sample of a population of moderate size. We demonstrate that true interacting pairs can be told apart in a sufficiently short genome within a narrow window of time and parameters. Outside of this parameter region, unless the population is extremely large, shared ancestry of individual sequences generates pervasive stochastic LD for non-interacting pairs masking true epistatic associations. In the presence of sufficiently strong recombination, linkage effects decrease faster than those of epistasis, and the detection of epistasis improves. We demonstrate that the epistasis component of locus association can be isolated, at a single time point, by averaging haplotype frequencies over multiple independent populations. These results demonstrate the existence of fundamental restrictions on the protocols for detecting true interactions in DNA sequence sets.
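Two ideas in this abstract lend themselves to a small worked sketch: epistasis as a departure from multiplicative fitness, and the averaging of haplotype frequencies over independent populations before computing LD. The function names and toy numbers below are illustrative assumptions. The abstract does not specify the paper's panel of LD measures, so the plain coefficient D is used here as a stand-in.

```python
import math


def log_epistasis(w00, w10, w01, w11):
    """Departure from multiplicative fitness for two loci.

    Zero when w11 * w00 == w10 * w01 (no epistasis); positive or negative otherwise.
    """
    return math.log(w11 * w00 / (w10 * w01))


def ld_from_frequencies(f00, f10, f01, f11):
    """LD coefficient D from the four haplotype frequencies (which sum to 1)."""
    return f11 * f00 - f10 * f01


def averaged_ld(populations):
    """Average haplotype frequencies over populations, then compute D once."""
    n = len(populations)
    mean = [sum(pop[k] for pop in populations) / n for k in range(4)]
    return ld_from_frequencies(*mean)


if __name__ == "__main__":
    s = 0.1
    # Multiplicative fitness: the double mutant is exactly (1 + s)^2, so no epistasis.
    print(log_epistasis(1.0, 1 + s, 1 + s, (1 + s) ** 2))
    # Two hypothetical populations with stochastic LD of opposite sign.
    pops = [(0.4, 0.1, 0.1, 0.4), (0.1, 0.4, 0.4, 0.1)]
    print([ld_from_frequencies(*p) for p in pops], averaged_ld(pops))
```

In the toy data each population shows strong LD on its own, but the two have opposite signs, so the LD of the averaged frequencies vanishes. This is the intuition behind using multiple independent populations to suppress stochastic linkage while a systematic, epistasis-driven association would survive the averaging.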