1
Marsh JI, Johri P. Biases in ARG-Based Inference of Historical Population Size in Populations Experiencing Selection. Mol Biol Evol 2024; 41:msae118. [PMID: 38874402 PMCID: PMC11245712 DOI: 10.1093/molbev/msae118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 06/05/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] Open
Abstract
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, and purifying and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, although they could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history for non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.
Affiliation(s)
- Jacob I Marsh
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Parul Johri
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
  - Integrative Program for Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA
2
Rodrigues MF, Kern AD, Ralph PL. Shared evolutionary processes shape landscapes of genomic variation in the great apes. Genetics 2024; 226:iyae006. [PMID: 38242701 PMCID: PMC10990428 DOI: 10.1093/genetics/iyae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 10/26/2023] [Accepted: 01/03/2024] [Indexed: 01/21/2024] Open
Abstract
For at least the past 5 decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions are driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
Affiliation(s)
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  - Department of Biology, University of Oregon, Eugene, OR 97403, USA
  - Department of Mathematics, University of Oregon, Eugene, OR 97403, USA
3
Pivirotto AM, Platt A, Patel R, Kumar S, Hey J. Analyses of allele age and fitness impact reveal human beneficial alleles to be older than neutral controls. bioRxiv 2023:2023.10.09.561569. [PMID: 37873438 PMCID: PMC10592680 DOI: 10.1101/2023.10.09.561569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
A classic population genetic prediction is that alleles experiencing directional selection should swiftly traverse allele frequency space, leaving detectable reductions in genetic variation in linked regions. However, despite this expectation, identifying clear footprints of beneficial allele passage has proven to be surprisingly challenging. We addressed the basic premise underlying this expectation by estimating the ages of large numbers of beneficial and deleterious alleles in a human population genomic data set. Deleterious alleles were found to be young, on average, given their allele frequency. However, beneficial alleles were older on average than non-coding, non-regulatory alleles of the same frequency. This finding is not consistent with directional selection and instead indicates some type of balancing selection. Among derived beneficial alleles, those fixed in the population show higher local recombination rates than those still segregating, consistent with a model in which new beneficial alleles experience an initial period of balancing selection due to linkage disequilibrium with deleterious recessive alleles. Alleles that ultimately fix following a period of balancing selection will leave a modest 'soft' sweep impact on the local variation, consistent with the overall paucity of species-wide 'hard' sweeps in human genomes.
Affiliation(s)
- Alexander Platt
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - University of Pennsylvania, Department of Genetics, Philadelphia PA 19104, USA
- Ravi Patel
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Sudhir Kumar
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Jody Hey
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
4
Zhao S, Chi L, Chen H. CEGA: a method for inferring natural selection by comparative population genomic analysis across species. Genome Biol 2023; 24:219. [PMID: 37789379 PMCID: PMC10548728 DOI: 10.1186/s13059-023-03068-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
We developed a maximum likelihood method for detecting positive selection or balancing selection using multilocus or genomic polymorphism and divergence data from two species. The method is especially useful for investigating natural selection in noncoding regions. Simulations demonstrate that the method outperforms existing methods in detecting both positive and balancing selection. We apply the method to population genomic data from human and chimpanzee. The list of genes identified under selection in the noncoding regions is prominently enriched in pathways related to the brain and nervous system. Therefore, our method will serve as a useful tool for comparative population genomic analysis.
Affiliation(s)
- Shilei Zhao
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
  - School of Future Technology, College of Life Sciences and Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
- Lianjiang Chi
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
- Hua Chen
  - CAS Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - China National Center for Bioinformation, Beijing, 100101, China
  - School of Future Technology, College of Life Sciences and Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
5
Rundell TB, Brunelli M, Alvi A, Safian G, Capobianco C, Tu W, Subedi S, Fiumera A, Musselman LP. Polygenic adaptation to overnutrition reveals a role for cholinergic signaling in longevity. bioRxiv 2023:2023.06.14.544888. [PMID: 37398379 PMCID: PMC10312690 DOI: 10.1101/2023.06.14.544888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Overnutrition by high-sugar (HS) feeding reduces both the lifespan and healthspan across taxa. Pressuring organisms to adapt to overnutrition can highlight genes and pathways important for the healthspan in stressful environments. We used an experimental evolution approach to adapt four replicate, outbred population pairs of Drosophila melanogaster to a HS or control diet. Sexes were separated and aged on either diet until mid-life, then mated to produce the next generation, allowing enrichment for protective alleles over time. All HS-selected populations increased their lifespan and were therefore used as a platform to compare allele frequencies and gene expression. Pathways functioning in the nervous system were overrepresented in the genomic data and showed evidence for parallel evolution, although very few genes were the same across replicates. Acetylcholine-related genes, including the muscarinic receptor mAChR-A, showed significant changes in allele frequency in multiple selected populations and differential expression on a HS diet. Using genetic and pharmacological approaches, we show that cholinergic signaling affects Drosophila feeding in a sugar-specific fashion. Together, these results suggest that adaptation produces changes in allele frequencies that benefit animals under conditions of overnutrition and that it is repeatable at the pathway level.
6
Barroso GV, Lohmueller KE. Inferring the mode and strength of ongoing selection. Genome Res 2023; 33:632-643. [PMID: 37055196 PMCID: PMC10234300 DOI: 10.1101/gr.276386.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 03/29/2023] [Indexed: 04/15/2023]
Abstract
Genome sequence data are no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics toward sequencing entire populations. Within the next decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges for using these data to improve health and agriculture in the pursuit of a sustainable future. Existing population genetic methods are designed to model hundreds of randomly sampled sequences but are not optimized for extracting the information contained in the larger and richer data sets that are beginning to emerge, with thousands of closely related individuals. Here we develop a new method called trio-based inference of dominance and selection (TIDES) that uses data from tens of thousands of family trios to make inferences about natural selection acting in a single generation. TIDES further improves on the state of the art by making no assumptions regarding demography, linkage, or dominance. We discuss how our method paves the way for studying natural selection from new angles.
Affiliation(s)
- Gustavo V Barroso
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
7
Abstract
It is known that methods to estimate the rate of adaptive evolution, which are based on the McDonald–Kreitman test, can be biased by changes in effective population size. Here, we demonstrate theoretically that changes in population size can also generate an artifactual correlation between the rate of adaptive evolution and any factor that is correlated to the strength of selection acting against deleterious mutations. In this context, we have investigated whether several site-level factors influence the rate of adaptive evolution in the divergence of humans and chimpanzees, two species that have been inferred to have undergone population size contraction since they diverged. We find that the rate of adaptive evolution, relative to the rate of mutation, is higher for more exposed amino acids, lower for amino acid pairs that are more dissimilar in terms of their polarity and volume, and lower for amino acid pairs that are subject to stronger purifying selection, as measured by the ratio of the numbers of nonsynonymous to synonymous polymorphisms (pN/pS). All of these correlations are opposite to the artifactual correlations expected under contracting population size. We therefore conclude that these correlations are genuine.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Ana Filipa Moutinho
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
  - Corresponding author
8
Laval G, Patin E, Boutillier P, Quintana-Murci L. Sporadic occurrence of recent selective sweeps from standing variation in humans as revealed by an approximate Bayesian computation approach. Genetics 2021; 219:6377789. [PMID: 34849862 DOI: 10.1093/genetics/iyab161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/20/2021] [Accepted: 09/01/2021] [Indexed: 12/14/2022] Open
Abstract
During their dispersals over the last 100,000 years, modern humans have been exposed to a large variety of environments, resulting in genetic adaptation. While genome-wide scans for the footprints of positive Darwinian selection have increased knowledge of genes and functions potentially involved in human local adaptation, they have globally produced evidence of a limited contribution of selective sweeps in humans. Conversely, studies based on machine learning algorithms suggest that recent sweeps from standing variation are widespread in humans, an observation that has been recently questioned. Here, we sought to formally quantify the number of recent selective sweeps in humans, by leveraging approximate Bayesian computation and whole-genome sequence data. Our computer simulations revealed suitable ABC estimations, regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. Under a model of recent selection from standing variation, we inferred that an average of 68 (from 56 to 79) and 140 (from 94 to 198) sweeps occurred over the last 100,000 years of human history, in African and Eurasian populations, respectively. The former estimation is compatible with human adaptation rates estimated since divergence with chimps, and reveals numbers of sweeps per generation per site in the range of values estimated in Drosophila. Our results confirm the rarity of selective sweeps in humans and show a low contribution of sweeps from standing variation to recent human adaptation.
Affiliation(s)
- Guillaume Laval
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Pierre Boutillier
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Lluis Quintana-Murci
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
  - Human Genomics and Evolution, Collège de France, 75005 Paris, France
9
Brevet M, Lartillot N. Reconstructing the History of Variation in Effective Population Size along Phylogenies. Genome Biol Evol 2021; 13:6311658. [PMID: 34190972 PMCID: PMC8358220 DOI: 10.1093/gbe/evab150] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 06/21/2021] [Indexed: 12/19/2022] Open
Abstract
The nearly neutral theory predicts specific relations between effective population size (Ne) and patterns of divergence and polymorphism, which depend on the shape of the distribution of fitness effects (DFE) of new mutations. However, testing these relations is not straightforward, owing to the difficulty in estimating Ne. Here, we introduce an integrative framework allowing for an explicit reconstruction of the phylogenetic history of Ne, thus leading to a quantitative test of the nearly neutral theory and an estimation of the allometric scaling of the ratios of nonsynonymous over synonymous polymorphism (πN/πS) and divergence (dN/dS) with respect to Ne. As an illustration, we applied our method to primates, for which the nearly neutral predictions were mostly verified. Under a purely nearly neutral model with a constant DFE across species, we find that the variation in πN/πS and dN/dS as a function of Ne is too large to be compatible with current estimates of the DFE based on site frequency spectra. The reconstructed history of Ne shows a 10-fold variation across primates. The mutation rate per generation u, also reconstructed over the tree by the method, varies over a 3-fold range and is negatively correlated with Ne. As a result of these opposing trends for Ne and u, variation in πS is intermediate, primarily driven by Ne but substantially influenced by u. Altogether, our integrative framework provides a quantitative assessment of the role of Ne and u in modulating patterns of genetic variation, while giving a synthetic picture of their history over the clade.
Affiliation(s)
- Mathieu Brevet
  - Station d'Écologie Théorique et Expérimentale, UPR 2001, Moulis, France
- Nicolas Lartillot
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Lyon 1, Villeurbanne, France
10
Huang X, Fortier AL, Coffman AJ, Struck TJ, Irby MN, James JE, León-Burguete JE, Ragsdale AP, Gutenkunst RN. Inferring genome-wide correlations of mutation fitness effects between populations. Mol Biol Evol 2021; 38:4588-4602. [PMID: 34043790 PMCID: PMC8476148 DOI: 10.1093/molbev/msab162] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/21/2022] Open
Abstract
The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.
11
Takata A, Hamanaka K, Matsumoto N. Refinement of the clinical variant interpretation framework by statistical evidence and machine learning. Med 2021; 2:611-632.e9. [PMID: 35590234 DOI: 10.1016/j.medj.2021.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Revised: 09/28/2020] [Accepted: 02/16/2021] [Indexed: 12/29/2022]
Abstract
BACKGROUND Although the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines for variant interpretation are used widely in clinical genetics, there is room for improvement of these knowledge-based guidelines.
METHODS Statistical assessment of average deleteriousness of start-lost, stop-lost, and in-frame insertion and deletion (indel) variants and extraction of deleterious subsets was performed, being informed by proportions of rare variants in the general population of the Genome Aggregation Database (gnomAD). A machine learning-based model scoring the pathogenicity of start-lost variants (the PoStaL model) was constructed by predicting possible translation initiation sites on transcripts by deep learning and training a random forest on known pathogenic and likely benign variants.
FINDINGS The proportion of rare variants was highest in stop-lost variants, followed by in-frame indels and start-lost variants, suggesting that the criteria in the ACMG/AMP guidelines assigning PVS (pathogenic very strong) to start-lost variants and PM (pathogenic moderate) to stop-lost and in-frame indel variants would not be appropriate. Regarding deleterious subsets, stop-lost variants introducing extensions of more than 30 amino acids and in-frame indels computationally predicted to be damaging are enriched for rare and known pathogenic variants. For start-lost variants, we developed the PoStaL model, which outperforms existing tools. We also provide comprehensive lists of the PoStaL scores for start-lost variants and the length of extended amino acids by stop-lost variants.
CONCLUSIONS Our study could contribute to refinement of the ACMG/AMP guidelines, provides resources for future investigation, and provides an example of how to improve knowledge-based frameworks by data-driven approaches.
FUNDING The study was supported by grants from the Japan Agency for Medical Research and Development (AMED) and the Japan Society for the Promotion of Science (JSPS).
Affiliation(s)
- Atsushi Takata
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Kohei Hamanaka
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan