1
Srivastava K, Yin Q, Makuria AT, Rios M, Gebremedhin A, Flegel WA. CD59 gene: 143 haplotypes of 22,718 nucleotides length by computational phasing in 113 individuals from different ethnicities. Transfusion 2024; 64:1296-1305. [PMID: 38817044 PMCID: PMC11251854 DOI: 10.1111/trf.17869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 06/01/2024]
Abstract
BACKGROUND: CD59 deficiency due to rare germline variants in the CD59 gene causes disabilities, ischemic strokes, neuropathy, and hemolysis. CD59 deficiency due to common somatic variants in the PIG-A gene in hematopoietic stem cells causes paroxysmal nocturnal hemoglobinuria. The ISBT database lists one nonsense and three missense germline variants that are associated with the CD59-null phenotype. To analyze the genetic diversity of the CD59 gene, we determined long-range CD59 haplotypes among individuals from different ethnicities. METHODS: We determined a 22.7 kb genomic fragment of the CD59 gene in 113 individuals using next-generation sequencing (NGS), which covered the whole NM_203330.2 mRNA transcript of 7796 base pairs. Samples came from an FDA reference repository and our Ethiopia study cohorts. The raw genotype data were computationally phased into individual haplotype sequences. RESULTS: Nucleotide sequencing of the CD59 gene of 226 chromosomes identified 216 positions with single nucleotide variants. Only three haplotypes were observed in homozygous form, which allowed us to assign them unambiguously as experimentally verified CD59 haplotypes. They were also the most frequent haplotypes among both cohorts. An additional 140 haplotypes were imputed computationally. DISCUSSION: We provided a large set of haplotypes and proposed three verified long-range CD59 reference sequences, based on a population approach, using a generalizable rationale for our choice. Correct long-range haplotypes are useful as template sequences for allele calling in high-throughput NGS and precision medicine approaches, thus enhancing the reliability of clinical diagnostics. Long-range haplotypes can also be used to evaluate the influence of genetic variation on the risk of transfusion reactions or diseases.
Affiliation(s)
- Kshitij Srivastava
  - Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Qinan Yin
  - Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Addisalem Taye Makuria
  - Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, USA
  - Department of Pathology and Laboratory Services, ECU Health Medical Center, Greenville, NC, USA
- Maria Rios
  - Office of Blood Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
- Amha Gebremedhin
  - School of Medicine, College of Health Sciences, Addis Ababa University, Ethiopia
- Willy Albert Flegel
  - Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, USA
2
Lewis R, Pointer MD, Friend L, Gage MJG, Spurgin LG. Tests of evolutionary and genetic rescue using flour beetles, Tribolium castaneum, experimentally evolved to thermal conditions. Ecol Evol 2024; 14:e11313. [PMID: 38694756 PMCID: PMC11056960 DOI: 10.1002/ece3.11313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 05/04/2024]
Abstract
Small, isolated populations are often characterised by low levels of genetic diversity. This can result in inbreeding depression and reduced capacity to adapt to changes in the environment, and therefore higher risk of extinction. However, sometimes these populations can be rescued if allowed to increase in size or if migrants enter, bringing in new allelic variation and thus increasing genetic diversity. This study uses experimental manipulation of population size and migration to quantify their effects on fitness in a challenging environment to better understand genetic rescue. Using small, replicated populations of Tribolium castaneum experimentally evolved to different temperature regimes we tested genetic and demographic rescue, by performing large-scale manipulations of population size and migration and examining fitness consequences over multiple generations. We measured fitness in high temperature (38°C) thermal lines maintained at their usual 'small' population size of N = 100 individuals, and with 'large' scaled up duplicates containing N≈10,000 individuals. We compared these large lines with and without migration (m = 0.1) for 10 generations. Additionally, we assessed the effects of outcrossing at an individual level, by comparing fitness of hybrid (thermal line × stock) offspring with within-line crosses. We found that, at the population level, a rapid increase in the number of individuals in the population resulted in reduced fitness (represented by reproductive output and survival through heatwave conditions), regardless of migration. However, at an individual level, the hybrid offspring of migrants with native individuals generally demonstrated increased longevity in high temperature conditions compared with individuals from thermal selection lines. Overall, these populations showed no evidence that demographic manipulations led to genetic or evolutionary rescue. 
Following the effects of migration in individuals over several generations may be the next step in unravelling these conflicting results. We discuss these findings in the context of conservation intervention.
Affiliation(s)
- Rebecca Lewis
  - School of Biological Sciences, University of East Anglia, Norwich, UK
- Lucy Friend
  - School of Biological Sciences, University of East Anglia, Norwich, UK
3
Balick DJ. A field theoretic approach to non-equilibrium population genetics in the strong selection regime. bioRxiv 2023:2023.01.16.524324. [PMID: 36711507 PMCID: PMC9882232 DOI: 10.1101/2023.01.16.524324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Natural populations are virtually never observed in equilibrium, yet equilibrium approximations comprise the majority of our understanding of population genetics. Using standard tools from statistical physics, a formalism is presented that re-expresses the stochastic equations describing allelic evolution as a partition functional over all possible allelic trajectories ('paths') governed by selection, mutation, and drift. A perturbative field theory is developed for strong additive selection, relevant to disease variation, that facilitates the straightforward computation of closed-form approximations for time-dependent moments of the allele frequency distribution across a wide range of non-equilibrium scenarios; examples are presented for constant population size, exponential growth, bottlenecks, and oscillatory size, all of which align well to simulations and break down just above the drift barrier. Equilibration times are computed and, even for static population size, generically extend beyond the order 1/s timescale associated with exponential frequency decay. Though the mutation load is largely robust to variable population size, perturbative drift-based corrections to the deterministic trajectory are readily computed. Under strong selection, the variance of a new mutation's frequency (related to homozygosity) is dominated by drift-driven dynamics and a transient increase in variance often occurs prior to equilibrating. The excess kurtosis over skew squared is roughly constant (i.e., independent of selection, provided 2Ns ≳ 5) for static population size, and thus potentially sensitive to deviation from equilibrium. These insights highlight the value of such closed-form approximations, naturally generated from Feynman diagrams in a perturbative field theory, which can simply and accurately capture the parameter dependences describing a variety of non-equilibrium population genetic phenomena of interest.
Affiliation(s)
- Daniel J Balick
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
4
Allman B, Koelle K, Weissman D. Heterogeneity in viral populations increases the rate of deleterious mutation accumulation. Genetics 2022; 222:6673144. [PMID: 35993909 PMCID: PMC9526070 DOI: 10.1093/genetics/iyac127] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022]
Abstract
RNA viruses have high mutation rates, with the majority of mutations being deleterious. We examine patterns of deleterious mutation accumulation over multiple rounds of viral replication, with a focus on how cellular coinfection and heterogeneity in viral output affect these patterns. Specifically, using agent-based intercellular simulations we find, in agreement with previous studies, that coinfection of cells by viruses relaxes the strength of purifying selection, and thereby increases the rate of deleterious mutation accumulation. We further find that cellular heterogeneity in viral output exacerbates the rate of deleterious mutation accumulation, regardless of whether this heterogeneity in viral output is stochastic or is due to variation in cellular multiplicity of infection. These results highlight the need to consider the unique life histories of viruses and their population structure to better understand observed patterns of viral evolution.
Affiliation(s)
- Brent Allman
  - Graduate Program in Population Biology, Ecology, and Evolution, Emory University, Atlanta, Georgia 30322, USA
- Katia Koelle
  - Department of Biology, Emory University, Atlanta, Georgia 30322, USA
- Daniel Weissman
  - Department of Biology, Emory University, Atlanta, Georgia 30322, USA
  - Department of Physics, Emory University, Atlanta, Georgia 30322, USA
5
Legried B, Terhorst J. Rates of convergence in the two-island and isolation-with-migration models. Theor Popul Biol 2022; 147:16-27. [PMID: 36007782 DOI: 10.1016/j.tpb.2022.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2021] [Revised: 08/10/2022] [Accepted: 08/11/2022] [Indexed: 11/25/2022]
Abstract
A number of powerful demographic inference methods have been developed in recent years, with the goal of fitting rich evolutionary models to genetic data obtained from many populations. In this paper we investigate the statistical performance of these methods in the specific case where there is continuous migration between populations. Compared with earlier work, migration significantly complicates the theoretical analysis and requires new techniques. We employ the theories of phase-type distributions and concentration of measure in order to study the two-island and isolation-with-migration models, resulting in both upper and lower bounds on rates of convergence for parametric estimators in migration models. For the upper bounds, we consider inferring rates of coalescent and migration on the basis of directly observing pairwise coalescent times, and, more realistically, when (conditionally) Poisson-distributed mutations dropped on latent trees are observed. We complement these upper bounds with information-theoretic lower bounds which establish a limit, in terms of sample size, below which inference is effectively impossible.
Affiliation(s)
- Brandon Legried
  - Department of Statistics, University of Michigan, United States of America
- Jonathan Terhorst
  - Department of Statistics, University of Michigan, United States of America
6
Conover JL, Wendel JF. Deleterious Mutations Accumulate Faster in Allopolyploid than Diploid Cotton (Gossypium) and Unequally between Subgenomes. Mol Biol Evol 2022; 39:6517786. [PMID: 35099532 PMCID: PMC8841602 DOI: 10.1093/molbev/msac024] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
Whole genome duplication (polyploidization) is among the most dramatic mutational processes in nature, so understanding how natural selection differs in polyploids relative to diploids is an important goal. Population genetics theory predicts that recessive deleterious mutations accumulate faster in allopolyploids than diploids due to the masking effect of redundant gene copies, but this prediction is hitherto unconfirmed. Here, we use the cotton genus (Gossypium), which contains seven allopolyploids derived from a single polyploidization event 1-2 million years ago, to investigate deleterious mutation accumulation. We use two methods of identifying deleterious mutations at the nucleotide and amino acid level, along with whole-genome resequencing of 43 individuals spanning six allopolyploid species and their two diploid progenitors, to demonstrate that deleterious mutations accumulate faster in allopolyploids than in their diploid progenitors. We find that, unlike what would be expected under models of demographic changes alone, strongly deleterious mutations show the biggest difference between ploidy levels, and this effect diminishes for moderately and mildly deleterious mutations. We further show that the proportion of nonsynonymous mutations that are deleterious differs between the two co-resident subgenomes in the allopolyploids, suggesting that homoeologous masking acts unequally between subgenomes. Our results provide a genome-wide perspective on classic notions of the significance of gene duplication that likely are broadly applicable to allopolyploids, with implications for our understanding of the evolutionary fate of deleterious mutations. Finally, we note that some measures of selection (e.g. dN/dS, πN/πS) may be biased when species of different ploidy levels are compared.
Affiliation(s)
- Justin L Conover
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Jonathan F Wendel
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
7
Ortega-Del Vecchyo D, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022; 220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022]
Abstract
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some non-equilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive or negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We show an application of our method to the UK10K phased haplotype dataset of individuals.
Affiliation(s)
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, 76230, México
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Kirk E Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, United States of America
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, 60637, United States of America
8
Sohail M, Izarraras-Gomez A, Ortega-Del Vecchyo D. Populations, Traits, and Their Spatial Structure in Humans. Genome Biol Evol 2021; 13:evab272. [PMID: 34894236 PMCID: PMC8715524 DOI: 10.1093/gbe/evab272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/16/2022]
Abstract
The spatial distribution of genetic variants is jointly determined by geography, past demographic processes, natural selection, and its interplay with environmental variation. A fraction of these genetic variants are "causal alleles" that affect the manifestation of a complex trait. The effect exerted by these causal alleles on complex traits can be independent or dependent on the environment. Understanding the evolutionary processes that shape the spatial structure of causal alleles is key to comprehending the spatial distribution of complex traits. Natural selection, past population size changes, range expansions, consanguinity, assortative mating, archaic introgression, admixture, and the environment can alter the frequencies, effect sizes, and heterozygosities of causal alleles. This provides a genetic axis along which complex traits can vary. However, complex traits also vary along biogeographical and sociocultural axes which are often correlated with genetic axes in complex ways. The purpose of this review is to consider these genetic and environmental axes in concert and examine the ways they can help us decipher the variation in complex traits that is visible in humans today. This initiative necessarily implies a discussion of populations, traits, the ability to infer and interpret "genetic" components of complex traits, and how these have been impacted by adaptive events. In this review, we provide a history-aware discussion on these topics using both the recent and more distant past of our academic discipline and its relevant contexts.
Affiliation(s)
- Mashaal Sohail
  - Department of Human Genetics, University of Chicago, USA
  - Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos, México
- Alan Izarraras-Gomez
  - Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
9
Genetic diversity and expression profile of Plasmodium falciparum Pf34 gene supports its immunogenicity. Curr Res Transl Med 2021; 69:103308. [PMID: 34425378 DOI: 10.1016/j.retram.2021.103308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2021] [Revised: 07/02/2021] [Accepted: 08/06/2021] [Indexed: 11/22/2022]
Abstract
PURPOSE OF THE STUDY: Genetic variation is one of the major obstacles in the development of effective vaccines. A multivalent malaria vaccine is required to increase efficacy and confer long-term protection. In this context, we analysed the genetic diversity, expression profile, and immune response against Pf34. METHODS: Phylogenetic analysis was carried out using Pf34 orthologue sequences of various Plasmodium species. Genetic diversity was analysed by PCR amplification and Sanger dideoxy sequencing of the Pf34 gene from Plasmodium falciparum-positive human blood samples. The expression level of the Pf34 gene was studied during the erythrocytic stage by real-time qPCR at four-hour intervals, and the immune response against synthetic peptides of Pf34 (P1 and P2) was analysed using ELISA. RESULTS: Phylogenetic analysis revealed the conserved nature of the Pf34 gene. Genetic diversity analysis showed that the majority (92%) of Plasmodium falciparum isolates in the available database bore the wild-type Pf34 gene (Hd = 0.160 ± 0.030, π = 0.00021), including those in the present study (89.3%). The P. falciparum-specific amino acid repeats (NNDK, NNDLK, and NNNNNN) in the B cell epitope regions were conserved. Furthermore, the Pf34 gene is expressed throughout the erythrocytic cycle, and comparatively high expression was observed in the early ring and schizont stages. A high IgG response was observed against both peptides P1 and P2 of Pf34, which contain the asparagine NNNNNN and NNDLK repeats, respectively. CONCLUSION: The limited genetic diversity, the presence of conserved amino acid repeats within the B cell epitope, and the high IgG response suggest that Pf34 may be a potential vaccine candidate for malaria. However, further validation studies are required.
10
Koufopanou V, Lomas S, Pronina O, Almeida P, Sampaio JP, Mousseau T, Liti G, Burt A. Population Size, Sex and Purifying Selection: Comparative Genomics of Two Sister Taxa of the Wild Yeast Saccharomyces paradoxus. Genome Biol Evol 2021; 12:1636-1645. [PMID: 33011797 PMCID: PMC7533043 DOI: 10.1093/gbe/evaa141] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 07/10/2020] [Indexed: 12/16/2022]
Abstract
This study uses population genomic data to estimate demographic and selection parameters in two sister lineages of the wild yeast Saccharomyces paradoxus and compare their evolution. We first estimate nucleotide and recombinational diversities in each of the two lineages to infer their population size and frequency of sex and then analyze the rate of mutation accumulation since divergence from their inferred common ancestor to estimate the generation time and efficacy of selection. We find that one of the lineages has significantly higher silent nucleotide diversity and lower linkage disequilibrium, indicating a larger population with more frequent sexual generations. The same lineage also shows shorter generation time and higher efficacy of purifying selection, the latter consistent with the finding of larger population size and more frequent sex. Similar analyses are also performed on the ancestries of individual strains within lineages and we find significant differences between strains implying variation in rates of mitotic cell divisions. Our sample includes some strains originating in the Chernobyl nuclear-accident exclusion zone, which has been subjected to high levels of radiation for nearly 30 years now. We find no evidence, however, for increased rates of mutation. Finally, there is a positive correlation between rates of mutation accumulation and length of growing period, as measured by latitude of the place of origin of strains. Our study illustrates the power of genomic analyses in estimating population and life history parameters and testing predictions based on population genetic theory.
Affiliation(s)
- Vassiliki Koufopanou
  - Department of Life Sciences, Imperial College London, Ascot, Berks, United Kingdom
- Susan Lomas
  - Department of Life Sciences, Imperial College London, Ascot, Berks, United Kingdom
- Olga Pronina
  - Institute of Cell Biology and Genetic Engineering, NAS of Ukraine, Kyiv, Ukraine
- Pedro Almeida
  - Department of Genetics, Evolution & Environment, University College London, United Kingdom
- Jose Paulo Sampaio
  - UCIBIO, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
- Gianni Liti
  - CNRS, INSERM, IRCAN, Université Côte d'Azur, Nice, France
- Austin Burt
  - Department of Life Sciences, Imperial College London, Ascot, Berks, United Kingdom
11
Cabrera VM. Human molecular evolutionary rate, time dependency and transient polymorphism effects viewed through ancient and modern mitochondrial DNA genomes. Sci Rep 2021; 11:5036. [PMID: 33658608 PMCID: PMC7930196 DOI: 10.1038/s41598-021-84583-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/17/2020] [Accepted: 02/15/2021] [Indexed: 01/31/2023]
Abstract
Human evolutionary genetics gives a chronological framework to interpret human history. It is based on the molecular clock hypothesis, which supposes a straightforward relationship between the mutation rate and the substitution rate, independent of other factors such as demographic dynamics. Analyzing ancient and modern human complete mitochondrial genomes, we show here that, over time, the substitution rate can be significantly slower or faster than the average germline mutation rate, confirming a time-dependence effect mainly attributable to changes in the effective population size of human populations, with exponential growth in recent times. We also detect that transient polymorphisms slow the evolutionary rate deduced from haplogroup intraspecific trees. Finally, we propose the use of the most divergent lineages within haplogroups as a practical approach to correct these molecular clock mismatches.
Affiliation(s)
- Vicente M Cabrera
  - Retired member of Departamento de Genética, Facultad de Biología, Universidad de La Laguna, Canary Islands, Spain
12
Nelson CW, Ardern Z, Goldberg TL, Meng C, Kuo CH, Ludwig C, Kolokotronis SO, Wei X. Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic. eLife 2020; 9:e59633. [PMID: 33001029 PMCID: PMC7655111 DOI: 10.7554/elife.59633] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 06/03/2020] [Accepted: 09/30/2020] [Indexed: 12/11/2022]
Abstract
Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.
MESH Headings
- Amino Acid Sequence
- Animals
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigens, Viral/biosynthesis
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Betacoronavirus/genetics
- Betacoronavirus/pathogenicity
- Betacoronavirus/physiology
- COVID-19
- China/epidemiology
- Chiroptera/virology
- Coronavirus/genetics
- Coronavirus Infections/epidemiology
- Coronavirus Infections/virology
- Epitopes/genetics
- Epitopes/immunology
- Europe/epidemiology
- Eutheria/virology
- Evolution, Molecular
- Gene Expression Regulation, Viral
- Genes, Overlapping
- Genes, Viral
- Genetic Variation
- Haplotypes/genetics
- Host Specificity/genetics
- Humans
- Models, Molecular
- Mutation
- Open Reading Frames/genetics
- Pandemics
- Phylogeny
- Pneumonia, Viral/epidemiology
- Pneumonia, Viral/virology
- Protein Biosynthesis
- Protein Conformation
- RNA, Viral/genetics
- SARS-CoV-2
- Sequence Alignment
- Sequence Homology, Nucleic Acid
- Viral Proteins/genetics
- Viral Proteins/immunology
Affiliation(s)
- Chase W Nelson
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
  - Institute for Comparative Genomics, American Museum of Natural History, New York, United States
- Zachary Ardern
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Tony L Goldberg
  - Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, United States
  - Global Health Institute, University of Wisconsin-Madison, Madison, United States
- Chen Meng
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
- Chen-Hao Kuo
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
- Sergios-Orestis Kolokotronis
  - Institute for Comparative Genomics, American Museum of Natural History, New York, United States
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, United States
  - Institute for Genomic Health, SUNY Downstate Health Sciences University, Brooklyn, United States
  - Division of Infectious Diseases, Department of Medicine, SUNY Downstate Health Sciences University, Brooklyn, United States
- Xinzhu Wei
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, United States
  - Departments of Computer Science, Human Genetics, and Computational Medicine, University of California, Los Angeles, Los Angeles, United States
13
Uricchio LH. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum Genet 2020; 139:5-21. [PMID: 31201529 PMCID: PMC8059781 DOI: 10.1007/s00439-019-02040-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/29/2018] [Accepted: 06/06/2019] [Indexed: 12/26/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified many trait-associated variants, but there is still much we do not know about the genetic basis of complex traits. Here, we review recent theoretical and empirical literature regarding selection on complex traits to argue that "missing heritability" is as much an evolutionary problem as it is a statistical problem. We discuss empirical findings that suggest a role for selection in shaping the effect sizes and allele frequencies of causal variation underlying complex traits, and the limitations of these studies. We then use simulations of selection, realistic genome structure, and complex human demography to illustrate the results of recent theoretical work on polygenic selection, and show that statistical inference of causal loci is sharply affected by evolutionary processes. In particular, when selection acts on causal alleles, it hampers the ability to detect causal loci and constrains the transferability of GWAS results across populations. Last, we discuss the implications of these findings for future association studies, and suggest that future statistical methods to infer causal loci for genetic traits will benefit from explicit modeling of the joint distribution of effect sizes and allele frequencies under plausible evolutionary models.
Affiliation(s)
- Lawrence H Uricchio
- Department of Biology, Stanford University, Stanford, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.
|
14
|
Tong DMH, Hernandez RD. Population genetic simulation study of power in association testing across genetic architectures and study designs. Genet Epidemiol 2020; 44:90-103. [PMID: 31587362 PMCID: PMC6980249 DOI: 10.1002/gepi.22264] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/20/2019] [Revised: 08/26/2019] [Accepted: 09/16/2019] [Indexed: 12/22/2022]
Abstract
While it is well established that genetics can be a major contributor to population variation of complex traits, the relative contributions of rare and common variants to phenotypic variation remain a matter of considerable debate. Here, we simulate genetic and phenotypic data across different case/control panel sampling strategies, sequencing methods, and genetic architecture models based on evolutionary forces to determine the statistical performance of rare variant association tests (RVATs) widely in use. We find that the highest statistical power of RVATs is achieved by sampling case/control individuals from the extremes of an underlying quantitative trait distribution. We also demonstrate that the use of genotyping arrays, in conjunction with imputation from a whole-genome sequenced (WGS) reference panel, recovers the vast majority (90%) of the power that could be achieved by sequencing the case/control panel using current tools. Finally, we show that for dichotomous traits, the statistical performance of RVATs decreases as rare variants become more important in the trait architecture. Our results extend previous work to show that RVATs are insufficiently powered to make generalizable conclusions about the role of rare variants in dichotomous complex traits.
Affiliation(s)
- Dominic M. H. Tong
- University of California, Berkeley ‐ University of California, San Francisco Graduate Program in Bioengineering, San Francisco, California
- Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California
- Department of Human Genetics, McGill University, Montreal, Canada
|
15
|
Fragoza R, Das J, Wierbowski SD, Liang J, Tran TN, Liang S, Beltran JF, Rivera-Erick CA, Ye K, Wang TY, Yao L, Mort M, Stenson PD, Cooper DN, Wei X, Keinan A, Schimenti JC, Clark AG, Yu H. Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations. Nat Commun 2019; 10:4141. [PMID: 31515488 PMCID: PMC6742646 DOI: 10.1038/s41467-019-11959-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/09/2019] [Accepted: 08/06/2019] [Indexed: 12/19/2022] Open
Abstract
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
Affiliation(s)
- Robert Fragoza
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Jishnu Das
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Shayne D Wierbowski
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Jin Liang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Tina N Tran
- Department of Biomedical Science, Cornell University, Ithaca, NY, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Juan F Beltran
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Christen A Rivera-Erick
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Kaixiong Ye
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Ting-Yi Wang
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Li Yao
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Xiaomu Wei
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Alon Keinan
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- John C Schimenti
- Department of Biomedical Science, Cornell University, Ithaca, NY, 14853, USA
- Andrew G Clark
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
|
16
|
Evolution and Genetic Diversity of the k13 Gene Associated with Artemisinin Delayed Parasite Clearance in Plasmodium falciparum. Antimicrob Agents Chemother 2019; 63:AAC.02550-18. [PMID: 31085516 DOI: 10.1128/aac.02550-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/05/2018] [Accepted: 04/28/2019] [Indexed: 01/19/2023] Open
Abstract
Mutations in the Plasmodium falciparum k13 (Pfk13) gene are linked to delayed parasite clearance in response to artemisinin-based combination therapies (ACTs) in Southeast Asia. To explore the evolutionary rate and constraints acting on this gene, k13 orthologs from species sharing a recent common ancestor with P. falciparum and Plasmodium vivax were analyzed. These comparative studies were followed by genetic polymorphism analyses within P. falciparum using 982 complete Pfk13 sequences from public databases and new data obtained by next-generation sequencing from African and Haitian isolates. Although k13 orthologs evolve at heterogeneous rates, the gene was conserved across the genus, with only synonymous substitutions being found at residues where mutations linked to the delayed parasite clearance phenotype have been reported. This suggests that those residues were under constraint from undergoing nonsynonymous changes during evolution of the genus. No fixed nonsynonymous differences were found between Pfk13 and its orthologs in closely related species found in African apes. This indicates that all nonsynonymous substitutions currently found in Pfk13 are younger than the time of divergence between P. falciparum and its closely related species. At the population level, no mutations linked to delayed parasite clearance were found in our samples from Africa and Haiti. However, there is a high number of single Pfk13 mutations segregating in P. falciparum populations, and two predominant alleles are distributed worldwide. This pattern is discussed in terms of how changes in the efficacy of natural selection, affected by population expansion, may have allowed for the emergence of mutations tolerant to ACTs.
|
17
|
Bosse M, Megens H, Derks MFL, de Cara ÁMR, Groenen MAM. Deleterious alleles in the context of domestication, inbreeding, and selection. Evol Appl 2019; 12:6-17. [PMID: 30622631 PMCID: PMC6304688 DOI: 10.1111/eva.12691] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 11/20/2017] [Revised: 05/30/2018] [Accepted: 06/12/2018] [Indexed: 12/21/2022] Open
Abstract
Each individual has a certain number of harmful mutations in its genome. These mutations can lower the fitness of the individual carrying them, dependent on their dominance and selection coefficient. Effective population size, selection, and admixture are known to affect the occurrence of such mutations in a population. The relative roles of demography and selection are key to understanding the process of adaptation. These are factors that are potentially influenced and confounded in domestic animals. Here, we hypothesize that the series of events of bottlenecks, introgression, and strong artificial selection associated with domestication increased mutational load in domestic species. Yet, mutational load is hard to quantify, so there are very few studies available revealing the relevance of evolutionary processes. The precise role of artificial selection, bottlenecks, and introgression in further increasing the load of deleterious variants in animals in breeding and conservation programmes remains unclear. In this paper, we review the effects of domestication and selection on mutational load in domestic species. Moreover, we test some hypotheses on higher mutational load due to domestication and selective sweeps using sequence data from commercial pig and chicken lines. Overall, we argue that domestication by itself is not a prerequisite for genetic erosion, indicating that fitness potential does not need to decline. Rather, mutational load in domestic species can be influenced by many factors, but consistent or strong trends are not yet clear. However, methods emerging from molecular genetics allow discrimination of hypotheses about the determinants of mutational load, such as effective population size, inbreeding, and selection, in domestic systems. These findings make us rethink the effect of our current breeding schemes on fitness of populations.
Affiliation(s)
- Mirte Bosse
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Hendrik‐Jan Megens
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Martijn F. L. Derks
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Ángeles M. R. de Cara
- Centre d’Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, EPHE, IRD, Montpellier, France
- Martien A. M. Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
|
18
|
Historical Genomes Reveal the Genomic Consequences of Recent Population Decline in Eastern Gorillas. Curr Biol 2018; 29:165-170.e6. [PMID: 30595519 DOI: 10.1016/j.cub.2018.11.055] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 09/12/2018] [Revised: 11/12/2018] [Accepted: 11/21/2018] [Indexed: 12/30/2022]
Abstract
Many endangered species have experienced severe population declines within the last centuries [1, 2]. However, despite concerns about negative fitness effects resulting from increased genetic drift and inbreeding, there is a lack of empirical data on genomic changes in conjunction with such declines [3-7]. Here, we use whole genomes recovered from century-old historical museum specimens to quantify the genomic consequences of small population size in the critically endangered Grauer's and endangered mountain gorillas. We find a reduction of genetic diversity and increase in inbreeding and genetic load in the Grauer's gorilla, which experienced severe population declines in recent decades. In contrast, the small but relatively stable mountain gorilla population has experienced little genomic change during the last century. These results suggest that species histories as well as the rate of demographic change may influence how population declines affect genome diversity.
|
19
|
Kim BY, Huber CD, Lohmueller KE. Deleterious variation shapes the genomic landscape of introgression. PLoS Genet 2018; 14:e1007741. [PMID: 30346959 PMCID: PMC6233928 DOI: 10.1371/journal.pgen.1007741] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 03/30/2018] [Revised: 11/13/2018] [Accepted: 10/05/2018] [Indexed: 11/19/2022] Open
Abstract
While it is appreciated that population size changes can impact patterns of deleterious variation in natural populations, less attention has been paid to how gene flow affects and is affected by the dynamics of deleterious variation. Here we use population genetic simulations to examine how gene flow impacts deleterious variation under a variety of demographic scenarios, mating systems, dominance coefficients, and recombination rates. Our results show that admixture between populations can temporarily reduce the genetic load of smaller populations and cause increases in the frequency of introgressed ancestry, especially if deleterious mutations are recessive. Additionally, when fitness effects of new mutations are recessive, between-population differences in the sites at which deleterious variants exist create heterosis in hybrid individuals. Together, these factors lead to an increase in introgressed ancestry, particularly when recombination rates are low. Under certain scenarios, introgressed ancestry can increase from an initial frequency of 5% to 30–75% and fix at many loci, even in the absence of beneficial mutations. Further, deleterious variation and admixture can generate correlations between the frequency of introgressed ancestry and recombination rate or exon density, even in the absence of other types of selection. The direction of these correlations is determined by the specific demography and whether mutations are additive or recessive. Therefore, it is essential that null models of admixture include both demography and deleterious variation before invoking other mechanisms to explain unusual patterns of genetic variation.
Individuals from distinct populations sometimes will produce fertile offspring and will exchange genetic material in a process called hybridization. Genomes of hybrid individuals often show non-random patterns of hybrid ancestry across the genome, where some regions have a high frequency of ancestry from the second population and other regions have less. Typically, this pattern has been attributed to adaptive introgression, where beneficial genetic variants are passed from one population to the other, or to genomic incompatibilities between these distinct species. However, other mechanisms could lead to these heterogeneous patterns of ancestry in hybrids. Here we use simulations to investigate whether deleterious mutations affect the patterns of introgressed ancestry across genomes. We show that when ancestry from a larger population is added to a smaller population, the ancestry from the larger population dramatically increases in frequency because it carries fewer deleterious mutations. This occurs even in the absence of beneficial mutations in either population. Additionally, we show that differences in sex chromosome evolution relative to autosomes, or differences in mating system, can affect patterns of introgression in similar ways. Our study argues that deleterious mutations should be included in population genetic models used to identify unusual regions of the genome that appear to be under selection in hybrids.
Affiliation(s)
- Bernard Y. Kim
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Christian D. Huber
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
|
20
|
Relaxed Selection During a Recent Human Expansion. Genetics 2017; 208:763-777. [PMID: 29187508 DOI: 10.1534/genetics.117.300551] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/28/2017] [Accepted: 11/22/2017] [Indexed: 01/15/2023] Open
Abstract
Humans have colonized the planet through a series of range expansions, which deeply impacted genetic diversity in newly settled areas and potentially increased the frequency of deleterious mutations on expanding wave fronts. To test this prediction, we studied the genomic diversity of French Canadians who colonized Quebec in the 17th century. We used historical information and records from ∼4000 ascending genealogies to select individuals whose ancestors lived mostly on the colonizing wave front and individuals whose ancestors remained in the core of the settlement. Comparison of exomic diversity reveals that: (i) both new and low-frequency variants are significantly more deleterious in front than in core individuals, (ii) equally deleterious mutations are at higher frequencies in front individuals, and (iii) front individuals are two times more likely to be homozygous for rare very deleterious mutations present in Europeans. These differences have emerged in the past six to nine generations and cannot be explained by differential inbreeding, but are consistent with relaxed selection mainly due to higher rates of genetic drift on the wave front. Demographic inference and modeling of the evolution of rare variants suggest lower effective size on the front, and lead to an estimation of selection coefficients that increase with conservation scores. Even though range expansions have had a relatively limited impact on the overall fitness of French Canadians, they could explain the higher prevalence of recessive genetic diseases in recently settled regions of Quebec.
|
21
|
A Temporal Perspective on the Interplay of Demography and Selection on Deleterious Variation in Humans. G3-GENES GENOMES GENETICS 2017; 7:1027-1037. [PMID: 28159863 PMCID: PMC5345704 DOI: 10.1534/g3.117.039651] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
When mutations have small effects on fitness, population size plays an important role in determining the amount and nature of deleterious genetic variation. The extent to which recent population size changes have impacted deleterious variation in humans has been a question of considerable interest and debate. An emerging consensus is that the Out-of-Africa bottleneck and subsequent growth events have been too short to cause meaningful differences in genetic load between populations; though changes in the number and average frequencies of deleterious variants have taken place. To provide more support for this view and to offer additional insight into the divergent evolution of deleterious variation across populations, we numerically solve time-inhomogeneous diffusion equations and study the temporal dynamics of the frequency spectra in models of population size change for modern humans. We observe how the response to demographic change differs by the strength of selection, and we then assess whether similar patterns are observed in exome sequence data from 33,370 and 5203 individuals of non-Finnish European and West African ancestry, respectively. Our theoretical results highlight how even simple summaries of the frequency spectrum can have complex responses to demographic change. These results support the finding that some apparent discrepancies between previous results have been driven by the behaviors of the precise summaries of deleterious variation. Further, our empirical results make clear the difficulty of inferring slight differences in frequency spectra using recent next-generation sequence data.
|
22
|
A Model of Compound Heterozygous, Loss-of-Function Alleles Is Broadly Consistent with Observations from Complex-Disease GWAS Datasets. PLoS Genet 2017; 13:e1006573. [PMID: 28103232 PMCID: PMC5289629 DOI: 10.1371/journal.pgen.1006573] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/18/2016] [Revised: 02/02/2017] [Accepted: 01/05/2017] [Indexed: 12/17/2022] Open
Abstract
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region based genetic association and heritability estimation.
Gene action determines how mutations affect phenotype. When placed in an evolutionary context, the details of the genotype-to-phenotype model can impact the maintenance of genetic variation for complex traits. Likewise, non-equilibrium demographic history may affect patterns of genetic variation. Here, we explore the impact of genetic model and population growth on distribution of genetic variance across the allele frequency spectrum underlying risk for a complex disease. Using forward-in-time population genetic simulations, we show that the genetic model has important impacts on the composition of variation for complex disease risk in a population. We explicitly simulate genome-wide association studies (GWAS) and perform heritability estimation on population samples. A particular model of gene-based partial recessivity, based on allelic non-complementation, aligns well with empirical results. This model is congruent with the dominance variance estimates from both SNPs and twins, and the minor allele frequency distribution of GWAS hits.
|
23
|
Fijarczyk A, Dudek K, Babik W. Selective Landscapes in newt Immune Genes Inferred from Patterns of Nucleotide Variation. Genome Biol Evol 2016; 8:3417-3432. [PMID: 27702815 PMCID: PMC5203778 DOI: 10.1093/gbe/evw236] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022] Open
Abstract
Host–pathogen interactions may result in either directional selection or in pressure for the maintenance of polymorphism at the molecular level. Hence signatures of both positive and balancing selection are expected in immune genes. Because both overall selective pressure and specific targets may differ between species, large-scale population genomic studies are useful in detecting functionally important immune genes and comparing selective landscapes between taxa. Such studies are of particular interest in amphibians, a group threatened worldwide by emerging infectious diseases. Here, we present an analysis of polymorphism and divergence of 634 immune genes in two lineages of Lissotriton newts: L. montandoni and L. vulgaris graecus. Variation in newt immune genes has been shaped predominantly by widespread purifying selection and strong evolutionary constraint, implying long-term importance of these genes for functioning of the immune system. The two evolutionary lineages differ in the overall strength of purifying selection which can partially be explained by demographic history but may also signal differences in long-term pathogen pressure. The prevalent constraint notwithstanding, 23 putative targets of positive selection and 11 putative targets of balancing selection were identified. The latter were detected by composite tests involving the demographic model and further validated in independent population samples. Putative targets of balancing selection encode proteins which may interact closely with pathogens but include also regulators of immune response. The identified candidates will be useful for testing whether genes affected by balancing selection are more prone to interspecific introgression than other genes in the genome.
Affiliation(s)
- Anna Fijarczyk
- Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Katarzyna Dudek
- Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Wieslaw Babik
- Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
|
24
|
Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016; 41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]
Abstract
The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals has been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that led to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk.
Affiliation(s)
- Feng Gao
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
- Alon Keinan
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
|
25
|
Freedman AH, Lohmueller KE, Wayne RK. Evolutionary History, Selective Sweeps, and Deleterious Variation in the Dog. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2016. [DOI: 10.1146/annurev-ecolsys-121415-032155] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/09/2022]
Abstract
The dog is our oldest domesticate and has experienced a wide variety of demographic histories, including a bottleneck associated with domestication and individual bottlenecks associated with the formation of modern breeds. Admixture with gray wolves, and among dog breeds and populations, has also occurred throughout its history. Likewise, the intensity and focus of selection have varied, from an initial focus on traits enhancing cohabitation with humans, to more directed selection on specific phenotypic characteristics and behaviors. In this review, we summarize and synthesize genetic findings from genome-wide and complete genome studies that document the genomic consequences of demography and selection, including the effects on adaptive and deleterious variation. Consistent with the evolutionary history of the dog, signals of natural and artificial selection are evident in the dog genome. However, conclusions from studies of positive selection are fraught with the problem of false positives given that demographic history is often not taken into account.
Affiliation(s)
- Adam H. Freedman
- Informatics Group, Faculty of Arts and Sciences, Harvard University, Cambridge, Massachusetts 02138
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095
| |
Collapse
|
26
|
Pfeifer SP, Jensen JD. The Impact of Linked Selection in Chimpanzees: A Comparative Study. Genome Biol Evol 2016; 8:3202-3208. [PMID: 27678122 PMCID: PMC5174744 DOI: 10.1093/gbe/evw240] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
Levels of nucleotide diversity vary greatly across the genomes of most species owing to multiple factors. These include variation in the underlying mutation rates, as well as the effects of both direct and linked selection. Fundamental to interpreting the relative importance of these forces is the common observation of a strong positive correlation between nucleotide diversity and recombination rate. While indeed observed in humans, the interpretation of this pattern has been difficult in the absence of high-quality polymorphism data and recombination maps in closely related species. Here, we characterize genetic features driving nucleotide diversity in Western chimpanzees using a recently generated whole genome polymorphism data set. Our results suggest that recombination rate is the primary predictor of nucleotide variation with a strongly positive correlation. In addition, telomeric distance, regional GC-content, and regional CpG-island content are strongly negatively correlated with variation. These results are compared with humans, with both similarities and differences interpreted in the light of the estimated effective population sizes of the two species as well as their strongly differing recent demographic histories.
Affiliation(s)
- Susanne P Pfeifer: School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; School of Life Sciences, Arizona State University (ASU), Tempe, Arizona
- Jeffrey D Jensen: School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; School of Life Sciences, Arizona State University (ASU), Tempe, Arizona
|
27
|
Xue C, Raveendran M, Harris RA, Fawcett GL, Liu X, White S, Dahdouli M, Rio Deiros D, Below JE, Salerno W, Cox L, Fan G, Ferguson B, Horvath J, Johnson Z, Kanthaswamy S, Kubisch HM, Liu D, Platt M, Smith DG, Sun B, Vallender EJ, Wang F, Wiseman RW, Chen R, Muzny DM, Gibbs RA, Yu F, Rogers J. The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res 2016; 26:1651-1662. [PMID: 27934697 PMCID: PMC5131817 DOI: 10.1101/gr.204255.116] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 01/11/2016] [Accepted: 10/12/2016] [Indexed: 12/30/2022]
Abstract
Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.
Affiliation(s)
- Cheng Xue: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Muthuswamy Raveendran: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- R Alan Harris: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Gloria L Fawcett: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Xiaoming Liu: University of Texas Health Science Center, Houston, Texas 77030, USA
- Simon White: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Mahmoud Dahdouli: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- David Rio Deiros: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Jennifer E Below: University of Texas Health Science Center, Houston, Texas 77030, USA
- William Salerno: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Laura Cox: Southwest National Primate Research Center, San Antonio, Texas 78227, USA
- Guoping Fan: Department of Human Genetics, University of California, Los Angeles, California 90095, USA
- Betsy Ferguson: Oregon National Primate Research Center, Beaverton, Oregon 97006, USA
- Julie Horvath: North Carolina Museum of Natural Sciences, Raleigh, North Carolina 27601, USA; Biological and Biomedical Sciences, North Carolina Central University, Durham, North Carolina 27707, USA; Department of Evolutionary Anthropology, Duke University, Durham, North Carolina 27708, USA
- Zach Johnson: Yerkes National Primate Research Center, Atlanta, Georgia 30322, USA
- Sree Kanthaswamy: California National Primate Research Center, Davis, California 95616, USA; School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona 85004, USA
- H Michael Kubisch: Tulane National Primate Research Center, Covington, Louisiana 70433, USA
- Dahai Liu: Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Michael Platt: Department of Neurobiology, Duke University, Durham, North Carolina 27708, USA; Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- David G Smith: California National Primate Research Center, Davis, California 95616, USA
- Binghua Sun: Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Eric J Vallender: Tulane National Primate Research Center, Covington, Louisiana 70433, USA; New England National Primate Research Center, Southborough, Massachusetts 01772, USA
- Feng Wang: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Roger W Wiseman: Wisconsin National Primate Research Center, Madison, Wisconsin 53711, USA
- Rui Chen: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Donna M Muzny: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Richard A Gibbs: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Fuli Yu: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Jeffrey Rogers: Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
28
|
Simons YB, Sella G. The impact of recent population history on the deleterious mutation load in humans and close evolutionary relatives. Curr Opin Genet Dev 2016; 41:150-158. [PMID: 27744216 DOI: 10.1016/j.gde.2016.09.006] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 05/09/2016] [Revised: 09/13/2016] [Accepted: 09/18/2016] [Indexed: 01/22/2023]
Abstract
Over the past decade, there has been both great interest and confusion about whether recent demographic events, notably the Out-of-Africa bottleneck and recent population growth, have led to differences in mutation load among human populations. The confusion can be traced to the use of different summary statistics to measure load, which lead to apparently conflicting results. We argue, however, that when statistics more directly related to load are used, the results of different studies and data sets consistently reveal little or no difference in the load of non-synonymous mutations among human populations. Theory helps to understand why no such differences are seen, as well as to predict in what settings they are to be expected. In particular, as predicted by modeling, there is evidence for changes in the load of recessive loss-of-function mutations in founder and inbred human populations. Also as predicted, eastern subspecies of gorilla, Neanderthals and Denisovans, which are thought to have undergone reductions in population size that exceed the human Out-of-Africa bottleneck in duration and severity, show evidence for increased load of non-synonymous mutations (relative to western subspecies of gorillas and modern humans, respectively). A coherent picture is thus starting to emerge about the effects of demographic history on the mutation load in populations of humans and close evolutionary relatives.
Affiliation(s)
- Yuval B Simons: Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Guy Sella: Department of Biological Sciences, Columbia University, New York, NY 10027, USA
|
29
|
Uricchio LH, Zaitlen NA, Ye CJ, Witte JS, Hernandez RD. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res 2016; 26:863-73. [PMID: 27197206 PMCID: PMC4937562 DOI: 10.1101/gr.202440.115] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 11/25/2015] [Accepted: 05/16/2016] [Indexed: 12/20/2022]
Abstract
The role of rare alleles in complex phenotypes has been hotly debated, but most rare variant association tests (RVATs) do not account for the evolutionary forces that affect genetic architecture. Here, we use simulation and numerical algorithms to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance. We then assess the ability of RVATs to detect causal loci using simulations and human RNA-seq data. Surprisingly, we find that statistical performance is worst for phenotypes in which genetic variance is due mainly to rare alleles, and explosive population growth decreases power. Although many studies have attempted to identify causal rare variants, few have reported novel associations. This has sometimes been interpreted to mean that rare variants make negligible contributions to complex trait heritability. Our work shows that RVATs are not robust to realistic human evolutionary forces, so general conclusions about the impact of rare variants on complex traits may be premature.
Affiliation(s)
- Lawrence H Uricchio: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Graduate Program in Bioinformatics, University of California, San Francisco, San Francisco, California 94143, USA
- Noah A Zaitlen: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94143, USA
- Chun Jimmie Ye: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California 94143, USA
- John S Witte: Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California 94143, USA
- Ryan D Hernandez: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, California 94143, USA; Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, California 94143, USA
|
30
|
Barbieri C, Hübner A, Macholdt E, Ni S, Lippold S, Schröder R, Mpoloka SW, Purps J, Roewer L, Stoneking M, Pakendorf B. Refining the Y chromosome phylogeny with southern African sequences. Hum Genet 2016; 135:541-553. [PMID: 27043341 PMCID: PMC4835522 DOI: 10.1007/s00439-016-1651-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 01/14/2016] [Accepted: 02/18/2016] [Indexed: 12/04/2022]
Abstract
The recent availability of large-scale sequence data for the human Y chromosome has revolutionized analyses of and insights gained from this non-recombining, paternally inherited chromosome. However, the studies to date focus on Eurasian variation, and hence the diversity of early-diverging branches found in Africa has not been adequately documented. Here, we analyze over 900 kb of Y chromosome sequence obtained from 547 individuals from southern African Khoisan- and Bantu-speaking populations, identifying 232 new sequences from basal haplogroups A and B. We identify new clades in the phylogeny, an older age for the root, and substantially older ages for some individual haplogroups. Furthermore, while haplogroup B2a is traditionally associated with the spread of Bantu speakers, we find that it probably also existed in Khoisan groups before the arrival of Bantu speakers. Finally, there is pronounced variation in branch length between major haplogroups; in particular, haplogroups associated with Bantu speakers have significantly longer branches. Technical artifacts cannot explain this branch length variation, which instead likely reflects aspects of the demographic history of Bantu speakers, such as recent population expansion and an older average paternal age. The influence of demographic factors on branch length variation has broader implications both for the human Y phylogeny and for similar analyses of other species.
Affiliation(s)
- Chiara Barbieri: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany; Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, 07745, Jena, Germany
- Alexander Hübner: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Enrico Macholdt: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Shengyu Ni: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Sebastian Lippold: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Roland Schröder: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Josephine Purps: Department of Forensic Genetics, Institute of Legal Medicine and Forensic Sciences, Charité-Universitätsmedizin, 10559, Berlin, Germany
- Lutz Roewer: Department of Forensic Genetics, Institute of Legal Medicine and Forensic Sciences, Charité-Universitätsmedizin, 10559, Berlin, Germany
- Mark Stoneking: Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
- Brigitte Pakendorf: Dynamique du Langage, UMR5596, CNRS & Université Lyon 2, 69363, Lyon Cedex 07, France
|
31
|
A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets. Genome Res 2016; 26:834-43. [PMID: 27197222 PMCID: PMC4889975 DOI: 10.1101/gr.203059.115] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/07/2015] [Accepted: 04/14/2016] [Indexed: 01/07/2023]
Abstract
A continuing challenge in the analysis of massively large sequencing data sets is quantifying and interpreting non-neutrally evolving mutations. Here, we describe a flexible and robust approach based on the site frequency spectrum to estimate the fraction of deleterious and adaptive variants from large-scale sequencing data sets. We applied our method to approximately 1 million single nucleotide variants (SNVs) identified in high-coverage exome sequences of 6515 individuals. We estimate that the fraction of deleterious nonsynonymous SNVs is higher than previously reported; quantify the effects of genomic context, codon bias, chromatin accessibility, and number of protein-protein interactions on deleterious protein-coding SNVs; and identify pathways and networks that have likely been influenced by positive selection. Furthermore, we show that the fraction of deleterious nonsynonymous SNVs is significantly higher for Mendelian versus complex disease loci and in exons harboring dominant versus recessive Mendelian mutations. In summary, as genome-scale sequencing data accumulate in progressively larger sample sizes, our method will enable increasingly high-resolution inferences into the characteristics and determinants of non-neutral variation.
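As an illustration of the summary statistic this abstract builds on, the sketch below tabulates an unfolded site frequency spectrum (SFS) from a small 0/1 haplotype matrix. It is not the authors' estimation method. The function name `site_frequency_spectrum` and the toy matrix are hypothetical, and the ancestral allele is assumed to be coded as 0 at every site.

```python
from collections import Counter
from typing import List, Sequence


def site_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Return the unfolded SFS of a haplotype matrix.

    haplotypes[i][j] is 1 if chromosome i carries the derived allele at
    site j and 0 otherwise (assumption: ancestral state is known).
    Entry k-1 of the result is the number of sites whose derived allele
    is carried by exactly k of the n chromosomes, for k = 1 .. n-1.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    n_sites = len(haplotypes[0])
    # Count derived alleles per site, then count sites per allele count.
    derived_counts = Counter(
        sum(hap[j] for hap in haplotypes) for j in range(n_sites)
    )
    # Monomorphic sites (count 0 or n) are not part of the spectrum.
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Hypothetical toy data: 4 chromosomes (rows) by 4 sites (columns).
toy = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
sfs = site_frequency_spectrum(toy)  # three singletons, one tripleton
```

An excess of low-count entries relative to neutral expectations is the kind of signal that SFS-based approaches to deleterious variation typically exploit.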
|
32
|
Jeroncic A, Memari Y, Ritchie GR, Hendricks AE, Kolb-Kokocinski A, Matchan A, Vitart V, Hayward C, Kolcic I, Glodzik D, Wright AF, Rudan I, Campbell H, Durbin R, Polašek O, Zeggini E, Boraska Perica V. Whole-exome sequencing in an isolated population from the Dalmatian island of Vis. Eur J Hum Genet 2016; 24:1479-87. [PMID: 27049301 PMCID: PMC4950961 DOI: 10.1038/ejhg.2016.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/13/2015] [Revised: 02/07/2016] [Accepted: 02/17/2016] [Indexed: 12/14/2022]
Abstract
We have whole-exome sequenced 176 individuals from the isolated population of the island of Vis in Croatia in order to describe exonic variation architecture. We found 290 577 single nucleotide variants (SNVs), 65% of which are singletons, low frequency or rare variants. A total of 25 430 (9%) SNVs are novel, previously not catalogued in NHLBI GO Exome Sequencing Project, UK10K-Generation Scotland, 1000Genomes Project, ExAC or NCBI Reference Assembly dbSNP. The majority of these variants (76%) are singletons. Comparable to data obtained from UK10K-Generation Scotland that were sequenced and analysed using the same protocols, we detected an enrichment of potentially damaging variants (non-synonymous and loss-of-function) in the low frequency and common variant categories. On average 115 (range 93–140) genotypes with loss-of-function variants, 23 (15–34) of which were homozygous, were identified per person. The landscape of loss-of-function variants across an exome revealed that variants mainly accumulated in genes in xenobiotic-related pathways, the majority of which coded for enzymes. The frequency of loss-of-function variants was additionally increased in Vis runs-of-homozygosity regions, where variants mainly affected signalling pathways. This work confirms the isolate status of the Vis population by means of whole-exome sequencing and reveals the pattern of loss-of-function mutations, which resembles the trails of adaptive evolution that were found in other species. By cataloguing the exomic variants and describing the allelic structure of the Vis population, this study will serve as a valuable resource for future genetic studies of human diseases, population genetics and evolution in this population.
Affiliation(s)
- Ana Jeroncic: Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia
- Yasin Memari: Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Audrey E Hendricks: Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Department of Mathematical and Statistical Sciences, University of Colorado, Denver, CO, USA
- Veronique Vitart: MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Caroline Hayward: MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Ivana Kolcic: Department of Public Health, University of Split School of Medicine, Split, Croatia
- Dominik Glodzik: MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Alan F Wright: MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Igor Rudan: Centre for Global Health Research, University of Edinburgh, Edinburgh, UK
- Harry Campbell: Centre for Global Health Research, University of Edinburgh, Edinburgh, UK
- Ozren Polašek: Department of Public Health, University of Split School of Medicine, Split, Croatia; Centre for Global Health Research, University of Edinburgh, Edinburgh, UK
- Vesna Boraska Perica: Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; Department of Medical Biology, University of Split School of Medicine, Split, Croatia
|
33
|
Abstract
Deleterious alleles can reach high frequency in small populations because of random fluctuations in allele frequency. This may lead, over time, to reduced average fitness. In this sense, selection is more "effective" in larger populations. Recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article first seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in recent studies. We argue that variants of Morton, Crow, and Muller's "total mutational damage" provide the soundest and most practical basis for such comparisons. Using simulations, analytical calculations, and 1000 Genomes Project data, we provide an intuitive and quantitative explanation for the observed similarity in genetic load across populations. We show that recent demography has likely modulated the effect of selection and still affects it, but the net result of the accumulated differences is small. Direct observation of differential efficacy of selection for specific allele classes is nevertheless possible with contemporary data sets. By contrast, identifying average genome-wide differences in the efficacy of selection across populations will require many modeling assumptions and is unlikely to provide much biological insight about human populations.
|
34
|
Subramanian S. The effects of sample size on population genomic analyses--implications for the tests of neutrality. BMC Genomics 2016; 17:123. [PMID: 26897757 PMCID: PMC4761153 DOI: 10.1186/s12864-016-2441-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 10/27/2015] [Accepted: 02/05/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND One of the fundamental measures of molecular genetic variation is Watterson's estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population size. It is well known that the estimation of θ is biased when these assumptions are violated. However, the effects of sample size in modulating the bias were not well appreciated. RESULTS We examined this issue in detail based on large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and that this effect was much higher for constrained genomic regions than for neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes. However, this difference was 2.5 times for the nonsynonymous sites of the same data. We observed a positive correlation between the rate of increase in θ estimates (with respect to the sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, this difference was only 2 times for the less constrained genes (dN/dS > 0.9). CONCLUSIONS The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D and those based on the McDonald and Kreitman test: the Neutrality Index and the fraction of adaptive substitutions. For instance, use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions compared to that obtained using 512 exomes (24% vs 10%).
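For reference, Watterson's estimator divides the number of segregating sites S by the harmonic term a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled chromosomes. The following minimal sketch computes it. The function names and the example numbers are hypothetical and are not taken from the study; n is assumed to count chromosomes, so 16 diploid exomes would correspond to n = 32 under that assumption.

```python
from typing import Optional


def harmonic_term(n_chromosomes: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n chromosomes."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(
    segregating_sites: int,
    n_chromosomes: int,
    n_sites: Optional[int] = None,
) -> float:
    """Watterson's theta = S / a_n, optionally scaled per site."""
    theta = segregating_sites / harmonic_term(n_chromosomes)
    if n_sites is not None:
        theta /= n_sites  # per-site estimate over n_sites surveyed positions
    return theta


# Hypothetical example: 11 segregating sites in 4 chromosomes.
# a_4 = 1 + 1/2 + 1/3 = 11/6, so theta = 11 / (11/6) = 6.
theta_region = watterson_theta(11, 4)
theta_per_site = watterson_theta(11, 4, n_sites=1000)
```

Under the standard neutral model with constant population size, the expected number of segregating sites grows in proportion to a_n, so θ estimates should not depend on n. The sample-size dependence reported in the abstract is a departure from that expectation.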
Affiliation(s)
- Sankar Subramanian: Research Centre for Human Evolution, Environmental Futures Research Institute, Griffith University, 170 Kessels Road, Nathan, Qld, 4111, Australia
|
35
|
Marsden CD, Ortega-Del Vecchyo D, O'Brien DP, Taylor JF, Ramirez O, Vilà C, Marques-Bonet T, Schnabel RD, Wayne RK, Lohmueller KE. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc Natl Acad Sci U S A 2016; 113:152-7. [PMID: 26699508 PMCID: PMC4711855 DOI: 10.1073/pnas.1512501113] [Citation(s) in RCA: 187] [Impact Index Per Article: 23.4] [Indexed: 11/18/2022]
Abstract
Population bottlenecks, inbreeding, and artificial selection can all, in principle, influence levels of deleterious genetic variation. However, the relative importance of each of these effects on genome-wide patterns of deleterious variation remains controversial. Domestic and wild canids offer a powerful system to address the role of these factors in influencing deleterious variation because their history is dominated by known bottlenecks and intense artificial selection. Here, we assess genome-wide patterns of deleterious variation in 90 whole-genome sequences from breed dogs, village dogs, and gray wolves. We find that the ratio of amino acid changing heterozygosity to silent heterozygosity is higher in dogs than in wolves and, on average, dogs have 2-3% higher genetic load than gray wolves. Multiple lines of evidence indicate this pattern is driven by less efficient natural selection due to bottlenecks associated with domestication and breed formation, rather than recent inbreeding. Further, we find regions of the genome implicated in selective sweeps are enriched for amino acid changing variants and Mendelian disease genes. To our knowledge, these results provide the first quantitative estimates of the increased burden of deleterious variants directly associated with domestication and have important implications for selective breeding programs and the conservation of rare and endangered species. Specifically, they highlight the costs associated with selective breeding and question the practice favoring the breeding of individuals that best fit breed standards. Our results also suggest that maintaining a large population size, rather than just avoiding inbreeding, is a critical factor for preventing the accumulation of deleterious variants.
Affiliation(s)
- Clare D Marsden: Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Dennis P O'Brien: Department of Veterinary Medicine and Surgery, University of Missouri, Columbia, MO 65211
- Jeremy F Taylor: Division of Animal Sciences, University of Missouri, Columbia, MO 65211
- Oscar Ramirez: Institut Catala de Recerca i Estudis Avançats, Institut de Biologia Evolutiva (Centro Superior de Investigaciones Cientificas-Universitat Pompeu Fabra), 08003 Barcelona, Spain
- Carles Vilà: Conservation and Evolutionary Genetics Group, Estación Biológica de Doñana-Consejo Superior de Investigaciones Cientificas, 41092, Seville, Spain
- Tomas Marques-Bonet: Institut Catala de Recerca i Estudis Avançats, Institut de Biologia Evolutiva (Centro Superior de Investigaciones Cientificas-Universitat Pompeu Fabra), 08003 Barcelona, Spain; Centro Nacional Analasis Genomico, 08023, Barcelona, Spain
- Robert D Schnabel: Division of Animal Sciences, University of Missouri, Columbia, MO 65211; Informatics Institute, University of Missouri, Columbia, MO 65211
- Robert K Wayne: Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Kirk E Lohmueller: Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095; Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095
|
36
|
Peck KM, Chan CHS, Tanaka MM. Connecting within-host dynamics to the rate of viral molecular evolution. Virus Evol 2015; 1:vev013. [PMID: 27774285 PMCID: PMC5014490 DOI: 10.1093/ve/vev013] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022] Open
Abstract
Viruses evolve rapidly, providing a unique system for understanding the processes that influence rates of molecular evolution. Neutral theory posits that the evolutionary rate increases linearly with the mutation rate. The occurrence of deleterious mutations causes this relationship to break down at high mutation rates. Previous studies have identified this as an important phenomenon, particularly for RNA viruses which can mutate at rates near the extinction threshold. We propose that in addition to mutation dynamics, viral within-host dynamics can also affect the between-host evolutionary rate. We present an analytical model that predicts the neutral evolution rate for viruses as a function of both within-host parameters and deleterious mutations. To examine the effect of more detailed aspects of the virus life cycle, we also present a computational model that simulates acute virus evolution using target cell-limited dynamics. Using influenza A virus as a case study, we find that our simulation model can predict empirical rates of evolution better than a model lacking within-host details. The analytical model does not perform as well as the simulation model but shows how the within-host basic reproductive number influences evolutionary rates. These findings lend support to the idea that the mutation rate alone is not sufficient to predict the evolutionary rate in viruses, instead calling for improved models that account for viral within-host dynamics.
Affiliation(s)
- Kayla M Peck
- Department of Biology, University of North Carolina - Chapel Hill
- Carmen H S Chan
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Evolution & Ecology Research Centre, University of New South Wales, Sydney, NSW, Australia
- Mark M Tanaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Evolution & Ecology Research Centre, University of New South Wales, Sydney, NSW, Australia
|
37
|
Balick DJ, Do R, Cassa CA, Reich D, Sunyaev SR. Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 2015; 11:e1005436. [PMID: 26317225 PMCID: PMC4552954 DOI: 10.1371/journal.pgen.1005436] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 11/13/2014] [Accepted: 07/09/2015] [Indexed: 11/30/2022] Open
Abstract
Population bottlenecks followed by re-expansions have been common throughout the history of many populations. The response of alleles under selection to such demographic perturbations has been a subject of great interest in population genetics. On the basis of theoretical analysis and computer simulations, we suggest that this response qualitatively depends on dominance. The number of dominant or additive deleterious alleles per haploid genome is expected to be slightly increased following the bottleneck and re-expansion. In contrast, the number of completely or partially recessive alleles should be sharply reduced. Changes of population size expose differences between recessive and additive selection, potentially providing insight into the prevalence of dominance in natural populations. Specifically, we use a simple statistic, $B_R \equiv \sum_i x_i^{\mathrm{pop1}} / \sum_j x_j^{\mathrm{pop2}}$, where $x_i$ represents the derived allele frequency, to compare the number of mutations in different populations, and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. We also provide empirical evidence showing that gene sets associated with autosomal recessive disease in humans may have a $B_R$ indicative of recessive selection. Together, these theoretical predictions and empirical observations show that complex demographic history may facilitate rather than impede inference of parameters of natural selection.
Dominance has played a central role in classical genetics since its inception. However, the effect of dominance introduces substantial technical complications into theoretical models describing dynamics of alleles in populations. As a result, dominance is often ignored in population genetic models. Statistical tests for selection built on these models do not discriminate between recessive and additive alleles. We show that historical changes in population size can provide a way to differentiate between recessive and additive selection. Our analysis compares two sub-populations with different demographic histories. The history of our own species provides plenty of examples of sub-populations that went through population bottlenecks followed by re-expansions. We show that demographic differences, which generally complicate the analysis, can instead aid in the inference of features of natural selection.
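To make the ratio statistic from this abstract concrete, the following minimal Python sketch computes the sum of derived allele frequencies in one population divided by the same sum in a second population. The function name `burden_ratio` and the toy frequencies are hypothetical illustrations and are not taken from the paper; the sketch assumes both lists describe the same set of sites.

```python
def burden_ratio(freqs_pop1, freqs_pop2):
    """Ratio of summed derived allele frequencies: sum(x_i, pop1) / sum(x_j, pop2).

    Each list holds one derived allele frequency per site (values in [0, 1]).
    This is an illustrative helper, not the authors' implementation.
    """
    total_pop1 = sum(freqs_pop1)
    total_pop2 = sum(freqs_pop2)
    if total_pop2 == 0:
        raise ValueError("population 2 carries no derived alleles at these sites")
    return total_pop1 / total_pop2


# Hypothetical derived allele frequencies at four sites.
pop1 = [0.10, 0.00, 0.25, 0.05]
pop2 = [0.05, 0.10, 0.20, 0.15]

ratio = burden_ratio(pop1, pop2)
print(f"B_R = {ratio:.3f}")
```

In this toy case the ratio is below 1, meaning population 1 carries fewer derived alleles per haploid genome at these sites than population 2. How such a value should be interpreted depends on the selection and bottleneck parameters discussed in the abstract.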
Affiliation(s)
- Daniel J. Balick
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Ron Do
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Christopher A. Cassa
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- David Reich
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Shamil R. Sunyaev
- Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
|
38
|
Lohmueller KE. The distribution of deleterious genetic variation in human populations. Curr Opin Genet Dev 2015; 29:139-46. [PMID: 25461617 DOI: 10.1016/j.gde.2014.09.005] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 05/19/2014] [Revised: 08/28/2014] [Accepted: 09/05/2014] [Indexed: 11/19/2022]
Abstract
Population genetic studies suggest that most amino-acid changing mutations are deleterious. Such mutations are of tremendous interest in human population genetics as they are important for the evolutionary process and may contribute risk to common disease. Genomic studies over the past 5 years have documented differences across populations in the number of heterozygous deleterious genotypes, number of homozygous derived deleterious genotypes, number of deleterious segregating sites and proportion of sites that are potentially deleterious. These differences have been attributed to population history affecting the ability of natural selection to remove deleterious variants from the population. However, recent studies have suggested that the genetic load is the same across populations and that the efficacy of natural selection has not differed across human populations. Here I show that these observations are not incompatible with each other and that the apparent differences are due to examining different features of the genetic data and differing definitions of terms.
|
39
|
Abstract
Next-generation sequencing technology has facilitated the discovery of millions of genetic variants in human genomes. A sizeable fraction of these variants are predicted to be deleterious. Here, we review the pattern of deleterious alleles as ascertained in genome sequencing data sets and ask whether human populations differ in their predicted burden of deleterious alleles - a phenomenon known as mutation load. We discuss three demographic models that are predicted to affect mutation load and relate these models to the evidence (or the lack thereof) for variation in the efficacy of purifying selection in diverse human genomes. We also emphasize why accurate estimation of mutation load depends on assumptions regarding the distribution of dominance and selection coefficients - quantities that remain poorly characterized for current genomic data sets.
|
40
|
Boehm JT, Waldman J, Robinson JD, Hickerson MJ. Population genomics reveals seahorses (Hippocampus erectus) of the western mid-Atlantic coast to be residents rather than vagrants. PLoS One 2015; 10:e0116219. [PMID: 25629166 PMCID: PMC4309581 DOI: 10.1371/journal.pone.0116219] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 07/25/2014] [Accepted: 10/20/2014] [Indexed: 11/29/2022] Open
Abstract
Understanding population structure and areas of demographic persistence and transients is critical for effective species management. However, direct observational evidence to address the geographic scale and delineation of ephemeral or persistent populations for many marine fishes is limited. The Lined seahorse (Hippocampus erectus) can be commonly found in three western Atlantic zoogeographic provinces, though inhabitants of the temperate northern Virginia Province are often considered tropical vagrants that only arrive during warm seasons from the southern provinces and perish as temperatures decline. Although genetics can locate regions of historical population persistence and isolation, previous evidence of Virginia Province persistence is only provisional due to limited genetic sampling (i.e., mitochondrial DNA and five nuclear loci). To test alternative hypotheses of historical persistence versus the ephemerality of a northern Virginia Province population, we used a RADseq-generated dataset consisting of 11,708 single nucleotide polymorphisms (SNPs) sampled from individuals collected from the eastern Gulf of Mexico to Long Island, NY. Concordant results from genomic analyses all infer three genetically divergent subpopulations, and strongly support Virginia Province inhabitants as a genetically diverged and historically persistent ancestral gene pool. These results suggest that individuals that emerge in coastal areas during the warm season can be considered "local" and support offshore migration during the colder months. This research demonstrates how a large number of genes sampled across a geographical range can capture the diversity of coalescent histories (across loci) while inferring population history. Moreover, these results clearly demonstrate the utility of population genomic data to infer peripheral subpopulation persistence in difficult-to-observe species.
Affiliation(s)
- J. T. Boehm
- Department of Biology, City College of New York, 160 Convent Ave., New York, New York, 10031, United States of America
- Subprogram in Ecology, Evolution and Behavior, The Graduate Center of the City University of New York, 365 5 Ave, New York, New York, 10016, United States of America
- John Waldman
- Biology Department, Queens College, City University of New York, 65-30 Kissena Blvd., Queens, New York, 11367-1597, United States of America
- Subprogram in Ecology, Evolution and Behavior, The Graduate Center of the City University of New York, 365 5 Ave, New York, New York, 10016, United States of America
- John D. Robinson
- South Carolina Department of Natural Resources, Marine Resources Research Institute, 217 Fort Johnson Rd., Charleston, South Carolina, 29412, United States of America
- Michael J. Hickerson
- Department of Biology, City College of New York, 160 Convent Ave., New York, New York, 10031, United States of America
- Subprogram in Ecology, Evolution and Behavior, The Graduate Center of the City University of New York, 365 5 Ave, New York, New York, 10016, United States of America
|
41
|
No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet 2015; 47:126-31. [PMID: 25581429 PMCID: PMC4310772 DOI: 10.1038/ng.3186] [Citation(s) in RCA: 130] [Impact Index Per Article: 14.4] [Received: 02/19/2014] [Accepted: 12/09/2014] [Indexed: 01/05/2023]
Abstract
Non-African populations have experienced size reductions in the time since their split from West Africans, leading to the hypothesis that natural selection to remove weakly deleterious mutations has been less effective in the history of non-Africans. To test this hypothesis, we measured the per-genome accumulation of non-synonymous substitutions across diverse pairs of populations. We find no evidence for a higher load of deleterious mutations in non-Africans. However, we detect significant differences among more divergent populations, as archaic Denisovans have accumulated non-synonymous mutations faster than either modern humans or Neanderthals. To reconcile these findings with patterns that have been interpreted as evidence of less effective removal of deleterious mutations in non-Africans than in West Africans, we use simulations to show that the observed patterns are not likely to reflect changes in the effectiveness of selection after the populations split, and instead are likely to be driven by other population genetic factors.
|
42
|
Bhaskar A, Wang YXR, Song YS. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 2015; 25:268-79. [PMID: 25564017 PMCID: PMC4315300 DOI: 10.1101/gr.178756.114] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 01/23/2023]
Abstract
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
Affiliation(s)
- Anand Bhaskar
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA
- Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, California 94720, USA
- Yun S Song
- Simons Institute for the Theory of Computing, Berkeley, California 94720, USA; Computer Science Division, University of California, Berkeley, California 94720, USA; Department of Statistics, University of California, Berkeley, California 94720, USA; Department of Integrative Biology, University of California, Berkeley, California 94720, USA
|
43
|
Bhaskar A, Song YS. DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA. Ann Stat 2014; 42:2469-2493. [PMID: 28018011 DOI: 10.1214/14-aos1264] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/22/2023]
Abstract
The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
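As an illustration of the two summaries named in this abstract, the sketch below builds an unfolded SFS from per-site derived allele counts and then folds it, which is the form used when the ancestral allele is unknown. The function names and the toy counts are hypothetical and only show the bookkeeping; they are not part of the paper's method.

```python
from collections import Counter


def sample_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for a sample of n sequences.

    Entry k-1 is the number of sites at which exactly k of the n sequences
    carry the derived allele, for k = 1 .. n-1. Monomorphic sites
    (count 0 or n) are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(1, n)]


def fold_spectrum(sfs, n):
    """Folded SFS: entry j-1 counts sites whose minor allele count is j,
    for j = 1 .. n // 2, by merging classes k and n - k."""
    folded = []
    for j in range(1, n // 2 + 1):
        if j == n - j:
            folded.append(sfs[j - 1])  # middle class when n is even
        else:
            folded.append(sfs[j - 1] + sfs[n - j - 1])
    return folded


# Hypothetical derived allele counts at seven sites in a sample of n = 6.
n = 6
counts = [1, 1, 2, 5, 3, 4, 1]

sfs = sample_frequency_spectrum(counts, n)
folded = fold_spectrum(sfs, n)
print("unfolded:", sfs)
print("folded:  ", folded)
```

Folding discards the distinction between a derived count of k and n - k, which is why the folded spectrum has roughly half as many entries as the unfolded one.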
|
44
|
Fu W, Gittelman RM, Bamshad MJ, Akey JM. Characteristics of neutral and deleterious protein-coding variation among individuals and populations. Am J Hum Genet 2014; 95:421-36. [PMID: 25279984 PMCID: PMC4185119 DOI: 10.1016/j.ajhg.2014.09.006] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 06/02/2014] [Accepted: 09/11/2014] [Indexed: 01/27/2023] Open
Abstract
Whole-genome and exome data sets continue to be produced at a frenetic pace, resulting in massively large catalogs of human genomic variation. However, a clear picture of the characteristics and patterns of neutral and deleterious variation within and between populations has yet to emerge, given that recent large-scale sequencing studies have often emphasized different aspects of the data and sometimes appear to have conflicting conclusions. Here, we comprehensively studied characteristics of protein-coding variation in high-coverage exome sequence data from 6,515 European American (EA) and African American (AA) individuals. We developed an unbiased approach to identify putatively deleterious variants and investigated patterns of neutral and deleterious single-nucleotide variants and alleles between individuals and populations. We show that there are substantial differences in the composition of genotypes between EA and AA populations and that small but statistically significant differences exist in the average number of deleterious alleles carried by EA and AA individuals. Furthermore, we performed extensive simulations to delineate the temporal dynamics of deleterious alleles for a broad range of demographic models and use these data to inform the interpretation of empirical patterns of deleterious variation. Finally, we illustrate that the effects of demographic perturbations, such as bottlenecks and expansions, often manifest in opposing patterns of neutral and deleterious variation depending on whether the focus is on populations or individuals. Our results clarify seemingly disparate empirical characteristics of protein-coding variation and provide substantial insights into how natural selection and demographic history have patterned neutral and deleterious variation within and between populations.
Affiliation(s)
- Wenqing Fu
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Rachel M Gittelman
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Michael J Bamshad
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
45
|
Impact of range expansions on current human genomic diversity. Curr Opin Genet Dev 2014; 29:22-30. [PMID: 25156518 DOI: 10.1016/j.gde.2014.07.007] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 06/07/2014] [Revised: 07/09/2014] [Accepted: 07/25/2014] [Indexed: 12/19/2022]
Abstract
The patterns of population genetic diversity depend to a large extent on past demographic history. Most human populations are known to have gone recently through a series of range expansions within and out of Africa, but these spatial expansions are rarely taken into account when interpreting observed genomic diversity, possibly because they are difficult to model. Here we review available evidence in favour of range expansions out of Africa, and we discuss several of their consequences on neutral and selected diversity, including some recent observations on an excess of rare neutral and selected variants in large samples. We further show that in spatially subdivided populations, the sampling strategy can severely impact the resulting genetic diversity and be confounded by past demography. We conclude that ignoring the spatial structure of human populations can lead to some misinterpretations of extant genetic diversity.
|
46
|
Abstract
Evolutionary processes of natural selection may be expected to leave their mark on age patterns of survival and reproduction. Demographic theory includes three main strands: mutation accumulation, stochastic vitality, and optimal life histories. This paper reviews the three strands and, concentrating on mutation accumulation, extends a mathematical result with broad implications concerning the effect of interactions between small age-specific effects of deleterious mutant alleles. Empirical data from genomic sequencing along with prospects for combining strands of theory hold hope for future progress.
|
47
|
Lohmueller KE. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet 2014; 10:e1004379. [PMID: 24875776 PMCID: PMC4038606 DOI: 10.1371/journal.pgen.1004379] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 06/21/2013] [Accepted: 03/28/2014] [Indexed: 02/06/2023] Open
Abstract
Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Because researchers are performing exome sequencing studies aimed at uncovering the role of low-frequency variants in the risk of complex traits, this topic is of critical importance. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests and in accounting for the missing heritability of certain complex traits.
Affiliation(s)
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
|
48
|
Robinson MR, Wray NR, Visscher PM. Explaining additional genetic variation in complex traits. Trends Genet 2014; 30:124-32. [PMID: 24629526 DOI: 10.1016/j.tig.2014.02.003] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Received: 12/16/2013] [Revised: 02/10/2014] [Accepted: 02/12/2014] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits, discovering >6000 variants associated with >500 quantitative traits and common complex diseases in humans. The associations identified so far represent only a fraction of those that influence phenotype, because there are likely to be many variants across the entire frequency spectrum, each of which influences multiple traits, with only a small average contribution to the phenotypic variance. This presents a considerable challenge to further dissection of the remaining unexplained genetic variance within populations, which limits our ability to predict disease risk, identify new drug targets, improve and maintain food sources, and understand natural diversity. This challenge will be met within the current framework through larger sample size, better phenotyping, including recording of nongenetic risk factors, focused study designs, and an integration of multiple sources of phenotypic and genetic information. The current evidence supports the application of quantitative genetic approaches, and we argue that one should retain simpler theories until simplicity can be traded for greater explanatory power.
Affiliation(s)
- Matthew R Robinson
- The Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Naomi R Wray
- The Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Peter M Visscher
- The Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia; The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD 4102, Australia
|
49
|
Ewing GB, Jensen JD. Distinguishing neutral from deleterious mutations in growing populations. Front Genet 2014; 5:7. [PMID: 24550931 PMCID: PMC3907712 DOI: 10.3389/fgene.2014.00007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/13/2013] [Accepted: 01/07/2014] [Indexed: 11/29/2022] Open
Affiliation(s)
- Greg B Ewing
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne, Lausanne Switzerland, Swiss Institute of Bioinformatics Lausanne, Switzerland
- Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Federale de Lausanne, Lausanne Switzerland, Swiss Institute of Bioinformatics Lausanne, Switzerland
|
50
|
Gazave E, Ma L, Chang D, Coventry A, Gao F, Muzny D, Boerwinkle E, Gibbs RA, Sing CF, Clark AG, Keinan A. Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci U S A 2014; 111:757-62. [PMID: 24379384 PMCID: PMC3896169 DOI: 10.1073/pnas.1310398110] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 11/18/2022] Open
Abstract
Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants and provided clear evidence of recent rapid growth in effective population size, although estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced 500 individuals of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We used very high coverage sequencing to reliably call rare variants and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ~3.4% growth per generation during the last ~140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and our results also shed light on the discrepancy in demographic estimates among recent studies.
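The link between the reported growth rate and the "two orders of magnitude" figure can be checked with a short calculation. The sketch below assumes simple compounding at a constant per-generation rate, using the rounded values quoted in the abstract; the variable names are illustrative, and the paper's fitted model may differ in detail.

```python
import math

# Rounded values quoted in the abstract (both are approximate).
growth_per_generation = 0.034   # about 3.4% growth per generation
generations = 140               # about 140 generations

# Constant-rate compounding: N_final / N_initial = (1 + r) ** t
fold_increase = (1 + growth_per_generation) ** generations
orders_of_magnitude = math.log10(fold_increase)

print(f"fold increase: {fold_increase:.1f}")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

Under these assumptions the population grows roughly 100-fold, consistent with the increase of about two orders of magnitude stated in the abstract.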
Affiliation(s)
- Elodie Gazave
- Departments of Biological Statistics and Computational Biology and
- Li Ma
- Departments of Biological Statistics and Computational Biology and
- Diana Chang
- Departments of Biological Statistics and Computational Biology and
- Alex Coventry
- Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
- Feng Gao
- Departments of Biological Statistics and Computational Biology and
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030; and
- Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030; and
- Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030; and
- Charles F. Sing
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48105
- Andrew G. Clark
- Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
- Alon Keinan
- Departments of Biological Statistics and Computational Biology and
|