1
Crombie TA, Rajaei M, Saxena AS, Johnson LM, Saber S, Tanny RE, Ponciano JM, Andersen EC, Zhou J, Baer CF. Direct inference of the distribution of fitness effects of spontaneous mutations from recombinant inbred Caenorhabditis elegans mutation accumulation lines. Genetics 2024; 228:iyae136. [PMID: 39139098 DOI: 10.1093/genetics/iyae136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 07/30/2024] [Accepted: 08/02/2024] [Indexed: 08/15/2024] Open
Abstract
The distribution of fitness effects of new mutations plays a central role in evolutionary biology. Estimates of the distribution of fitness effects from experimental mutation accumulation lines are compromised by the complete linkage disequilibrium between mutations in different lines. To reduce the linkage disequilibrium, we constructed 2 sets of recombinant inbred lines from a cross of 2 Caenorhabditis elegans mutation accumulation lines. One set of lines ("RIAILs") was intercrossed for 10 generations prior to 10 generations of selfing; the second set of lines ("RILs") omitted the intercrossing. Residual linkage disequilibrium in the RIAILs is much less than in the RILs, which affects the inferred distribution of fitness effects when the sets of lines are analyzed separately. The best-fit model estimated from all lines (RIAILs + RILs) infers a large fraction of mutations with positive effects (∼40%); models that constrain mutations to have negative effects fit much worse. The conclusion is the same using only the RILs. For the RIAILs, however, models that constrain mutations to have negative effects fit nearly as well as models that allow positive effects. When mutations in high linkage disequilibrium are pooled into haplotypes, the inferred distribution of fitness effects becomes increasingly negative-skewed and leptokurtic. We conclude that the conventional wisdom (most mutations have effects near 0, a handful of mutations have effects that are substantially negative, and mutations with positive effects are very rare) is likely correct, and that unless it can be shown otherwise, estimates of the distribution of fitness effects that infer a substantial fraction of mutations with positive effects are likely confounded by linkage disequilibrium.
Affiliation(s)
- Timothy A Crombie
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Moein Rajaei
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Lindsay M Johnson
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Sayran Saber
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Robyn E Tanny
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Erik C Andersen
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Juannan Zhou
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Charles F Baer
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, Gainesville, FL 32611, USA
2
Taylor CS, Lawson DJ. Heritability of complex traits in sub-populations experiencing bottlenecks and growth. J Hum Genet 2024; 69:329-335. [PMID: 38589509 PMCID: PMC11199143 DOI: 10.1038/s10038-024-01249-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/10/2024]
Abstract
Populations that have experienced a bottleneck are regularly used in genome-wide association studies (GWAS) to investigate variants associated with complex traits. It is generally understood that these isolated sub-populations may harbor a high frequency of otherwise rare variants with large effect sizes, and therefore provide a unique opportunity to study the trait in question. However, the demographic history of the population under investigation affects all SNPs that determine the complex trait genome-wide, changing its heritability and genetic architecture. We use a simulation-based approach to identify the impact of the demographic processes of drift, expansion, and migration on the heritability of complex traits. We show that demography has considerable impact on complex traits. We then investigate the power to resolve heritability of complex traits in GWAS subjected to demographic effects. We find that demography is an important component for interpreting inference of complex traits and has a nuanced impact on the power of GWAS. We conclude that demographic histories need to be explicitly modelled to properly quantify the history of selection on a complex trait.
Affiliation(s)
- Daniel J Lawson
  - School of Mathematics, University of Bristol, Bristol, UK
  - MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
3
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. bioRxiv 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Affiliation(s)
- Roshni A. Patel
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Clemens L. Weiß
  - Stanford Cancer Institute Core, Stanford University School of Medicine, Stanford, CA
- Huisheng Zhu
  - Department of Biology, Stanford University, Stanford, CA
- Hakhamanesh Mostafavi
  - Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY
- Jeffrey P. Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA
  - Department of Biology, Stanford University, Stanford, CA
4
O’Brien NLV, Holland B, Engelstädter J, Ortiz-Barrientos D. The distribution of fitness effects during adaptive walks using a simple genetic network. PLoS Genet 2024; 20:e1011289. [PMID: 38787919 PMCID: PMC11156440 DOI: 10.1371/journal.pgen.1011289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 06/06/2024] [Accepted: 05/04/2024] [Indexed: 05/26/2024] Open
Abstract
The tempo and mode of adaptation depend on the availability of beneficial alleles. Genetic interactions arising from gene networks can restrict this availability. However, the extent to which networks affect adaptation remains largely unknown. Current models of evolution consider additive genotype-phenotype relationships while often ignoring the contribution of gene interactions to phenotypic variance. In this study, we model a quantitative trait as the product of a simple gene regulatory network, the negative autoregulation motif. Using forward-time genetic simulations, we measure adaptive walks towards a phenotypic optimum in both additive and network models. A key expectation from adaptive walk theory is that the distribution of fitness effects of new beneficial mutations is exponential. We found that both models instead harbored distributions with fewer large-effect beneficial alleles than expected. The network model also had a complex and bimodal distribution of fitness effects among all mutations, with a considerable density at deleterious selection coefficients. This behavior is reminiscent of the cost of complexity, where correlations among traits constrain adaptation. Our results suggest that the interactions emerging from genetic networks can generate complex and multimodal distributions of fitness effects.
Affiliation(s)
- Nicholas L. V. O’Brien
  - School of the Environment, The University of Queensland, Brisbane, Queensland, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia
- Barbara Holland
  - School of Natural Sciences, University of Tasmania, Hobart, Tasmania, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, Tasmania, Australia
- Jan Engelstädter
  - School of the Environment, The University of Queensland, Brisbane, Queensland, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia
- Daniel Ortiz-Barrientos
  - School of the Environment, The University of Queensland, Brisbane, Queensland, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, QLD, Australia
5
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. G3 (Bethesda) 2024; 14:jkae031. [PMID: 38365205 PMCID: PMC11090462 DOI: 10.1093/g3journal/jkae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 10/10/2023] [Accepted: 01/29/2024] [Indexed: 02/18/2024]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in nonmodel species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to nonmodel genomes. We apply ABC-MK to the human proteome and a set of known virus interacting proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
Affiliation(s)
- Jesús Murga-Moreno
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
- Sònia Casillas
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
6
Kim D, Song J, Mancuso N, Mangul S, Jung J, Jang W. Large-scale integrative analysis of juvenile idiopathic arthritis for new insight into its pathogenesis. Arthritis Res Ther 2024; 26:47. [PMID: 38336809 PMCID: PMC10858498 DOI: 10.1186/s13075-024-03280-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 01/29/2024] [Indexed: 02/12/2024] Open
Abstract
BACKGROUND: Juvenile idiopathic arthritis (JIA) is one of the most prevalent rheumatic disorders in children and is classified as an autoimmune disease (AID). While a robust genetic contribution to JIA etiology has been established, the exact pathogenesis remains unclear.
METHODS: To prioritize biologically interpretable susceptibility genes and proteins for JIA, we conducted transcriptome-wide and proteome-wide association studies (TWAS/PWAS). Then, to understand the genetic architecture of JIA, we systematically analyzed single-nucleotide polymorphism (SNP)-based heritability, a signature of natural selection, and polygenicity. Next, we conducted HLA typing using multi-ethnicity RNA sequencing data. Additionally, we examined the T cell receptor (TCR) repertoire at a single-cell level to explore the potential links between immunity and JIA risk.
RESULTS: We have identified 19 TWAS genes and two PWAS proteins associated with JIA risks. Furthermore, we observe that the heritability and cell type enrichment analysis of JIA are enriched in T lymphocytes and HLA regions and that JIA shows higher polygenicity compared to other AIDs. In multi-ancestry HLA typing, B*45:01 is more prevalent in African JIA patients than in European JIA patients, whereas DQA1*01:01, DQA1*03:01, and DRB1*04:01 exhibit a higher frequency in European JIA patients. Using single-cell immune repertoire analysis, we identify clonally expanded T cell subpopulations in JIA patients, including CXCL13+BHLHE40+ TH cells which are significantly associated with JIA risks.
CONCLUSION: Our findings shed new light on the pathogenesis of JIA and provide a strong foundation for future mechanistic studies aimed at uncovering the molecular drivers of JIA.
Affiliation(s)
- Daeun Kim
  - Department of Life Sciences, Dongguk University-Seoul, Seoul, 04620, Republic of Korea
- Jaeseung Song
  - Department of Life Sciences, Dongguk University-Seoul, Seoul, 04620, Republic of Korea
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Serghei Mangul
  - Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Junghyun Jung
  - Department of Life Sciences, Dongguk University-Seoul, Seoul, 04620, Republic of Korea
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Hollywood, CA, USA
- Wonhee Jang
  - Department of Life Sciences, Dongguk University-Seoul, Seoul, 04620, Republic of Korea
7
Zurita AMI, Kyriazis CC, Lohmueller KE. The impact of non-neutral synonymous mutations when inferring selection on non-synonymous mutations. bioRxiv 2024:2024.02.07.579314. [PMID: 38370782 PMCID: PMC10871344 DOI: 10.1101/2024.02.07.579314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
The distribution of fitness effects (DFE) describes the proportions of new mutations that have different effects on reproductive fitness. Accurate measurements of the DFE are important because the DFE is a fundamental parameter in evolutionary genetics and has implications for our understanding of other phenomena like complex disease or inbreeding depression. Current computational methods to infer the DFE for nonsynonymous mutations from natural variation first estimate demographic parameters from synonymous variants to control for the effects of demography and background selection. Then, conditional on these parameters, the DFE is inferred for nonsynonymous mutations. This approach relies on the assumption that synonymous variants are neutrally evolving. However, some evidence points toward synonymous mutations having measurable effects on fitness. To test whether selection on synonymous mutations affects inference of the DFE of nonsynonymous mutations, we simulated several possible models of selection on synonymous mutations using SLiM and attempted to recover the DFE of nonsynonymous mutations using Fit∂a∂i, a common method for DFE inference. Our results show that the presence of selection on synonymous variants leads to incorrect inferences of recent population growth. Furthermore, under certain parameter combinations, inferences of the DFE can have an inflated proportion of highly deleterious nonsynonymous mutations. However, this bias can be eliminated if the correct demographic parameters are used for DFE inference instead of the biased ones inferred from synonymous variants. Our work demonstrates how unmodeled selection on synonymous mutations may affect downstream inferences of the DFE.
Affiliation(s)
- Aina Martinez I Zurita
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, USA
- Christopher C Kyriazis
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
- Kirk E Lohmueller
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, USA
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, USA
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
8
Skinner MK. Epigenetic biomarkers for disease susceptibility and preventative medicine. Cell Metab 2024; 36:263-277. [PMID: 38176413 DOI: 10.1016/j.cmet.2023.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/11/2023] [Accepted: 11/28/2023] [Indexed: 01/06/2024]
Abstract
The development of molecular biomarkers for disease makes it possible for preventative medicine approaches to be considered. Therefore, therapeutics, treatments, or clinical management can be used to delay or prevent disease development. The problem with genetic mutations as biomarkers is the low frequency with genome-wide association studies (GWASs), generally at best a 1% association of the patients with the disease. In contrast, epigenetic alterations have a high-frequency association of greater than 90%-95% of individuals with pathology in epigenome-wide association studies (EWASs). A wide variety of human diseases have been shown to have epigenetic biomarkers that are disease specific and that detect pathology susceptibility. This review is focused on the epigenetic biomarkers for disease susceptibility, and is distinct from the large literature on epigenetics of disease etiology or progression. The development of efficient epigenetic biomarkers for disease susceptibility will facilitate a paradigm shift from reactionary medicine to preventative medicine.
Affiliation(s)
- Michael K Skinner
  - Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA 99164-4236, USA
9
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. bioRxiv 2023:2023.08.29.555322. [PMID: 37693550 PMCID: PMC10491248 DOI: 10.1101/2023.08.29.555322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in non-model species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to non-model genomes. We apply ABC-MK to the human proteome and a set of known Virus Interacting Proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
Affiliation(s)
- Jesús Murga-Moreno
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
- Sònia Casillas
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
  - Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
  - Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- David Enard
  - University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
10
Wientjes YCJ, Bijma P, van den Heuvel J, Zwaan BJ, Vitezica ZG, Calus MPL. The long-term effects of genomic selection: 2. Changes in allele frequencies of causal loci and new mutations. Genetics 2023; 225:iyad141. [PMID: 37506255 PMCID: PMC10471209 DOI: 10.1093/genetics/iyad141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 05/17/2023] [Accepted: 07/18/2023] [Indexed: 07/30/2023] Open
Abstract
Genetic selection has been applied for many generations in animal, plant, and experimental populations. Selection changes the allelic architecture of traits to create genetic gain. It remains unknown whether the changes in allelic architecture are different for the recently introduced technique of genomic selection compared to traditional selection methods and whether they depend on the genetic architectures of traits. Here, we investigate the allele frequency changes of old and new causal loci under 50 generations of phenotypic, pedigree, and genomic selection, for a trait controlled by either additive, additive and dominance, or additive, dominance, and epistatic effects. Genomic selection resulted in slightly larger and faster changes in allele frequencies of causal loci than pedigree selection. For each locus, allele frequency change per generation was not only influenced by its statistical additive effect but also to a large extent by the linkage phase with other loci and its allele frequency. Selection fixed a large number of loci, and 5 times more unfavorable alleles became fixed with genomic and pedigree selection than with phenotypic selection. For pedigree selection, this was mainly a result of increased genetic drift, while genetic hitchhiking had a larger effect on genomic selection. When epistasis was present, the average allele frequency change was smaller (∼15% lower), and a lower number of loci became fixed for all selection methods. We conclude that for long-term genetic improvement using genomic selection, it is important to consider hitchhiking and to limit the loss of favorable alleles.
Affiliation(s)
- Yvonne C J Wientjes
  - Animal Breeding and Genomics, Wageningen University & Research, 6700 AH Wageningen, The Netherlands
- Piter Bijma
  - Animal Breeding and Genomics, Wageningen University & Research, 6700 AH Wageningen, The Netherlands
- Joost van den Heuvel
  - Laboratory of Genetics, Wageningen University & Research, 6700 AH Wageningen, The Netherlands
- Bas J Zwaan
  - Laboratory of Genetics, Wageningen University & Research, 6700 AH Wageningen, The Netherlands
- Mario P L Calus
  - Animal Breeding and Genomics, Wageningen University & Research, 6700 AH Wageningen, The Netherlands
11
Marrella MA, Biase FH. Robust identification of regulatory variants (eQTLs) using a differential expression framework developed for RNA-sequencing. J Anim Sci Biotechnol 2023; 14:62. [PMID: 37143150 PMCID: PMC10161580 DOI: 10.1186/s40104-023-00861-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 03/05/2023] [Indexed: 05/06/2023] Open
Abstract
BACKGROUND: A gap currently exists between genetic variants and the underlying cell and tissue biology of a trait, and expression quantitative trait loci (eQTL) studies provide important information to help close that gap. However, two concerns that arise with eQTL analyses using RNA-sequencing data are normalization of data across samples and the data not following a normal distribution. Multiple pipelines have been suggested to address this. For instance, the most recent analysis of the human and farm Genotype-Tissue Expression (GTEx) project proposes using trimmed means of M-values (TMM) to normalize the data followed by an inverse normal transformation.
RESULTS: In this study, we reasoned that eQTL analysis could be carried out using the same framework used for differential gene expression (DGE), which uses a negative binomial model, a statistical test feasible for count data. Using the GTEx framework, we identified 35 significant eQTLs (P < 5 × 10⁻⁸) following the ANOVA model and 39 significant eQTLs (P < 5 × 10⁻⁸) following the additive model. Using a differential gene expression framework, we identified 930 and six significant eQTLs (P < 5 × 10⁻⁸) following an analytical framework equivalent to the ANOVA and additive model, respectively. When we compared the two approaches, there was no overlap of significant eQTLs between the two frameworks. Because we defined specific contrasts, we identified trans eQTLs that more closely resembled what we expect from genetic variants showing complete dominance between alleles. Yet, these were not identified by the GTEx framework.
CONCLUSIONS: Our results show that transforming RNA-sequencing data to fit a normal distribution prior to eQTL analysis is not required when the DGE framework is employed. Our proposed approach detected biologically relevant variants that otherwise would not have been identified due to data transformation to fit a normal distribution.
Affiliation(s)
- Mackenzie A Marrella
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Fernando H Biase
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
12
Aqil A, Speidel L, Pavlidis P, Gokcumen O. Balancing selection on genomic deletion polymorphisms in humans. eLife 2023; 12:79111. [PMID: 36625544 PMCID: PMC9943071 DOI: 10.7554/elife.79111] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2022] [Accepted: 01/05/2023] [Indexed: 01/11/2023] Open
Abstract
A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, genome-wide investigation of ancient and possibly adaptive deletions remains an imperative exercise. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess in deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that it is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely more common mechanisms. Our results suggest that balancing selection resulted in ancient deletions harboring disproportionately more exonic variants with GWAS (genome-wide association studies) associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.
Affiliation(s)
- Alber Aqil
  - Department of Biological Sciences, University at Buffalo, Buffalo, United States
- Leo Speidel
  - University College London, Genetics Institute, London, United Kingdom
  - The Francis Crick Institute, London, United Kingdom
- Pavlos Pavlidis
  - Institute of Computer Science (ICS), Foundation of Research and Technology-Hellas, Heraklion, Greece
- Omer Gokcumen
  - Department of Biological Sciences, University at Buffalo, Buffalo, United States
|
13
|
Jang SK, Evans L, Fialkowski A, Arnett DK, Ashley-Koch AE, Barnes KC, Becker DM, Bis JC, Blangero J, Bleecker ER, Boorgula MP, Bowden DW, Brody JA, Cade BE, Jenkins BWC, Carson AP, Chavan S, Cupples LA, Custer B, Damrauer SM, David SP, de Andrade M, Dinardo CL, Fingerlin TE, Fornage M, Freedman BI, Garrett ME, Gharib SA, Glahn DC, Haessler J, Heckbert SR, Hokanson JE, Hou L, Hwang SJ, Hyman MC, Judy R, Justice AE, Kaplan RC, Kardia SLR, Kelly S, Kim W, Kooperberg C, Levy D, Lloyd-Jones DM, Loos RJF, Manichaikul AW, Gladwin MT, Martin LW, Nouraie M, Melander O, Meyers DA, Montgomery CG, North KE, Oelsner EC, Palmer ND, Payton M, Peljto AL, Peyser PA, Preuss M, Psaty BM, Qiao D, Rader DJ, Rafaels N, Redline S, Reed RM, Reiner AP, Rich SS, Rotter JI, Schwartz DA, Shadyab AH, Silverman EK, Smith NL, Smith JG, Smith AV, Smith JA, Tang W, Taylor KD, Telen MJ, Vasan RS, Gordeuk VR, Wang Z, Wiggins KL, Yanek LR, Yang IV, Young KA, Young KL, Zhang Y, Liu DJ, Keller MC, Vrieze S. Rare genetic variants explain missing heritability in smoking. Nat Hum Behav 2022; 6:1577-1586. [PMID: 35927319 PMCID: PMC9985486 DOI: 10.1038/s41562-022-01408-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/28/2021] [Accepted: 06/10/2022] [Indexed: 12/11/2022]
Abstract
Common genetic variants explain less variation in complex phenotypes than inferred from family-based studies, and there is a debate on the source of this 'missing heritability'. We investigated the contribution of rare genetic variants to tobacco use with whole-genome sequences from up to 26,257 unrelated individuals of European ancestries and 11,743 individuals of African ancestries. Across four smoking traits, single-nucleotide-polymorphism-based heritability ([Formula: see text]) was estimated from 0.13 to 0.28 (s.e., 0.10-0.13) in European ancestries, with 35-74% of it attributable to rare variants with minor allele frequencies between 0.01% and 1%. These heritability estimates are 1.5-4 times higher than past estimates based on common variants alone and accounted for 60% to 100% of our pedigree-based estimates of narrow-sense heritability ([Formula: see text], 0.18-0.34). In the African ancestry samples, [Formula: see text] was estimated from 0.03 to 0.33 (s.e., 0.09-0.14) across the four smoking traits. These results suggest that rare variants are important contributors to the heritability of smoking.
Affiliation(s)
- Seon-Kyeong Jang
  - Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Luke Evans
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
  - Department of Ecology & Evolution, University of Colorado Boulder, Boulder, CO, USA
- Donna K Arnett
  - Dean's Office, University of Kentucky College of Public Health, Lexington, KY, USA
- Kathleen C Barnes
  - Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Diane M Becker
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
  - Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Meher Preethi Boorgula
  - Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Donald W Bowden
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jennifer A Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E Cade
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Brenda W Campbell Jenkins
  - Jackson Heart Study Graduate Training and Education Center, Jackson State University School of Public Health, Jackson, MS, USA
- April P Carson
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Sameer Chavan
  - Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Brian Custer
  - Vitalant Research Institute, San Francisco, CA, USA
- Scott M Damrauer
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, USA
- Sean P David
  - Department of Family Medicine, Pritzker School of Medicine, University of Chicago, Chicago, IL, USA
  - NorthShore University HealthSystem, Evanston, IL, USA
- Mariza de Andrade
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Tasha E Fingerlin
  - Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Center for Genes Environment and Health, National Jewish Health, Denver, CO, USA
- Myriam Fornage
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Barry I Freedman
  - Section on Nephrology, Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Melanie E Garrett
  - Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Sina A Gharib
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
  - Center for Lung Biology, Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA, USA
- David C Glahn
  - Department of Psychiatry, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Susan R Heckbert
  - Department of Epidemiology, University of Washington, Seattle, WA, USA
  - Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- John E Hokanson
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lifang Hou
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Shih-Jen Hwang
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Matthew C Hyman
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Renae Judy
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anne E Justice
  - Department of Population Health Sciences, Geisinger Health System, Danville, PA, USA
- Robert C Kaplan
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Shannon Kelly
  - Department of Pediatrics, UCSF Benioff Children's Hospital Oakland, Oakland, CA, USA
- Wonji Kim
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Daniel Levy
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- Ruth J F Loos
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ani W Manichaikul
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Mark T Gladwin
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Mehdi Nouraie
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Olle Melander
  - Department of Clinical Sciences, Lund University, Malmö, Sweden
  - Department of Internal Medicine, Skåne University Hospital, Malmö, Sweden
- Courtney G Montgomery
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Elizabeth C Oelsner
  - Division of General Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Marinelle Payton
  - Department of Epidemiology and Biostatistics, Jackson Heart Study Graduate Training and Education Center, Jackson State University School of Public Health, Jackson, MS, USA
- Anna L Peljto
  - Department of Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Michael Preuss
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, Epidemiology and Health Services, University of Washington, Seattle, WA, USA
- Dandi Qiao
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Daniel J Rader
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas Rafaels
  - Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Susan Redline
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Robert M Reed
  - University of Maryland School of Medicine, Baltimore, MD, USA
- Alexander P Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephen S Rich
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
  - Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- David A Schwartz
  - Department of Medicine, School of Medicine, University of Colorado Denver, Aurora, CO, USA
  - Department of Immunology, School of Medicine, University of Colorado Denver, Aurora, CO, USA
- Aladdin H Shadyab
  - Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, USA
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Nicholas L Smith
  - Department of Epidemiology, University of Washington, Seattle, WA, USA
  - Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- J Gustav Smith
  - Wallenberg Laboratory/Department of Molecular and Clinical Medicine, Institute of Medicine, Gothenburg University, Gothenburg, Sweden
  - Department of Cardiology, Sahlgrenska University Hospital, Gothenburg, Sweden
- Albert V Smith
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Weihong Tang
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Kent D Taylor
  - Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Marilyn J Telen
  - Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Ramachandran S Vasan
  - Sections of Preventive Medicine and Epidemiology and Cardiovascular Medicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Victor R Gordeuk
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Zhe Wang
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kerri L Wiggins
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Lisa R Yanek
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ivana V Yang
  - Department of Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Kendra A Young
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Kristin L Young
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yingze Zhang
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Dajiang J Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Matthew C Keller
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Scott Vrieze
  - Department of Psychology, University of Minnesota, Minneapolis, MN, USA.
|
14
|
Ruzicka F, Holman L, Connallon T. Polygenic signals of sex differences in selection in humans from the UK Biobank. PLoS Biol 2022; 20:e3001768. [PMID: 36067235 PMCID: PMC9481184 DOI: 10.1371/journal.pbio.3001768] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/30/2021] [Revised: 09/16/2022] [Accepted: 07/27/2022] [Indexed: 11/19/2022] Open
Abstract
Sex differences in the fitness effects of genetic variants can influence the rate of adaptation and the maintenance of genetic variation. For example, "sexually antagonistic" (SA) variants, which are beneficial for one sex and harmful for the other, can both constrain adaptation and increase genetic variability for fitness components such as survival, fertility, and disease susceptibility. However, detecting variants with sex-differential fitness effects is difficult, requiring genome sequences and fitness measurements from large numbers of individuals. Here, we develop new theory for studying sex-differential selection across a complete life cycle and test our models with genotypic and reproductive success data from approximately 250,000 UK Biobank individuals. We uncover polygenic signals of sex-differential selection affecting survival, reproductive success, and overall fitness, with signals of sex-differential reproductive selection reflecting a combination of SA polymorphisms and sexually concordant polymorphisms in which the strength of selection differs between the sexes. Moreover, these signals hold up to rigorous controls that minimise the contributions of potential confounders, including sequence mapping errors, population structure, and ascertainment bias. Functional analyses reveal that sex-differentiated sites are enriched in phenotype-altering genomic regions, including coding regions and loci affecting a range of quantitative traits. Population genetic analyses show that sex-differentiated sites exhibit evolutionary histories dominated by genetic drift and/or transient balancing selection, but not long-term balancing selection, which is consistent with theoretical predictions of effectively weak SA balancing selection in historically small populations. Overall, our results are consistent with polygenic sex-differential (including SA) selection in humans.
Evidence for sex-differential selection is particularly strong for variants affecting reproductive success, in which the potential contributions of nonrandom sampling to signals of sex differentiation can be excluded.
Affiliation(s)
- Filip Ruzicka
  - School of Biological Sciences, Monash University, Clayton, Victoria, Australia
- Luke Holman
  - School of BioSciences, University of Melbourne, Parkville, Victoria, Australia
  - School of Applied Sciences, Edinburgh Napier University, Edinburgh, United Kingdom
- Tim Connallon
  - School of Biological Sciences, Monash University, Clayton, Victoria, Australia
|
15
|
Hine E, Runcie DE, Allen SL, Wang Y, Chenoweth SF, Blows MW, McGuigan K. Maintenance of quantitative genetic variance in complex, multi-trait phenotypes: The contribution of rare, large effect variants in two Drosophila species. Genetics 2022; 222:6663993. [PMID: 35961029 PMCID: PMC9526065 DOI: 10.1093/genetics/iyac122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/02/2022] [Indexed: 11/29/2022] Open
Abstract
The interaction of evolutionary processes to determine quantitative genetic variation has implications for contemporary and future phenotypic evolution, as well as for our ability to detect causal genetic variants. While theoretical studies have provided robust predictions to discriminate among competing models, empirical assessment of these has been limited. In particular, theory highlights the importance of pleiotropy in resolving observations of selection and mutation, but empirical investigations have typically been limited to few traits. Here, we applied high-dimensional Bayesian Sparse Factor Genetic modeling to gene expression datasets in 2 species, Drosophila melanogaster and Drosophila serrata, to explore the distributions of genetic variance across high-dimensional phenotypic space. Surprisingly, most of the heritable trait covariation was due to few lines (genotypes) with extreme [>3 interquartile ranges (IQR) from the median] values. Intriguingly, while genotypes extreme for a multivariate factor also tended to have a higher proportion of individual traits that were extreme, we also observed genotypes that were extreme for multivariate factors but not for any individual trait. We observed other consistent differences between heritable multivariate factors with outlier lines vs those factors without extreme values, including differences in gene functions. We use these observations to identify further data required to advance our understanding of the evolutionary dynamics and nature of standing genetic variation for quantitative traits.
Affiliation(s)
- Emma Hine
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia
- Daniel E Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Scott L Allen
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia
- Yiguan Wang
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, UK
- Stephen F Chenoweth
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia
- Mark W Blows
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia
- Katrina McGuigan
  - School of Biological Sciences, The University of Queensland, Brisbane 4072 Australia
|
16
|
Smail C, Ferraro NM, Hui Q, Durrant MG, Aguirre M, Tanigawa Y, Keever-Keigher MR, Rao AS, Justesen JM, Li X, Gloudemans MJ, Assimes TL, Kooperberg C, Reiner AP, Huang J, O'Donnell CJ, Sun YV, Rivas MA, Montgomery SB. Integration of rare expression outlier-associated variants improves polygenic risk prediction. Am J Hum Genet 2022; 109:1055-1064. [PMID: 35588732 PMCID: PMC9247823 DOI: 10.1016/j.ajhg.2022.04.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/01/2021] [Accepted: 04/25/2022] [Indexed: 11/28/2022] Open
Abstract
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
Affiliation(s)
- Craig Smail
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA; Genomic Medicine Center, Children's Mercy Research Institute and Children's Mercy Kansas City, Kansas City, MO, USA.
- Nicole M Ferraro
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Qin Hui
  - Atlanta VA Health Care System, Decatur, GA, USA; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
- Matthew G Durrant
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Matthew Aguirre
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Yosuke Tanigawa
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Marissa R Keever-Keigher
  - Genomic Medicine Center, Children's Mercy Research Institute and Children's Mercy Kansas City, Kansas City, MO, USA
- Abhiram S Rao
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Johanne M Justesen
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Xin Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Michael J Gloudemans
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Themistocles L Assimes
  - Palo Alto VA Health Care System, Palo Alto, CA, USA; Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Charles Kooperberg
  - Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Jie Huang
  - School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Christopher J O'Donnell
  - Boston VA Health Care System, Boston, MA, USA; Division of Cardiology, Department of Medicine, Harvard Medical School, Boston, MA, USA; Division of Cardiology, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Yan V Sun
  - Atlanta VA Health Care System, Decatur, GA, USA; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
- Manuel A Rivas
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Stephen B Montgomery
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA.
|
17
|
Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, Kenny E, Gignoux C, Wojcik G, Crawford L, Ramachandran S. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am J Hum Genet 2022; 109:871-884. [PMID: 35349783 PMCID: PMC9118115 DOI: 10.1016/j.ajhg.2022.03.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 03/02/2022] [Indexed: 12/12/2022] Open
Abstract
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
Affiliation(s)
- Samuel Pattillo Smith
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Sahar Shahamatdar
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Wei Cheng
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Selena Zhang
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Joseph Paik
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Misa Graff
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher Haiman
  - Department of Preventative Medicine, University of Southern California, Los Angeles, CA 90089, USA
- T C Matise
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eimear Kenny
  - The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Chris Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO 80204, USA
- Genevieve Wojcik
  - Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21287, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Biostatistics, Brown University, Providence, RI 02906, USA; Microsoft Research New England, Cambridge, MA 02142, USA
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA; Data Science Initiative, Brown University, Providence, RI 02912, USA.
|
18
|
Burch KS, Hou K, Ding Y, Wang Y, Gazal S, Shi H, Pasaniuc B. Partitioning gene-level contributions to complex-trait heritability by allele frequency identifies disease-relevant genes. Am J Hum Genet 2022; 109:692-709. [PMID: 35271803 PMCID: PMC9069080 DOI: 10.1016/j.ajhg.2022.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/19/2021] [Accepted: 02/15/2022] [Indexed: 11/15/2022] Open
Abstract
Recent works have shown that SNP heritability (which is dominated by low-effect common variants) may not be the most relevant quantity for localizing high-effect/critical disease genes. Here, we introduce methods to estimate the proportion of phenotypic variance explained by a given assignment of SNPs to a single gene ("gene-level heritability"). We partition gene-level heritability by minor allele frequency (MAF) to find genes whose gene-level heritability is explained exclusively by "low-frequency/rare" variants (0.5% ≤ MAF < 1%). Applying our method to ∼16K protein-coding genes and 25 quantitative traits in the UK Biobank (N = 290K "White British"), we find that, on average across traits, ∼2.5% of nonzero-heritability genes have a rare-variant component and only ∼0.8% (327 gene-trait pairs) have heritability exclusively from rare variants. Of these 327 gene-trait pairs, 114 (35%) were not detected by existing gene-level association testing methods. The additional genes we identify are significantly enriched for known disease genes, and we find several examples of genes that have been previously implicated in phenotypically related Mendelian disorders. Notably, the rare-variant component of gene-level heritability exhibits trends different from those of common-variant gene-level heritability. For example, while total gene-level heritability increases with gene length, the rare-variant component is significantly larger among shorter genes; the cumulative distributions of gene-level heritability also vary across traits and reveal differences in the relative contributions of rare/common variants to overall gene-level polygenicity. While nonzero gene-level heritability does not imply causality, if interpreted in the correct context, gene-level heritability can reveal useful insights into complex-trait genetic architecture.
Affiliation(s)
- Kathryn S Burch
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yifei Wang
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
- Steven Gazal
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - OMNI Bioinformatics, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA 90095, USA
19
Vecchyo DOD, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022; 220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022] Open
Abstract
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some non-equilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive or negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We show an application of our method to the UK10K phased haplotype dataset of individuals.
Affiliation(s)
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, 76230, México
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Kirk E Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, United States of America
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, 60637, United States of America
20
Gilbert KJ, Zdraljevic S, Cook DE, Cutter AD, Andersen EC, Baer CF. The distribution of mutational effects on fitness in Caenorhabditis elegans inferred from standing genetic variation. Genetics 2022; 220:iyab166. [PMID: 34791202 PMCID: PMC8733438 DOI: 10.1093/genetics/iyab166] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/07/2021] [Accepted: 09/27/2021] [Indexed: 11/14/2022] Open
Abstract
The distribution of fitness effects (DFE) for new mutations is one of the most theoretically important but difficult to estimate properties in population genetics. A crucial challenge to inferring the DFE from natural genetic variation is the sensitivity of the site frequency spectrum to factors like population size change, population substructure, genome structure, and nonrandom mating. Although inference methods aim to control for population size changes, the influence of nonrandom mating remains incompletely understood, despite being a common feature of many species. We report the DFE estimated from 326 genomes of Caenorhabditis elegans, a nematode roundworm with a high rate of self-fertilization. We evaluate the robustness of DFE inferences using simulated data that mimics the genomic structure and reproductive life history of C. elegans. Our observations demonstrate how the combined influence of self-fertilization, genome structure, and natural selection on linked sites can conspire to compromise estimates of the DFE from extant polymorphisms with existing methods. These factors together tend to bias inferences toward weakly deleterious mutations, making it challenging to have full confidence in the inferred DFE of new mutations as deduced from standing genetic variation in species like C. elegans. Improved methods for inferring the DFE are needed to appropriately handle strong linked selection and selfing. These results highlight the importance of understanding the combined effects of processes that can bias our interpretations of evolution in natural populations.
Affiliation(s)
- Stefan Zdraljevic
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
  - Department of Human Genetics, Department of Biological Chemistry, and Howard Hughes Medical Institute, University of California, Los Angeles, CA 90095, USA
- Daniel E Cook
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Asher D Cutter
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Erik C Andersen
  - Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Charles F Baer
  - Department of Biology, University of Florida, Gainesville, FL 32611-8525, USA
  - University of Florida Genetics Institute, Gainesville, FL 32611, USA
21
Legarra A, Garcia-Baccino CA, Wientjes YCJ, Vitezica ZG. The correlation of substitution effects across populations and generations in the presence of nonadditive functional gene action. Genetics 2021; 219:iyab138. [PMID: 34718531 PMCID: PMC8664574 DOI: 10.1093/genetics/iyab138] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Accepted: 08/19/2021] [Indexed: 11/14/2022] Open
Abstract
Allele substitution effects at quantitative trait loci (QTL) are part of the basis of quantitative genetics theory and applications such as association analysis and genomic prediction. In the presence of nonadditive functional gene action, substitution effects are not constant across populations. We develop an original approach to model the difference in substitution effects across populations as a first order Taylor series expansion from a "focal" population. This expansion involves the difference in allele frequencies and second-order statistical effects (additive by additive and dominance). The change in allele frequencies is a function of relationships (or genetic distances) across populations. As a result, it is possible to estimate the correlation of substitution effects across two populations using three elements: magnitudes of additive, dominance, and additive by additive variances; relationships (Nei's minimum distances or Fst indexes); and assumed heterozygosities. Similarly, the theory applies as well to distinct generations in a population, in which case the distance across generations is a function of increase of inbreeding. Simulation results confirmed our derivations. Slight biases were observed, depending on the nonadditive mechanism and the reference allele. Our derivations are useful to understand and forecast the possibility of prediction across populations and the similarity of GWAS effects.
Affiliation(s)
- Andres Legarra
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
- Carolina A. Garcia-Baccino
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires C1417DSQ, Argentina
  - SAS NUCLEUS, Le Rheu 35650, France
- Yvonne C. J. Wientjes
  - Wageningen University & Research, Animal Breeding and Genomics, Wageningen 6700 AH, the Netherlands
22
Koch EM, Sunyaev SR. Maintenance of Complex Trait Variation: Classic Theory and Modern Data. Front Genet 2021; 12:763363. [PMID: 34868244 PMCID: PMC8636146 DOI: 10.3389/fgene.2021.763363] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/23/2021] [Accepted: 10/19/2021] [Indexed: 12/16/2022] Open
Abstract
Numerous studies have found evidence that GWAS loci experience negative selection, which increases in intensity with the effect size of identified variants. However, there is also accumulating evidence that this selection is not entirely mediated by the focal trait and contains a substantial pleiotropic component. Understanding how selective constraint shapes phenotypic variation requires advancing models capable of balancing these and other components of selection, as well as empirical analyses capable of inferring this balance and how it is generated by the underlying biology. We first review the classic theory connecting phenotypic selection to selection at individual loci as well as approaches and findings from recent analyses of negative selection in GWAS data. We then discuss geometric theories of pleiotropic selection with the potential to guide future modeling efforts. Recent findings revealing the nature of pleiotropic genetic variation provide clues to which genetic relationships are important and should be incorporated into analyses of selection, while findings that effect sizes vary between populations indicate that GWAS measurements could be misleading if effect sizes have also changed throughout human history.
Affiliation(s)
- Evan M. Koch
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Shamil R. Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
23
Kreiner JM, Tranel PJ, Weigel D, Stinchcombe JR, Wright SI. The genetic architecture and population genomic signatures of glyphosate resistance in Amaranthus tuberculatus. Mol Ecol 2021; 30:5373-5389. [PMID: 33853196 DOI: 10.1111/mec.15920] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Revised: 03/15/2021] [Accepted: 04/06/2021] [Indexed: 01/04/2023]
Abstract
Much of what we know about the genetic basis of herbicide resistance has come from detailed investigations of monogenic adaptation at known target-sites, despite the increasingly recognized importance of polygenic resistance. Little work has been done to characterize the broader genomic basis of herbicide resistance, including the number and distribution of genes involved, their effect sizes, allele frequencies and signatures of selection. In this work, we implemented genome-wide association (GWA) and population genomic approaches to examine the genetic architecture of glyphosate (Round-up) resistance in the problematic agricultural weed Amaranthus tuberculatus. A GWA was able to correctly identify the known target-gene but statistically controlling for two causal target-site mechanisms revealed an additional 250 genes across all 16 chromosomes associated with non-target-site resistance (NTSR). The encoded proteins had functions that have been linked to NTSR, the most significant of which is response to chemicals, but also showed pleiotropic roles in reproduction and growth. Compared to an empirical null that accounts for complex population structure, the architecture of NTSR was enriched for large effect sizes and low allele frequencies, suggesting the role of pleiotropic constraints on its evolution. The enrichment of rare alleles also suggested that the genetic architecture of NTSR may be population-specific and heterogeneous across the range. Despite their rarity, we found signals of recent positive selection on NTSR-alleles by both window- and haplotype-based statistics, and an enrichment of amino acid changing variants. 
In our samples, genome-wide single nucleotide polymorphisms explain a comparable amount of the total variation in glyphosate resistance to monogenic mechanisms, even in a collection of individuals where 80% of resistant individuals have large-effect TSR mutations, indicating an underappreciated polygenic contribution to the evolution of herbicide resistance in weed populations.
Affiliation(s)
- Julia M Kreiner
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Patrick J Tranel
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- John R Stinchcombe
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Koffler Scientific Reserve, University of Toronto, King City, ON, Canada
- Stephen I Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
24
Ashraf B, Lawson DJ. Genetic drift from the out-of-Africa bottleneck leads to biased estimation of genetic architecture and selection. Eur J Hum Genet 2021; 29:1549-1556. [PMID: 33846580 PMCID: PMC8484570 DOI: 10.1038/s41431-021-00873-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/18/2020] [Revised: 02/17/2021] [Accepted: 03/17/2021] [Indexed: 02/07/2023] Open
Abstract
Most complex traits evolved in the ancestors of all modern humans and have been under negative or balancing selection to maintain the distribution of phenotypes observed today. Yet all large studies mapping genomes to complex traits occur in populations that have experienced the Out-of-Africa bottleneck. Does this bottleneck affect the way we characterise complex traits? We demonstrate using the 1000 Genomes dataset and hypothetical complex traits that genetic drift can strongly affect the joint distribution of effect size and SNP frequency, and that the bias can be positive or negative depending on subtle details. Characterisations that rely on this distribution therefore conflate genetic drift and selection. We provide a model to identify the underlying selection parameter in the presence of drift, and demonstrate that a simple sensitivity analysis may be enough to validate existing characterisations. We conclude that biobanks characterising more worldwide diversity would benefit studies of complex traits.
Affiliation(s)
- Bilal Ashraf
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Fry Building, Bristol, BS8 1UG, UK
  - Department of Anthropology, Durham Research Methods Centre, University of Durham, Dawson Building, Durham, DH1 3LE, UK
- Daniel John Lawson
  - Department of Statistical Sciences, School of Mathematics, University of Bristol, Fry Building, Bristol, BS8 1UG, UK
  - Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Oakfield House, Bristol, BS8 2BN, UK
25
Visscher PM, Yengo L, Cox NJ, Wray NR. Discovery and implications of polygenicity of common diseases. Science 2021; 373:1468-1473. [PMID: 34554790 PMCID: PMC9945947 DOI: 10.1126/science.abi8206] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 01/22/2023]
Abstract
The sequencing of the human genome has allowed the study of the genetic architecture of common diseases: the number of genomic variants that contribute to risk of disease and their joint frequency and effect size distribution. Common diseases are polygenic, with many loci contributing to phenotype, and the cumulative burden of risk alleles determines individual risk in conjunction with environmental factors. Most risk loci occur in noncoding regions of the genome regulating cell- and context-specific gene expression. Although the effect sizes of most risk alleles are small, their cumulative effects in individuals, quantified as a polygenic (risk) score, can identify people at increased risk of disease, thereby facilitating prevention or early intervention.
Affiliation(s)
- Peter M. Visscher
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Corresponding author
- Loic Yengo
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Nancy J. Cox
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Naomi R. Wray
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
26
Otte KA, Nolte V, Mallard F, Schlötterer C. The genetic architecture of temperature adaptation is shaped by population ancestry and not by selection regime. Genome Biol 2021; 22:211. [PMID: 34271951 PMCID: PMC8285869 DOI: 10.1186/s13059-021-02425-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/26/2020] [Accepted: 06/29/2021] [Indexed: 12/28/2022] Open
Abstract
Background: Understanding the genetic architecture of temperature adaptation is key for characterizing and predicting the effect of climate change on natural populations. One particularly promising approach is Evolve and Resequence, which combines advantages of experimental evolution such as time series, replicate populations, and controlled environmental conditions, with whole genome sequencing. Recent analysis of replicate populations from two different Drosophila simulans founder populations, which were adapting to the same novel hot environment, uncovered very different architectures: either many selection targets with large heterogeneity among replicates or fewer selection targets with a consistent response among replicates. Results: Here, we expose the founder population from Portugal to a cold temperature regime. Although almost no selection targets are shared between the hot and cold selection regime, the adaptive architecture was similar. We identify a moderate number of targets under strong selection (19 selection targets, mean selection coefficient = 0.072) and parallel responses in the cold evolved replicates. This similarity across different environments indicates that the adaptive architecture depends more on the ancestry of the founder population than the specific selection regime. Conclusions: These observations will have broad implications for the correct interpretation of the genomic responses to a changing climate in natural populations. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02425-9.
Affiliation(s)
- Kathrin A Otte
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
  - Present address: Institute for Zoology, University of Cologne, Cologne, Germany
- Viola Nolte
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- François Mallard
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
  - Present address: Institut de Biologie de l'École Normale Supérieure, CNRS UMR 8197, Inserm U1024, PSL Research University, F-75005, Paris, France
27
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022] Open
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
Affiliation(s)
- Jesse A. Garcia
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
28
Chen ZQ, Zan Y, Milesi P, Zhou L, Chen J, Li L, Cui B, Niu S, Westin J, Karlsson B, García-Gil MR, Lascoux M, Wu HX. Leveraging breeding programs and genomic data in Norway spruce (Picea abies L. Karst) for GWAS analysis. Genome Biol 2021; 22:179. [PMID: 34120648 PMCID: PMC8201819 DOI: 10.1186/s13059-021-02392-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/10/2021] [Accepted: 05/26/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Genome-wide association studies (GWAS) identify loci underlying the variation of complex traits. One of the main limitations of GWAS is the availability of reliable phenotypic data, particularly for long-lived tree species. Although an extensive amount of phenotypic data already exists in breeding programs, accounting for its high heterogeneity is a great challenge. We combine spatial and factor-analytics analyses to standardize the heterogeneous data from 120 field experiments of 483,424 progenies of Norway spruce to implement the largest reported GWAS for trees using 134,605 SNPs from exome sequencing of 5056 parental trees. RESULTS: We identify 55 novel quantitative trait loci (QTLs) that are associated with phenotypic variation. The largest number of QTLs is associated with the budburst stage, followed by diameter at breast height, wood quality, and frost damage. Two QTLs with the largest effect have a pleiotropic effect for budburst stage, frost damage, and diameter and are associated with MAP3K genes. Genotype data called from exome capture, recently developed SNP array and gene expression data indirectly support this discovery. CONCLUSION: Several important QTLs associated with growth and frost damage have been verified in several southern and northern progeny plantations, indicating that these loci can be used in QTL-assisted genomic selection. Our study also demonstrates that existing heterogeneous phenotypic data from breeding programs, collected over several decades, is an important source for GWAS and that such integration into GWAS should be a major area of inquiry in the future.
Affiliation(s)
- Zhi-Qiang Chen
  - Umeå Plant Science Centre, Department Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- Yanjun Zan
  - Umeå Plant Science Centre, Department Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- Pascal Milesi
  - Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and SciLifeLab, Uppsala University, Uppsala, Sweden
- Linghua Zhou
  - Umeå Plant Science Centre, Department Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- Jun Chen
  - Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and SciLifeLab, Uppsala University, Uppsala, Sweden
  - College of Life Sciences, Zhejiang University, Zhejiang, 310058, Hangzhou, China
- Lili Li
  - Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and SciLifeLab, Uppsala University, Uppsala, Sweden
- BinBin Cui
  - College of Biochemistry and Environmental Engineering, Baoding University, Baoding, 071000, Hebei, China
- Shihui Niu
  - Beijing Advanced Innovation Centre for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- Johan Westin
  - Skogforsk, Box 3, SE-91821, Sävar, Sweden
  - Unit for Field-Based Forest Research, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- Bo Karlsson
  - Skogforsk, Ekebo, 2250, SE-26890, Svalöv, Sweden
- Maria Rosario García-Gil
  - Umeå Plant Science Centre, Department Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- Martin Lascoux
  - Program in Plant Ecology and Evolution, Department of Ecology and Genetics, Evolutionary Biology Centre and SciLifeLab, Uppsala University, Uppsala, Sweden
- Harry X Wu
  - Umeå Plant Science Centre, Department Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
  - Beijing Advanced Innovation Centre for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
  - CSIRO National Collection Research Australia, Black Mountain Laboratory, Canberra, ACT, 2601, Australia
29
Durvasula A, Lohmueller KE. Negative selection on complex traits limits phenotype prediction accuracy between populations. Am J Hum Genet 2021; 108:620-631. [PMID: 33691092 DOI: 10.1016/j.ajhg.2021.02.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/15/2020] [Accepted: 02/17/2021] [Indexed: 12/22/2022] Open
Abstract
Phenotype prediction is a key goal for medical genetics. Unfortunately, most genome-wide association studies are done in European populations, which reduces the accuracy of predictions via polygenic scores in non-European populations. Here, we use population genetic models to show that human demographic history and negative selection on complex traits can result in population-specific genetic architectures. For traits where alleles with the largest effect on the trait are under the strongest negative selection, approximately half of the heritability can be accounted for by variants in Europe that are absent from Africa, leading to poor performance in phenotype prediction across these populations. Further, under such a model, individuals in the tails of the genetic risk distribution may not be identified via polygenic scores generated in another population. We empirically test these predictions by building a model to stratify heritability between European-specific and shared variants and applying it to 37 traits and diseases in the UK Biobank. Across these phenotypes, ∼30% of the heritability comes from European-specific variants. We conclude that genetic association studies need to include more diverse populations to enable the utility of phenotype prediction in all populations.
Affiliation(s)
- Arun Durvasula
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kirk E Lohmueller
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
30
|
Holland D, Frei O, Desikan R, Fan CC, Shadrin AA, Smeland OB, Andreassen OA, Dale AM. The genetic architecture of human complex phenotypes is modulated by linkage disequilibrium and heterozygosity. Genetics 2021; 217:iyaa046. [PMID: 33789345 PMCID: PMC8045737 DOI: 10.1093/genetics/iyaa046] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/25/2020] [Accepted: 12/17/2020] [Indexed: 12/16/2022] Open
Abstract
We propose an extended Gaussian mixture model for the distribution of causal effects of common single nucleotide polymorphisms (SNPs) for human complex phenotypes that depends on linkage disequilibrium (LD) and heterozygosity (H), while also allowing for independent components for small and large effects. Using a precise methodology showing how genome-wide association studies (GWASs) summary statistics (z-scores) arise through LD with underlying causal SNPs, we applied the model to GWAS of multiple human phenotypes. Our findings indicated that causal effects are distributed with dependence on total LD and H, whereby SNPs with lower total LD and H are more likely to be causal with larger effects; this dependence is consistent with models of the influence of negative pressure from natural selection. Compared with the basic Gaussian mixture model it is built on, the extended model, primarily through quantification of selection pressure, reproduces with greater accuracy the empirical distributions of z-scores, thus providing better estimates of genetic quantities, such as polygenicity and heritability, that arise from the distribution of causal effects.
Affiliation(s)
- Dominic Holland
- Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA 92037, USA
| | - Oleksandr Frei
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway
| | - Rahul Desikan
- Department of Radiology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Chun-Chieh Fan
- Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA 92037, USA
| | - Alexey A Shadrin
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway
| | - Olav B Smeland
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway
| | - Ole A Andreassen
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway
| | - Anders M Dale
- Department of Radiology, University of California, San Francisco, San Francisco, CA 94158, USA
| |
|
31
|
Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 2021; 12:1164. [PMID: 33608517 PMCID: PMC7896067 DOI: 10.1038/s41467-021-21446-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/01/2020] [Accepted: 01/27/2021] [Indexed: 01/16/2023] Open
Abstract
Understanding how natural selection has shaped genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of human genome sequence are mutational targets with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes. Methods to study how natural selection shapes genetic architecture of complex traits rely on individual level genome-wide association study (GWAS) data. Here, the authors present a Bayesian method using GWAS summary statistics to study genetic architecture and apply this to 155 complex traits.
|
32
|
Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, Kim SS, Luo Y, Amariuta T, Huang H, Okada Y, Raychaudhuri S, Sunyaev SR, Price AL. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 2021; 12:1098. [PMID: 33597505 PMCID: PMC7889654 DOI: 10.1038/s41467-021-21286-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 11/15/2019] [Accepted: 01/15/2021] [Indexed: 01/31/2023] Open
Abstract
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Affiliation(s)
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Evan M Koch
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Katherine M Siewert
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yang Luo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tiffany Amariuta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
33
|
Spear ML, Diaz-Papkovich A, Ziv E, Yracheta JM, Gravel S, Torgerson DG, Hernandez RD. Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits. eLife 2020; 9:e56029. [PMID: 33372659 PMCID: PMC7771964 DOI: 10.7554/elife.56029] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/13/2020] [Accepted: 12/13/2020] [Indexed: 11/13/2022] Open
Abstract
People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry increased by an average of ~20% spanning 1940s-1990s in Mexican Americans. These patterns result from complex interactions between several population and cultural factors which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. We show for height how polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.
Affiliation(s)
- Melissa L Spear
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Alex Diaz-Papkovich
- McGill Genome Centre, McGill University, Montreal, Canada
- Quantitative Life Sciences Program, McGill University, Montreal, Canada
| | - Elad Ziv
- Division of General Internal Medicine, University of California, San Francisco, San Francisco, United States
- Department of Medicine, University of California, San Francisco, San Francisco, United States
- Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, United States
| | - Joseph M Yracheta
- Native BioData Consortium, Eagle Butte, United States
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, United States
| | - Simon Gravel
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Dara G Torgerson
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
- Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
| |
|
34
|
Brion C, Lutz SM, Albert FW. Simultaneous quantification of mRNA and protein in single cells reveals post-transcriptional effects of genetic variation. eLife 2020; 9:60645. [PMID: 33191917 PMCID: PMC7707838 DOI: 10.7554/elife.60645] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/02/2020] [Accepted: 11/14/2020] [Indexed: 01/27/2023] Open
Abstract
Trans-acting DNA variants may specifically affect mRNA or protein levels of genes located throughout the genome. However, prior work compared trans-acting loci mapped in separate studies, many of which had limited statistical power. Here, we developed a CRISPR-based system for simultaneous quantification of mRNA and protein of a given gene via dual fluorescent reporters in single, live cells of the yeast Saccharomyces cerevisiae. In large populations of recombinant cells from a cross between two genetically divergent strains, we mapped 86 trans-acting loci affecting the expression of ten genes. Less than 20% of these loci had concordant effects on mRNA and protein of the same gene. Most loci influenced protein but not mRNA of a given gene. One locus harbored a premature stop variant in the YAK1 kinase gene that had specific effects on protein or mRNA of dozens of genes. These results demonstrate complex, post-transcriptional genetic effects on gene expression.
Affiliation(s)
- Christian Brion
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| | - Sheila M Lutz
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| | - Frank Wolfgang Albert
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| |
|
36
|
Neuner SM, Tcw J, Goate AM. Genetic architecture of Alzheimer's disease. Neurobiol Dis 2020; 143:104976. [PMID: 32565066 PMCID: PMC7409822 DOI: 10.1016/j.nbd.2020.104976] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 01/31/2020] [Revised: 05/30/2020] [Accepted: 06/13/2020] [Indexed: 02/06/2023] Open
Abstract
Advances in genetic and genomic technologies over the last thirty years have greatly enhanced our knowledge concerning the genetic architecture of Alzheimer's disease (AD). Several genes including APP, PSEN1, PSEN2, and APOE have been shown to exhibit large effects on disease susceptibility, with the remaining risk loci having much smaller effects on AD risk. Notably, common genetic variants impacting AD are not randomly distributed across the genome. Instead, these variants are enriched within regulatory elements active in human myeloid cells, and to a lesser extent liver cells, implicating these cell and tissue types as critical to disease etiology. Integrative approaches are emerging as highly effective for identifying the specific target genes through which AD risk variants act and will likely yield important insights related to potential therapeutic targets in the coming years. In the future, additional consideration of sex- and ethnicity-specific contributions to risk as well as the contribution of complex gene-gene and gene-environment interactions will likely be necessary to further improve our understanding of AD genetic architecture.
Affiliation(s)
- Sarah M Neuner
- Nash Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Julia Tcw
- Nash Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Alison M Goate
- Nash Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA.
| |
|
37
|
Liu S, Yu Y, Zhang S, Cole JB, Tenesa A, Wang T, McDaneld TG, Ma L, Liu GE, Fang L. Epigenomics and genotype-phenotype association analyses reveal conserved genetic architecture of complex traits in cattle and human. BMC Biol 2020; 18:80. [PMID: 32620158 PMCID: PMC7334855 DOI: 10.1186/s12915-020-00792-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 05/12/2020] [Indexed: 02/01/2023] Open
Abstract
Background: Lack of comprehensive functional annotations across a wide range of tissues and cell types severely hinders the biological interpretations of phenotypic variation, adaptive evolution, and domestication in livestock. Here we used a combination of comparative epigenomics, genome-wide association study (GWAS), and selection signature analysis, to shed light on potential adaptive evolution in cattle.
Results: We cross-mapped 8 histone marks of 1300 samples from human to cattle, covering 178 unique tissues/cell types. By uniformly analyzing 723 RNA-seq and 40 whole genome bisulfite sequencing (WGBS) datasets in cattle, we validated that cross-mapped histone marks captured tissue-specific expression and methylation, reflecting tissue-relevant biology. Through integrating cross-mapped tissue-specific histone marks with large-scale GWAS and selection signature results, we for the first time detected relevant tissues and cell types for 45 economically important traits and artificial selection in cattle. For instance, immune tissues are significantly associated with health and reproduction traits, multiple tissues for milk production and body conformation traits (reflecting their highly polygenic architecture), and thyroid for the different selection between beef and dairy cattle. Similarly, we detected relevant tissues for 58 complex traits and diseases in humans and observed that immune and fertility traits in humans significantly correlated with those in cattle in terms of relevant tissues, which facilitated the identification of causal genes for such traits. For instance, PIK3CG, a gene highly specifically expressed in mononuclear cells, was significantly associated with both age-at-menopause in human and daughter-still-birth in cattle. ICAM, a T cell-specific gene, was significantly associated with both allergic diseases in human and metritis in cattle.
Conclusion: Collectively, our results highlighted that comparative epigenomics in conjunction with GWAS and selection signature analyses could provide biological insights into the phenotypic variation and adaptive evolution. Cattle may serve as a model for human complex traits, by providing additional information beyond laboratory model organisms, particularly when more novel phenotypes become available in the near future.
Affiliation(s)
- Shuli Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Ying Yu
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Shengli Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - John B Cole
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA
| | - Albert Tenesa
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; The Roslin Institute, University of Edinburgh, Edinburgh, EH25 9RG, UK
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Tara G McDaneld
- US Meat Animal Research Center, Agricultural Research Service, USDA, Clay Center, NE, 68933, USA
| | - Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA.
| | - Lingzhao Fang
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA.
| |
|
38
|
Uricchio LH. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum Genet 2020; 139:5-21. [PMID: 31201529 PMCID: PMC8059781 DOI: 10.1007/s00439-019-02040-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/29/2018] [Accepted: 06/06/2019] [Indexed: 12/26/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified many trait-associated variants, but there is still much we do not know about the genetic basis of complex traits. Here, we review recent theoretical and empirical literature regarding selection on complex traits to argue that "missing heritability" is as much an evolutionary problem as it is a statistical problem. We discuss empirical findings that suggest a role for selection in shaping the effect sizes and allele frequencies of causal variation underlying complex traits, and the limitations of these studies. We then use simulations of selection, realistic genome structure, and complex human demography to illustrate the results of recent theoretical work on polygenic selection, and show that statistical inference of causal loci is sharply affected by evolutionary processes. In particular, when selection acts on causal alleles, it hampers the ability to detect causal loci and constrains the transferability of GWAS results across populations. Last, we discuss the implications of these findings for future association studies, and suggest that future statistical methods to infer causal loci for genetic traits will benefit from explicit modeling of the joint distribution of effect sizes and allele frequencies under plausible evolutionary models.
Affiliation(s)
- Lawrence H Uricchio
- Department of Biology, Stanford University, Stanford, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.
| |
|
39
|
Kono TJY, Liu C, Vonderharr EE, Koenig D, Fay JC, Smith KP, Morrell PL. The Fate of Deleterious Variants in a Barley Genomic Prediction Population. Genetics 2019; 213:1531-1544. [PMID: 31653677 PMCID: PMC6893365 DOI: 10.1534/genetics.119.302733] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/07/2019] [Accepted: 10/11/2019] [Indexed: 02/07/2023] Open
Abstract
Targeted identification and purging of deleterious genetic variants has been proposed as a novel approach to animal and plant breeding. This strategy is motivated, in part, by the observation that demographic events and strong selection associated with cultivated species pose a "cost of domestication." This includes an increase in the proportion of genetic variants that are likely to reduce fitness. Recent advances in DNA resequencing and sequence constraint-based approaches to predict the functional impact of a mutation permit the identification of putatively deleterious SNPs (dSNPs) on a genome-wide scale. Using exome capture resequencing of 21 barley lines, we identified 3855 dSNPs among 497,754 total SNPs. We generated whole-genome resequencing data of Hordeum murinum ssp. glaucum as a phylogenetic outgroup to polarize SNPs as ancestral vs. derived. We also observed a higher proportion of dSNPs per synonymous SNPs (sSNPs) in low-recombination regions of the genome. Using 5215 progeny from a genomic prediction experiment, we examined the fate of dSNPs over three breeding cycles. Adjusting for initial frequency, derived alleles at dSNPs reduced in frequency or were lost more often than other classes of SNPs. The highest-yielding lines in the experiment, as chosen by standard genomic prediction approaches, carried fewer homozygous dSNPs than randomly sampled lines from the same progeny cycle. In the final cycle of the experiment, progeny selected by genomic prediction had a mean of 5.6% fewer homozygous dSNPs relative to randomly chosen progeny from the same cycle.
Affiliation(s)
- Thomas J Y Kono
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
| | - Chaochih Liu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
| | - Emily E Vonderharr
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
| | - Daniel Koenig
- Department of Botany and Plant Sciences, University of California, Riverside, California 92521
| | - Justin C Fay
- Department of Biology, University of Rochester, New York 14627
| | - Kevin P Smith
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
| | - Peter L Morrell
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
| |
|
40
|
Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. Rare variants contribute disproportionately to quantitative trait variation in yeast. eLife 2019; 8:49212. [PMID: 31647408 PMCID: PMC6892613 DOI: 10.7554/elife.49212] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/11/2019] [Accepted: 10/23/2019] [Indexed: 11/24/2022] Open
Abstract
How variants with different frequencies contribute to trait variation is a central question in genetics. We use a unique model system to disentangle the contributions of common and rare variants to quantitative traits. We generated ~14,000 progeny from crosses among 16 diverse yeast strains and identified thousands of quantitative trait loci (QTLs) for 38 traits. We combined our results with sequencing data for 1011 yeast isolates to show that rare variants make a disproportionate contribution to trait variation. Evolutionary analyses revealed that this contribution is driven by rare variants that arose recently, and that negative selection has shaped the relationship between variant frequency and effect size. We leveraged the structure of the crosses to resolve hundreds of QTLs to single genes. These results refine our understanding of trait variation at the population level and suggest that studies of rare variants are a fertile ground for discovery of genetic effects.
Affiliation(s)
- Joshua S Bloom
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States; Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - James Boocock
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States; Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - Sebastian Treusch
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States; Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - Meru J Sadhu
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States; Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| | - Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Holly Oates-Barker
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States; Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
| |
Collapse
|
41
|
Oliynyk RT. Future Preventive Gene Therapy of Polygenic Diseases from a Population Genetics Perspective. Int J Mol Sci 2019; 20:E5013. [PMID: 31658652 PMCID: PMC6834143 DOI: 10.3390/ijms20205013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/13/2019] [Revised: 10/01/2019] [Accepted: 10/08/2019] [Indexed: 12/15/2022] Open
Abstract
With the accumulation of scientific knowledge of the genetic causes of common diseases and continuous advancement of gene-editing technologies, gene therapies to prevent polygenic diseases may soon become possible. This study endeavored to assess population genetics consequences of such therapies. Computer simulations were used to evaluate the heterogeneity in causal alleles for polygenic diseases that could exist among geographically distinct populations. The results show that although heterogeneity would not be easily detectable by epidemiological studies following population admixture, even significant heterogeneity would not impede the outcomes of preventive gene therapies. Preventive gene therapies designed to correct causal alleles to a naturally-occurring neutral state of nucleotides would lower the prevalence of polygenic early- to middle-age-onset diseases in proportion to the decreased population relative risk attributable to the edited alleles. The outcome would manifest differently for late-onset diseases, for which the therapies would result in a delayed disease onset and decreased lifetime risk; however, the lifetime risk would increase again with prolonging population life expectancy, which is a likely consequence of such therapies. If the preventive heritable gene therapies were to be applied on a large scale, the decreasing frequency of risk alleles in populations would reduce the disease risk or delay the age of onset, even with a fraction of the population receiving such therapies. With ongoing population admixture, all groups would benefit over generations.
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.
|
42
|
O'Connor LJ, Schoech AP, Hormozdiari F, Gazal S, Patterson N, Price AL. Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am J Hum Genet 2019; 105:456-476. [PMID: 31402091 PMCID: PMC6732528 DOI: 10.1016/j.ajhg.2019.07.003] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 12/19/2018] [Accepted: 07/03/2019] [Indexed: 12/16/2022] Open
Abstract
Complex traits and common diseases are extremely polygenic, their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and negative selection, by purging large-effect mutations in these regions, leaves behind common-variant associations in thousands of less critical regions instead. We refer to this phenomenon as flattening. To quantify its effects, we introduce a mathematical definition of polygenicity, the effective number of independently associated SNPs (Me), which describes how evenly the heritability of a trait is spread across the genome. We developed a method, stratified LD fourth moments regression (S-LD4M), to estimate Me, validating that it produces robust estimates in simulations. Analyzing 33 complex traits (average N = 361k), we determined that heritability is spread ∼4× more evenly among common SNPs than among low-frequency SNPs. This difference, together with evolutionary modeling of new mutations, suggests that complex traits would be orders of magnitude less polygenic if not for the influence of negative selection. We also determined that heritability is spread more evenly within functionally important regions in proportion to their heritability enrichment; functionally important regions do not harbor common SNPs with greatly increased causal effect sizes, due to selective constraint. Our results suggest that for most complex traits, the genes and loci with the most critical biological effects often differ from those with the strongest common-variant associations.
Affiliation(s)
- Luke J O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA 02115, USA.
- Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
43
|
Hernandez RD, Uricchio LH, Hartman K, Ye C, Dahl A, Zaitlen N. Ultrarare variants drive substantial cis heritability of human gene expression. Nat Genet 2019; 51:1349-1355. [PMID: 31477931 PMCID: PMC6730564 DOI: 10.1038/s41588-019-0487-7] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 12/16/2018] [Accepted: 07/08/2019] [Indexed: 11/09/2022]
Abstract
The vast majority of human mutations have minor allele frequencies under 1%, with the plurality observed only once (that is, 'singletons'). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes is largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole-genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute approximately 25% of cis heritability across genes (dwarfing the contributions of other frequencies). The majority (approximately 76%) of singleton heritability derives from ultrarare variants absent from thousands of additional samples. We develop an inference procedure to demonstrate that our results are consistent with pervasive purifying selection shaping the regulatory architecture of most human genes.
Affiliation(s)
- Ryan D Hernandez
- Bioengineering & Therapeutic Sciences, UCSF, San Francisco, CA, USA.
- Institute for Human Genetics, UCSF, San Francisco, CA, USA.
- Institute for Quantitative Biosciences, UCSF, San Francisco, CA, USA.
- Institute for Computational Health Sciences, UCSF, San Francisco, CA, USA.
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada.
- McGill University and the Genome Quebec Innovation Center, Montreal, Quebec, Canada.
- Kevin Hartman
- Biological and Medical Informatics Graduate Program, UCSF, San Francisco, CA, USA
- Chun Ye
- Institute for Human Genetics, UCSF, San Francisco, CA, USA
- Epidemiology & Biostatistics, UCSF, San Francisco, CA, USA
- Andrew Dahl
- Institute for Human Genetics, UCSF, San Francisco, CA, USA
- Institute for Quantitative Biosciences, UCSF, San Francisco, CA, USA
- Noah Zaitlen
- Institute for Human Genetics, UCSF, San Francisco, CA, USA.
- Institute for Quantitative Biosciences, UCSF, San Francisco, CA, USA.
- Department of Medicine Lung Biology Center, UCSF, San Francisco, CA, USA.
|
44
|
Sella G, Barton NH. Thinking About the Evolution of Complex Traits in the Era of Genome-Wide Association Studies. Annu Rev Genomics Hum Genet 2019; 20:461-493. [DOI: 10.1146/annurev-genom-083115-022316] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Indexed: 11/09/2022]
Abstract
Many traits of interest are highly heritable and genetically complex, meaning that much of the variation they exhibit arises from differences at numerous loci in the genome. Complex traits and their evolution have been studied for more than a century, but only in the last decade have genome-wide association studies (GWASs) in humans begun to reveal their genetic basis. Here, we bring these threads of research together to ask how findings from GWASs can further our understanding of the processes that give rise to heritable variation in complex traits and of the genetic basis of complex trait evolution in response to changing selection pressures (i.e., of polygenic adaptation). Conversely, we ask how evolutionary thinking helps us to interpret findings from GWASs and informs related efforts of practical importance.
Affiliation(s)
- Guy Sella
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Program for Mathematical Genomics, Columbia University, New York, NY 10032, USA
- Nicholas H. Barton
- Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
|
45
|
Hou K, Burch KS, Majumdar A, Shi H, Mancuso N, Wu Y, Sankararaman S, Pasaniuc B. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat Genet 2019; 51:1244-1251. [PMID: 31358995 PMCID: PMC6686906 DOI: 10.1038/s41588-019-0465-0] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/18/2018] [Accepted: 06/13/2019] [Indexed: 12/14/2022]
Abstract
SNP-heritability is a fundamental quantity in the study of complex traits. Recent studies have shown that existing methods to estimate genome-wide SNP-heritability can yield biases when their assumptions are violated. While various approaches have been proposed to account for frequency- and linkage disequilibrium (LD)-dependent genetic architectures, it remains unclear which estimates reported in the literature are reliable. Here we show that genome-wide SNP-heritability can be accurately estimated from biobank-scale data irrespective of genetic architecture, without specifying a heritability model or partitioning SNPs by allele frequency and/or LD. We show analytically and through extensive simulations starting from real genotypes (UK Biobank, N = 337 K) that, unlike existing methods, our closed-form estimator is robust across a wide range of architectures. We provide estimates of SNP-heritability for 22 complex traits in the UK Biobank and show that, consistent with our results in simulations, existing biobank-scale methods yield estimates up to 30% different from our theoretically-justified approach.
Affiliation(s)
- Kangcheng Hou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA.
- Arunabha Majumdar
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Huwenbo Shi
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nicholas Mancuso
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Biostatistics Division, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Yue Wu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
|
46
|
Oliynyk RT. Evaluating the Potential of Younger Cases and Older Controls Cohorts to Improve Discovery Power in Genome-Wide Association Studies of Late-Onset Diseases. J Pers Med 2019; 9:jpm9030038. [PMID: 31336617 PMCID: PMC6789773 DOI: 10.3390/jpm9030038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/14/2019] [Revised: 07/15/2019] [Accepted: 07/16/2019] [Indexed: 11/25/2022] Open
Abstract
For more than a decade, genome-wide association studies have been making steady progress in discovering the causal gene variants that contribute to late-onset human diseases. Polygenic late-onset diseases in an aging population display a risk allele frequency decrease at older ages, caused by individuals with higher polygenic risk scores becoming ill proportionately earlier and bringing about a change in the distribution of risk alleles between new cases and the as-yet-unaffected population. This phenomenon is most prominent for diseases characterized by high cumulative incidence and high heritability, examples of which include Alzheimer’s disease, coronary artery disease, cerebral stroke, and type 2 diabetes, while for late-onset diseases with relatively lower prevalence and heritability, exemplified by cancers, the effect is significantly lower. In this research, computer simulations have demonstrated that genome-wide association studies of late-onset polygenic diseases showing high cumulative incidence together with high initial heritability will benefit from using the youngest possible age-matched cohorts. Moreover, rather than using age-matched cohorts, study cohorts combining the youngest possible cases with the oldest possible controls may significantly improve the discovery power of genome-wide association studies.
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.
|
47
|
Oliynyk RT. Quantifying the Potential for Future Gene Therapy to Lower Lifetime Risk of Polygenic Late-Onset Diseases. Int J Mol Sci 2019; 20:ijms20133352. [PMID: 31288412 DOI: 10.1101/390773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/21/2019] [Revised: 07/05/2019] [Accepted: 07/05/2019] [Indexed: 05/26/2023] Open
Abstract
Gene therapy techniques and genetic knowledge may sufficiently advance, within the next few decades, to support prophylactic gene therapy for the prevention of polygenic late-onset diseases. The risk of these diseases may, hypothetically, be lowered by correcting the effects of a subset of common low effect gene variants. In this paper, simulations show that if such gene therapy were to become technically possible; and if the incidences of the treated diseases follow the proportional hazards model with a multiplicative genetic architecture composed of a sufficient number of common small effect gene variants, then: (a) late-onset diseases with the highest familial heritability will have the largest number of variants available for editing; (b) diseases that currently have the highest lifetime risk, particularly those with the highest incidence rate continuing into older ages, will prove the most challenging cases in lowering lifetime risk and delaying the age of onset at a population-wide level; (c) diseases that are characterized by the lowest lifetime risk will show the strongest and longest-lasting response to such therapies; and (d) longer life expectancy is associated with a higher lifetime risk of these diseases, and this tendency, while delayed, will continue after therapy.
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.
|
48
|
Oliynyk RT. Quantifying the Potential for Future Gene Therapy to Lower Lifetime Risk of Polygenic Late-Onset Diseases. Int J Mol Sci 2019; 20:E3352. [PMID: 31288412 PMCID: PMC6651814 DOI: 10.3390/ijms20133352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/21/2019] [Revised: 07/05/2019] [Accepted: 07/05/2019] [Indexed: 12/28/2022] Open
Abstract
Gene therapy techniques and genetic knowledge may sufficiently advance, within the next few decades, to support prophylactic gene therapy for the prevention of polygenic late-onset diseases. The risk of these diseases may, hypothetically, be lowered by correcting the effects of a subset of common low effect gene variants. In this paper, simulations show that if such gene therapy were to become technically possible; and if the incidences of the treated diseases follow the proportional hazards model with a multiplicative genetic architecture composed of a sufficient number of common small effect gene variants, then: (a) late-onset diseases with the highest familial heritability will have the largest number of variants available for editing; (b) diseases that currently have the highest lifetime risk, particularly those with the highest incidence rate continuing into older ages, will prove the most challenging cases in lowering lifetime risk and delaying the age of onset at a population-wide level; (c) diseases that are characterized by the lowest lifetime risk will show the strongest and longest-lasting response to such therapies; and (d) longer life expectancy is associated with a higher lifetime risk of these diseases, and this tendency, while delayed, will continue after therapy.
Affiliation(s)
- Roman Teo Oliynyk
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand.
- Department of Computer Science, University of Auckland, Auckland 1010, New Zealand.
|
49
|
López-Cortegano E, Caballero A. Inferring the Nature of Missing Heritability in Human Traits Using Data from the GWAS Catalog. Genetics 2019; 212:891-904. [PMID: 31123044 PMCID: PMC6614893 DOI: 10.1534/genetics.119.302077] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/05/2019] [Accepted: 05/11/2019] [Indexed: 02/07/2023] Open
Abstract
Thousands of genes responsible for many diseases and other common traits in humans have been detected by Genome Wide Association Studies (GWAS) in the last decade. However, candidate causal variants found so far usually explain only a small fraction of the heritability estimated by family data. The most common explanation for this observation is that the missing heritability corresponds to variants, either rare or common, with very small effect, which pass undetected due to a lack of statistical power. We carried out a meta-analysis using data from the NHGRI-EBI GWAS Catalog in order to explore the observed distribution of locus effects for a set of 42 complex traits and to quantify their contribution to narrow-sense heritability. With the data at hand, we were able to predict the expected distribution of locus effects for 16 traits and diseases, their expected contribution to heritability, and the missing number of loci yet to be discovered to fully explain the familial heritability estimates. Our results indicate that, for 6 out of the 16 traits, the additive contribution of a great number of loci is unable to explain the familial (broad-sense) heritability, suggesting that the gap between GWAS and familial estimates of heritability may not ever be closed for these traits. In contrast, for the other 10 traits, the additive contribution of hundreds or thousands of loci yet to be found could potentially explain the familial heritability estimates, if this were the case. Computer simulations are used to illustrate the possible contribution from nonadditive genetic effects to the gap between GWAS and familial estimates of heritability.
Affiliation(s)
- Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310, Spain
|
50
|
Castellano D, James J, Eyre-Walker A. Nearly Neutral Evolution across the Drosophila melanogaster Genome. Mol Biol Evol 2019; 35:2685-2694. [PMID: 30418639 DOI: 10.1093/molbev/msy164] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022] Open
Abstract
Under the nearly neutral theory of molecular evolution, the proportion of effectively neutral mutations is expected to depend upon the effective population size (Ne). Here, we investigate whether this is the case across the genome of Drosophila melanogaster using polymorphism data from North American and African lines. We show that the ratio of the number of nonsynonymous and synonymous polymorphisms is negatively correlated to the number of synonymous polymorphisms, even when the nonindependence is accounted for. The relationship is such that the proportion of effectively neutral nonsynonymous mutations increases by ∼45% as Ne is halved. However, we also show that this relationship is steeper than expected from an independent estimate of the distribution of fitness effects from the site frequency spectrum. We investigate a number of potential explanations for this and show, using simulation, that this is consistent with a model of genetic hitchhiking: Genetic hitchhiking depresses diversity at neutral and weakly selected sites, but has little effect on the diversity of strongly selected sites.
Affiliation(s)
- David Castellano
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Jennifer James
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
|