1
|
Taylor DJ, Chhetri SB, Tassia MG, Biddanda A, Yan SM, Wojcik GL, Battle A, McCoy RC. Sources of gene expression variation in a globally diverse human cohort. Nature 2024; 632:122-130. [PMID: 39020179 PMCID: PMC11291278 DOI: 10.1038/s41586-024-07708-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 06/12/2024] [Indexed: 07/19/2024]
Abstract
Genetic variation that influences gene expression and splicing is a key source of phenotypic diversity1-5. Although invaluable, studies investigating these links in humans have been strongly biased towards participants of European ancestries, which constrains generalizability and hinders evolutionary research. Here to address these limitations, we developed MAGE, an open-access RNA sequencing dataset of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project6, spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, which mirrored the variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-expression quantitative trait loci (eQTLs) and cis-splicing QTLs (sQTLs), respectively). We identified more than 15,000 putatively causal eQTLs and more than 16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1,310 eQTLs and 1,657 sQTLs that are largely private to underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations. Moreover, the apparent 'population-specific' effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands our understanding of human gene expression diversity and provides an inclusive resource for studying the evolution and function of human genomes.
Collapse
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Surya B Chhetri
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Michael G Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD, USA
| | - Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
2
|
Taylor DJ, Chhetri SB, Tassia MG, Biddanda A, Battle A, McCoy RC. Sources of gene expression variation in a globally diverse human cohort. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.04.565639. [PMID: 37965206 PMCID: PMC10635147 DOI: 10.1101/2023.11.04.565639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2023]
Abstract
Genetic variation influencing gene expression and splicing is a key source of phenotypic diversity. Though invaluable, studies investigating these links in humans have been strongly biased toward participants of European ancestries, diminishing generalizability and hindering evolutionary research. To address these limitations, we developed MAGE, an open-access RNA-seq data set of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, mirroring variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-eQTLs and cis-sQTLs, respective), identifying >15,000 putatively causal eQTLs and >16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1310 eQTLs and 1657 sQTLs that are largely private to previously underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations and that apparent "population-specific" effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands understanding of gene expression diversity across human populations and provides an inclusive resource for studying the evolution and function of human genomes.
Collapse
Affiliation(s)
- Dylan J. Taylor
- Department of Biology, Johns Hopkins University, Baltimore MD, USA
| | - Surya B. Chhetri
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore MD, USA
| | | | - Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore MD, USA
| | - Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore MD, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore MD, USA
| |
Collapse
|
3
|
Tang D, Freudenberg J, Dahl A. Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits. Am J Hum Genet 2023; 110:1875-1887. [PMID: 37922884 PMCID: PMC10645564 DOI: 10.1016/j.ajhg.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.
Collapse
Affiliation(s)
- David Tang
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA.
| | - Jerome Freudenberg
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
| | - Andy Dahl
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
| |
Collapse
|
4
|
Lin M, Park DS, Zaitlen NA, Henn BM, Gignoux CR. Admixed Populations Improve Power for Variant Discovery and Portability in Genome-Wide Association Studies. Front Genet 2021; 12:673167. [PMID: 34108994 PMCID: PMC8181458 DOI: 10.3389/fgene.2021.673167] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022] Open
Abstract
Genome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. In contrast to homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed from multiple source populations and thus improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that GWAS in admixed populations can be enriched for power in discovery by up to 2-fold compared to the ancestral populations under similar sample size. Moreover, higher accuracy of cross-population polygenic score estimates is also observed if variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate if first discovered in the admixed group and then transferred to an ancestral population, than the other way around (across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals, testing on 1,000 in each population, p = 3.78e-6, 6.19e-101, ∼0 for FST = 0.2, 0.5, 0.8, respectively). While some of these FST values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that investigation of admixed populations harbors significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits and will improve downstream applications such as personalized medicine across diverse populations.
Collapse
Affiliation(s)
- Meng Lin
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Danny S Park
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, San Francisco, CA, United States
| | - Noah A Zaitlen
- Department of Neurology and Computational Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Brenna M Henn
- Department of Anthropology, Center for Population Biology and the Genome Center, University of California, Davis, Davis, CA, United States
| | - Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
Collapse
|
5
|
Sheppard B, Rappoport N, Loh PR, Sanders SJ, Zaitlen N, Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc Natl Acad Sci U S A 2021; 118:e1922305118. [PMID: 33833052 PMCID: PMC8053945 DOI: 10.1073/pnas.1922305118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Interactions between genetic variants-epistasis-is pervasive in model systems and can profoundly impact evolutionary adaption, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.
Collapse
Affiliation(s)
- Brooke Sheppard
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143
| | - Nadav Rappoport
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA 94143
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - Stephan J Sanders
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143
| | - Noah Zaitlen
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143;
- Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095
| | - Andy Dahl
- Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095;
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA 90095
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637
| |
Collapse
|