1
|
Saferali A, Kim W, Chase RP, Vollmers C, Silverman EK, Cho MH, Castaldi PJ, Hersh CP. Overlap between COPD genetic association results and transcriptional quantitative trait loci. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.08.24310079. [PMID: 39040180 PMCID: PMC11261918 DOI: 10.1101/2024.07.08.24310079] [Indexed: 07/24/2024]
Abstract
Rationale: Genome-wide association studies (GWAS) have identified multiple genetic loci associated with chronic obstructive pulmonary disease (COPD). When integrated with GWAS results, expression quantitative trait locus (eQTL) studies can provide insight into biological mechanisms involved in disease by identifying single nucleotide polymorphisms (SNPs) that contribute to whole gene expression. However, there are multiple genetically driven regulatory and isoform-specific effects which cannot be detected in traditional eQTL analyses. Here, we identify SNPs that are associated with alternative splicing (sQTL) in addition to eQTLs to identify novel functions for COPD-associated genetic variants. Methods: We performed RNA sequencing on whole blood from 3743 subjects in the COPDGene Study and used RNA sequencing data from lung tissue of 1241 subjects from the Lung Tissue Research Consortium (LTRC), together with whole genome sequencing data on all subjects. Associations between all SNPs within 1000 kb of a gene (cis-) and splice and gene expression quantifications were tested using tensorQTL. In COPDGene, a total of 11,869,333 SNPs were tested for association with 58,318 splice clusters; in LTRC, 8,792,206 SNPs were tested for association with 70,094 splice clusters. We assessed colocalization with COPD-associated SNPs from a published GWAS[1]. Results: After adjustment for multiple statistical testing, we identified 28,110 splice-sites corresponding to 3,889 unique genes that were significantly associated with genotype in COPDGene whole blood, and 58,258 splice-sites corresponding to 10,307 unique genes associated with genotype in LTRC lung tissue. We found 7,576 sQTL splice-sites corresponding to 2,110 sQTL genes were shared between whole blood and lung, while 20,534 sQTL splice-sites in 3,518 genes were unique to blood and 50,682 splice-sites in 9,677 genes were unique to lung.
To determine what proportion of COPD-associated SNPs were associated with transcriptional splicing, we performed colocalization analysis between COPD GWAS and sQTL data, and found that 38 genomic windows, corresponding to 38 COPD GWAS loci, had evidence of colocalization between QTLs and COPD. The top five colocalizations between COPD and lung sQTLs include NPNT, FBXO38, HHIP, NTN4 and BTC. Conclusions: A total of 38 COPD GWAS loci contain evidence of sQTLs, suggesting that analysis of sQTLs in whole blood and lung tissue can provide novel insights into disease mechanisms.
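The shared and tissue-specific splice-site counts reported in the abstract above follow from simple subtraction, which the short sketch below reproduces. The function name `partition_counts` and its keys are hypothetical and chosen only for illustration; the three input numbers are the splice-site totals stated in the abstract.

```python
def partition_counts(total_a: int, total_b: int, shared: int) -> dict:
    """Split two totals into shared and set-specific counts.

    Hypothetical helper for illustration; not part of any published pipeline.
    """
    if shared > min(total_a, total_b):
        raise ValueError("shared count cannot exceed either total")
    return {
        "shared": shared,
        "unique_a": total_a - shared,  # present only in the first set
        "unique_b": total_b - shared,  # present only in the second set
    }


# Splice-site totals from the abstract: blood sQTLs, lung sQTLs, shared sQTLs.
counts = partition_counts(total_a=28110, total_b=58258, shared=7576)
print(counts)
```

With these inputs, the blood-specific and lung-specific counts come out to 20,534 and 50,682, matching the abstract. The same subtraction is not expected to hold for the gene-level counts, since one gene can carry both shared and tissue-specific splice sites.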
|
2
|
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.06.592745. [PMID: 38766136 PMCID: PMC11100663 DOI: 10.1101/2024.05.06.592745] [Indexed: 05/22/2024]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, Morgante et al. introduced mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goals of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the sample size is smaller.
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
|
3
|
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To address this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xinran Wang
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Matthew Carr
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artem Kim
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
- Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Nicholas Mancuso
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
|
4
|
Conery M, Pippin JA, Wagley Y, Trang K, Pahl MC, Villani DA, Favazzo LJ, Ackert-Bicknell CL, Zuscik MJ, Katsevich E, Wells AD, Zemel BS, Voight BF, Hankenson KD, Chesi A, Grant SF. GWAS-informed data integration and non-coding CRISPRi screen illuminate genetic etiology of bone mineral density. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.19.585778. [PMID: 38562830 PMCID: PMC10983984 DOI: 10.1101/2024.03.19.585778] [Indexed: 04/04/2024]
Abstract
Over 1,100 independent signals have been identified with genome-wide association studies (GWAS) for bone mineral density (BMD), a key risk factor for mortality-increasing fragility fractures; however, the effector gene(s) for most remain unknown. Informed by a variant-to-gene mapping strategy implicating 89 non-coding elements predicted to regulate osteoblast gene expression at BMD GWAS loci, we executed a single-cell CRISPRi screen in human fetal osteoblast 1.19 cells (hFOBs). The BMD relevance of hFOBs was supported by heritability enrichment from cross-cell type stratified LD-score regression involving 98 cell types grouped into 15 tissues. Twenty-four genes showed perturbation in the screen, with four (ARID5B, CC2D1B, EIF4G2, and NCOA3) exhibiting consistent effects upon siRNA knockdown on three measures of osteoblast maturation and mineralization. Lastly, additional heritability enrichments, genetic correlations, and multi-trait fine-mapping revealed that many BMD GWAS signals are pleiotropic and likely mediate their effects via non-bone tissues that warrant attention in future screens.
Affiliation(s)
- Mitchell Conery
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- James A. Pippin
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yadav Wagley
- Department of Orthopaedic Surgery, University of Michigan Medical School, Ann Arbor, MI 48109
- Khanh Trang
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Matthew C. Pahl
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- David A. Villani
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Cell Biology, Stem Cells and Development Ph.D. Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Lacey J. Favazzo
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Cheryl L. Ackert-Bicknell
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Michael J. Zuscik
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Eugene Katsevich
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew D. Wells
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Babette S. Zemel
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Gastroenterology, Hepatology and Nutrition, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Benjamin F. Voight
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute of Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kurt D. Hankenson
- Department of Orthopaedic Surgery, University of Michigan Medical School, Ann Arbor, MI 48109
- Alessandra Chesi
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Struan F.A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute of Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
|
5
|
Gao B, Zhou X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat Genet 2024; 56:170-179. [PMID: 38168930 DOI: 10.1038/s41588-023-01604-7] [Received: 11/21/2022] [Accepted: 10/30/2023] [Indexed: 01/05/2024]
Abstract
Fine-mapping in genome-wide association studies attempts to identify causal SNPs from a set of candidate SNPs in a local genomic region of interest and is commonly performed in one genetic ancestry at a time. Here, we present multi-ancestry sum of the single effects model (MESuSiE), a probabilistic multi-ancestry fine-mapping method, to improve the accuracy and resolution of fine-mapping by leveraging association information across ancestries. MESuSiE uses summary statistics as input, accounts for the diverse linkage disequilibrium pattern observed in different ancestries, explicitly models both shared and ancestry-specific causal SNPs, and relies on a variational inference algorithm for scalable computation. We evaluated the performance of MESuSiE through comprehensive simulations and multi-ancestry fine-mapping of four lipid traits with both European and African samples. In the real data, MESuSiE improves fine-mapping resolution by 19.0% to 72.0% compared to existing approaches, is an order of magnitude faster, and captures and categorizes shared and ancestry-specific causal signals with enhanced functional enrichment.
Affiliation(s)
- Boran Gao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
|
6
|
Zhou F, Soremekun O, Chikowore T, Fatumo S, Barroso I, Morris AP, Asimit JL. Leveraging information between multiple population groups and traits improves fine-mapping resolution. Nat Commun 2023; 14:7279. [PMID: 37949886 PMCID: PMC10638399 DOI: 10.1038/s41467-023-43159-5] [Received: 05/31/2023] [Accepted: 11/01/2023] [Indexed: 11/12/2023]
Abstract
Statistical fine-mapping helps to pinpoint likely causal variants underlying genetic association signals. Its resolution can be improved by (i) leveraging information between traits; and (ii) exploiting differences in linkage disequilibrium structure between diverse population groups. Using association summary statistics, MGflashfm jointly fine-maps signals from multiple traits and population groups; MGfm uses an analogous framework to analyse each trait separately. We also provide a practical approach to fine-mapping with out-of-sample reference panels. In simulation studies we show that MGflashfm and MGfm are well-calibrated and that the mean proportion of causal variants with PP > 0.80 is above 0.75 (MGflashfm) and 0.70 (MGfm). In our analysis of four lipid traits across five population groups, MGflashfm gives a median 99% credible set reduction of 10.5% over MGfm. MGflashfm and MGfm only require summary level data, making them very useful fine-mapping tools in consortia efforts where individual-level data cannot be shared.
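For readers unfamiliar with the "99% credible set" mentioned in the abstract above, the sketch below shows the usual generic construction: rank variants by posterior probability (PP) and keep the smallest prefix whose cumulative PP reaches the target coverage. This is not the MGflashfm or MGfm implementation; the function name `credible_set`, the variant labels, and the PP values are all hypothetical.

```python
def credible_set(pp: dict, coverage: float = 0.99) -> list:
    """Return the smallest set of variants whose cumulative PP reaches `coverage`.

    Generic construction for illustration only; `pp` maps variant IDs to
    posterior probabilities of causality for a single signal.
    """
    ranked = sorted(pp.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for four variants at one locus.
example_pp = {"rs_a": 0.90, "rs_b": 0.07, "rs_c": 0.025, "rs_d": 0.005}
print(credible_set(example_pp))        # 99% credible set
print(credible_set(example_pp, 0.5))   # a looser 50% credible set
```

A smaller credible set at the same coverage means the causal variant has been narrowed to fewer candidates, which is the sense in which the abstract reports a reduction in credible set size.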
Affiliation(s)
- Feng Zhou
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Opeyemi Soremekun
- The African Computational Genomic (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
- Tinashe Chikowore
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- MRC/Wits Developmental Pathways for Health Research Unit, Department of Paediatrics, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Segun Fatumo
- The African Computational Genomic (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Inês Barroso
- Exeter Centre of Excellence for Diabetes Research (EXCEED), University of Exeter Medical School, Exeter, UK
- Andrew P Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
|