1
Yu X, Hu X, Wan X, Zhang Z, Wan X, Cai M, Yu T, Xiao J. A unified framework for cell-type-specific eQTL prioritization by integrating bulk and scRNA-seq data. Am J Hum Genet 2025; 112:332-352. [PMID: 39824189 DOI: 10.1016/j.ajhg.2024.12.018] [Received: 08/01/2024] [Revised: 12/17/2024] [Accepted: 12/18/2024] [Indexed: 01/20/2025] Open
Abstract
Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex traits, yet the biological interpretation remains challenging, especially for variants in non-coding regions. Expression quantitative trait locus (eQTL) studies have linked these variations to gene expression, aiding in identifying genes involved in disease mechanisms. Traditional eQTL analyses using bulk RNA sequencing (bulk RNA-seq) provide tissue-level insights but suffer from signal loss and distortion due to unaddressed cellular heterogeneity. Recently, single-cell RNA-seq (scRNA-seq) has provided higher resolution, enabling cell-type-specific eQTL (ct-eQTL) analyses. However, these studies are limited by their smaller sample sizes and technical constraints. In this paper, we present a statistical framework, IBSEP, which integrates bulk RNA-seq and scRNA-seq data for enhanced ct-eQTL prioritization. Our method employs a hierarchical linear model to combine summary statistics from both data types, overcoming the limitations while leveraging the advantages associated with each technique. Through extensive simulations and real data analyses, including peripheral blood mononuclear cells and brain cortex datasets, IBSEP demonstrated superior performance in identifying ct-eQTLs compared to existing methods. Our approach unveils transcriptional regulatory mechanisms specific to cell types, offering deeper insights into the genetic basis of complex diseases at a cellular resolution.
Affiliation(s)
- Xinyi Yu
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China; School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China
- Xianghong Hu
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xiaomeng Wan
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Zhiyong Zhang
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China; School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China
- Xiang Wan
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Mingxuan Cai
  - Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China
- Tianwei Yu
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China; School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China.
- Jiashun Xiao
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China.
2
Chignon A, Lettre G. Using omics data and genome editing methods to decipher GWAS loci associated with coronary artery disease. Atherosclerosis 2025; 401:118621. [PMID: 39909615 DOI: 10.1016/j.atherosclerosis.2024.118621] [Received: 07/02/2024] [Revised: 09/18/2024] [Accepted: 10/03/2024] [Indexed: 02/07/2025]
Abstract
Coronary artery disease (CAD) is due to atherosclerosis, a pathophysiological process that involves several cell types and results in the accumulation of lipid-rich plaques that disrupt the normal blood flow through the coronary arteries to the heart. Genome-wide association studies have identified thousands of genetic variants robustly associated with CAD or its traditional risk factors (e.g. blood pressure, blood lipids, type 2 diabetes, smoking). However, gaining biological insights from these genetic discoveries remains challenging because of linkage disequilibrium and the difficulty of interpreting the functions of non-coding regulatory elements in the human genome. In this review, we present different statistical methods (e.g. Mendelian randomization) and molecular datasets (e.g. expression or protein quantitative trait loci) that have helped connect CAD-associated variants with genes, biological pathways, and cell types or tissues. We emphasize that these various strategies make predictions, which need to be validated in orthologous systems. We discuss specific examples where the integration of omics data with GWAS results has prioritized causal CAD variants and genes. Finally, we review how targeted and genome-wide genome editing experiments using the CRISPR/Cas9 toolbox have been used to characterize new CAD genes in human cells. Researchers now have the statistical and bioinformatic methods, the molecular datasets, and the experimental tools to dissect comprehensively the loci that contribute to CAD risk in humans.
Affiliation(s)
- Arnaud Chignon
  - Montreal Heart Institute, Montreal, Quebec, Canada; Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada
- Guillaume Lettre
  - Montreal Heart Institute, Montreal, Quebec, Canada; Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada.
3
Jee YH, Wang Y, Jung KJ, Lee JY, Kimm H, Duan R, Price AL, Martin AR, Kraft P. Genome-wide association studies in a large Korean cohort identify novel quantitative trait loci for 36 traits and illuminate their genetic architectures. medRxiv 2025:2024.05.17.24307550. [PMID: 38798434 PMCID: PMC11118625 DOI: 10.1101/2024.05.17.24307550] [Indexed: 05/29/2024]
Abstract
Genome-wide association studies (GWAS) have been predominantly conducted in populations of European ancestry, limiting opportunities for biological discovery in diverse populations. We report GWAS findings from 153,950 individuals across 36 quantitative traits in the Korean Cancer Prevention Study-II (KCPS2) Biobank. We discovered 301 novel genetic loci in KCPS2, including an association between thyroid-stimulating hormone and CD36. Meta-analysis with the Korean Genome and Epidemiology Study, Biobank Japan, Taiwan Biobank, and UK Biobank identified 4,588 loci that were not significant in any contributing GWAS. We describe differences in genetic architectures across these East Asian and European samples. We also highlight East Asian-specific associations, including a known pleiotropic missense variant in ALDH2, which fine-mapping identified as a likely causal variant for a diverse set of traits. Our findings provide insights into the genetic architecture of complex traits in East Asian populations and highlight how broadening the population diversity of GWAS samples can aid discovery.
Affiliation(s)
- Yon Ho Jee
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Ying Wang
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Keum Ji Jung
  - Institute for Health Promotion, Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Ji-Young Lee
  - Institute for Health Promotion, Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Heejin Kimm
  - Institute for Health Promotion, Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Korea
- Rui Duan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Alkes L. Price
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia R. Martin
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Peter Kraft
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Transdivisional Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, MD, USA
4
Sens D, Shilova L, Gräf L, Grebenshchikova M, Eskofier BM, Casale FP. Genetics-driven risk predictions leveraging the Mendelian randomization framework. Genome Res 2024; 34:1276-1285. [PMID: 39332904 PMCID: PMC11529896 DOI: 10.1101/gr.279252.124] [Received: 03/04/2024] [Accepted: 09/03/2024] [Indexed: 09/29/2024]
Abstract
Accurate predictive models of future disease onset are crucial for effective preventive healthcare, yet longitudinal data sets linking early risk factors to subsequent health outcomes are limited. To overcome this challenge, we introduce a novel framework, Predictive Risk modeling using Mendelian Randomization (PRiMeR), which utilizes genetic effects as supervisory signals to learn disease risk predictors without relying on longitudinal data. To do so, PRiMeR leverages risk factors and genetic data from a healthy cohort, along with results from genome-wide association studies of diseases of interest. After training, the learned predictor can be used to assess risk for new patients solely based on risk factors. We validate PRiMeR through comprehensive simulations and in future type 2 diabetes predictions in UK Biobank participants without diabetes, using follow-up onset labels for validation. Moreover, we apply PRiMeR to predict future Alzheimer's disease onset from brain imaging biomarkers and future Parkinson's disease onset from accelerometer-derived traits. Overall, with PRiMeR we offer a new perspective in predictive modeling, showing it is possible to learn risk predictors leveraging genetics rather than longitudinal data.
Affiliation(s)
- Daniel Sens
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Liubov Shilova
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Ludwig Gräf
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany
- Maria Grebenshchikova
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - School of Management, Technical University of Munich, 80333 Munich, Germany
- Bjoern M Eskofier
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Francesco Paolo Casale
  - Institute of AI for Health, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany
5
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
6
Linna-Kuosmanen S, Vuori M, Kiviniemi T, Palmu J, Niiranen T. Genetics, transcriptomics, metagenomics, and metabolomics in the pathogenesis and prediction of atrial fibrillation. Eur Heart J Suppl 2024; 26:iv33-iv40. [PMID: 39099578 PMCID: PMC11292413 DOI: 10.1093/eurheartjsupp/suae072] [Indexed: 08/06/2024]
Abstract
The primary cellular substrates of atrial fibrillation (AF) and the mechanisms underlying AF onset remain poorly characterized and, therefore, its risk assessment lacks precision. While the use of omics may enable discovery of novel AF risk factors and narrow down the cellular pathways involved in AF pathogenesis, the work is far from complete. Large-scale genome-wide association studies and transcriptomic analyses that allow an unbiased, non-candidate-gene-based delineation of molecular changes associated with AF in humans have identified at least 150 genetic loci associated with AF. However, only a few of these loci have been thoroughly mechanistically dissected, indicating that much remains to be discovered for targeted diagnostics and therapeutics. Metabolomics and metagenomics, on the other hand, add to the understanding of AF downstream of the primary substrate and integrate the signalling of environmental and host factors, respectively. These two rapidly developing fields have already provided several correlates of prevalent and incident AF that require additional validation in external cohorts and experimental studies. In this review, we take a look at the recent developments in genetics, transcriptomics, metagenomics, and metabolomics and how they may aid in improving the discovery of AF risk factors and shed light on the molecular mechanisms leading to AF onset.
Affiliation(s)
- Suvi Linna-Kuosmanen
  - A. I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Neulaniementie 2, 70211 Kuopio, Finland
- Matti Vuori
  - Division of Medicine, Turku University Hospital, Turku, Finland
  - Department of Internal Medicine, University of Turku, Turku, Finland
- Tuomas Kiviniemi
  - Department of Internal Medicine, University of Turku, Turku, Finland
  - Heart Center, Turku University Hospital, Turku, Finland
- Joonatan Palmu
  - Department of Internal Medicine, University of Turku, Turku, Finland
- Teemu Niiranen
  - Division of Medicine, Turku University Hospital, Turku, Finland
  - Department of Internal Medicine, University of Turku, Turku, Finland
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, Turku, Finland
7
Zhang W, Lu T, Sladek R, Li Y, Najafabadi H, Dupuis J. SharePro: an accurate and efficient genetic colocalization method accounting for multiple causal signals. Bioinformatics 2024; 40:btae295. [PMID: 38688586 PMCID: PMC11105950 DOI: 10.1093/bioinformatics/btae295] [Received: 07/24/2023] [Revised: 04/11/2024] [Accepted: 04/29/2024] [Indexed: 05/02/2024] Open
Abstract
Motivation: Colocalization analysis is commonly used to assess whether two or more traits share the same genetic signals identified in genome-wide association studies (GWAS), and is important for prioritizing targets for functional follow-up of GWAS results. Existing colocalization methods can have suboptimal performance when there are multiple causal variants in one genomic locus.
Results: We propose SharePro to extend the COLOC framework for colocalization analysis. SharePro integrates linkage disequilibrium (LD) modeling and colocalization assessment by grouping correlated variants into effect groups. With an efficient variational inference algorithm, posterior colocalization probabilities can be accurately estimated. In simulation studies, SharePro demonstrated increased power with a well-controlled false positive rate at a low computational cost. Compared to existing methods, SharePro provided stronger and more consistent colocalization evidence for known lipid-lowering drug target proteins and their corresponding lipid traits. Through an additional challenging case of the colocalization analysis of the circulating abundance of R-spondin 3 GWAS and estimated bone mineral density GWAS, we demonstrated the utility of SharePro in identifying biologically plausible colocalized signals.
Availability and implementation: SharePro for colocalization analysis is written in Python and openly available at https://github.com/zhwm/SharePro_coloc.
Affiliation(s)
- Wenmin Zhang
  - Quantitative Life Sciences Program, McGill University, Montreal, Quebec H3A 1E3, Canada
  - Montreal Heart Institute, Université de Montréal, Montreal, Quebec H1T 1C8, Canada
- Tianyuan Lu
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario M5S 1A1, Canada
- Robert Sladek
  - Quantitative Life Sciences Program, McGill University, Montreal, Quebec H3A 1E3, Canada
  - Department of Human Genetics, McGill University, Montreal, Quebec H3A 0C7, Canada
  - Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Quebec H3A 0G1, Canada
- Yue Li
  - Quantitative Life Sciences Program, McGill University, Montreal, Quebec H3A 1E3, Canada
  - School of Computer Science, McGill University, Montreal, Quebec H3A 2A7, Canada
- Hamed Najafabadi
  - Quantitative Life Sciences Program, McGill University, Montreal, Quebec H3A 1E3, Canada
  - Department of Human Genetics, McGill University, Montreal, Quebec H3A 0C7, Canada
  - Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Quebec H3A 0G1, Canada
- Josée Dupuis
  - Quantitative Life Sciences Program, McGill University, Montreal, Quebec H3A 1E3, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, McGill College, QC H3A 1Y7, Canada
8
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. medRxiv 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect-sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xinran Wang
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Matthew Carr
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artem Kim
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
- Pejman Mohammadi
  - Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lang Wu
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
- Alexander Gusev
  - Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- James Pirruccello
  - Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
- Linda Kachuri
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
9
Li X, Sham PC, Zhang YD. A Bayesian fine-mapping model using a continuous global-local shrinkage prior with applications in prostate cancer analysis. Am J Hum Genet 2024; 111:213-226. [PMID: 38171363 PMCID: PMC10870138 DOI: 10.1016/j.ajhg.2023.12.007] [Received: 08/15/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
The aim of fine mapping is to identify genetic variants causally contributing to complex traits or diseases. Existing fine-mapping methods employ Bayesian discrete mixture priors and depend on a pre-specified maximum number of causal variants, which may lead to sub-optimal solutions. In this work, we propose a Bayesian fine-mapping method called h2-D2, utilizing a continuous global-local shrinkage prior. We also present an approach to define credible sets of causal variants in continuous prior settings. Simulation studies demonstrate that h2-D2 outperforms current state-of-the-art fine-mapping methods such as SuSiE and FINEMAP in accurately identifying causal variants and estimating their effect sizes. We further applied h2-D2 to prostate cancer analysis and discovered some previously unknown causal variants. In addition, we inferred 369 target genes associated with the detected causal variants and several pathways that were significantly over-represented by these genes, shedding light on their potential roles in prostate cancer development and progression.
Affiliation(s)
- Xiang Li
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Yan Dora Zhang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China.