1
Walters K, Yaacob H. Bayesian multivariant fine mapping using the Laplace prior. Genet Epidemiol 2023; 47:249-260. [PMID: 36739616 DOI: 10.1002/gepi.22517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 01/13/2023] [Accepted: 01/24/2023] [Indexed: 02/07/2023]
Abstract
Currently, the only effect size prior that is routinely implemented in a Bayesian fine-mapping multi-single-nucleotide polymorphism (SNP) analysis is the Gaussian prior. Here, we show how the Laplace prior can be deployed in Bayesian multi-SNP fine mapping studies. We compare the ranking performance of the posterior inclusion probability (PIP) using a Laplace prior with the ranking performance of the corresponding Gaussian prior and FINEMAP. Our results indicate that, for the simulation scenarios we consider here, the Laplace prior can lead to higher PIPs than either the Gaussian prior or FINEMAP, particularly for moderately sized fine-mapping studies. The Laplace prior also appears to have better worst-case scenario properties. We reanalyse the iCOGS case-control data from the CASP8 region on Chromosome 2. Even though this study has a total sample size of nearly 90,000 individuals, there are still some differences in the top few ranked SNPs if the Laplace prior is used rather than the Gaussian prior. R code to implement the Laplace (and Gaussian) prior is available at https://github.com/Kevin-walters/lapmapr.
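The following is a minimal Python sketch, not the lapmapr R code. It shows how a Laplace and a Gaussian effect-size prior each turn one SNP's estimated log odds ratio b (sampling variance V) into a Bayes factor against the null.

Several assumptions apply:
- It uses the normal approximation b ~ N(β, V).
- The two priors are matched on variance W. A Laplace prior with scale λ has variance 2λ², so λ = sqrt(W/2).
- Unlike the multi-SNP model described above, it assumes a single causal SNP with equal prior probabilities. Under that simplification the PIPs are just normalized Bayes factors.

All function names and numbers are hypothetical.

```python
import math


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_marginal(b: float, V: float, W: float) -> float:
    """p(b | causal) when b ~ N(beta, V) and beta ~ N(0, W), i.e. b ~ N(0, V + W)."""
    return normal_pdf(b, V + W)


def laplace_marginal(b: float, V: float, lam: float) -> float:
    """p(b | causal) when b ~ N(beta, V) and beta ~ Laplace(0, lam).

    Closed-form normal-Laplace convolution; math.exp can overflow for very
    large |b| / lam.
    """
    s = math.sqrt(V)
    pref = math.exp(V / (2.0 * lam * lam)) / (4.0 * lam)
    pos = math.exp(-b / lam) * math.erfc((V / lam - b) / (s * math.sqrt(2.0)))
    neg = math.exp(b / lam) * math.erfc((V / lam + b) / (s * math.sqrt(2.0)))
    return pref * (pos + neg)


def bayes_factor(b: float, V: float, W: float, prior: str) -> float:
    """BF of 'causal' vs 'null' (beta = 0); both priors have variance W (assumption)."""
    if prior == "gaussian":
        alt = gaussian_marginal(b, V, W)
    elif prior == "laplace":
        alt = laplace_marginal(b, V, math.sqrt(W / 2.0))  # Var = 2 * lam^2
    else:
        raise ValueError(f"unknown prior: {prior}")
    return alt / normal_pdf(b, V)


def single_causal_pips(bfs: list[float]) -> list[float]:
    """PIPs assuming exactly one causal SNP and equal prior probabilities."""
    total = sum(bfs)
    return [bf / total for bf in bfs]


if __name__ == "__main__":
    # Hypothetical log-OR estimates and sampling variances for four SNPs.
    estimates = [0.02, 0.15, 0.12, -0.03]
    variances = [0.0025, 0.0025, 0.0025, 0.0025]
    W = 0.04  # hypothetical prior variance of the causal effect
    for prior in ("gaussian", "laplace"):
        bfs = [bayes_factor(b, v, W, prior) for b, v in zip(estimates, variances)]
        print(prior, [round(p, 3) for p in single_causal_pips(bfs)])
```

The Gaussian Bayes factor here reduces to Wakefield's approximate Bayes factor. The Laplace version differs only in the marginal likelihood of b under the alternative.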
Affiliation(s)
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Hannuun Yaacob
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Department of Economics and Applied Statistics, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur, Malaysia
2
Nlebedim VU, Chaudhuri RR, Walters K. Probabilistic Identification of Bacterial Essential Genes via insertion density using TraDIS Data with Tn5 libraries. Bioinformatics 2021; 37:4343-4349. [PMID: 34255819 PMCID: PMC8652038 DOI: 10.1093/bioinformatics/btab508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 06/24/2021] [Accepted: 07/23/2021] [Indexed: 11/29/2022]
Abstract
Motivation Probabilistic identification of bacterial essential genes using transposon-directed insertion-site sequencing (TraDIS) data based on Tn5 libraries has received relatively little attention in the literature; most methods are designed for mariner transposon insertions. Analysis of Tn5 transposon-based genomic data is challenging due to the high insertion density and genomic resolution. We present a novel probabilistic Bayesian approach for classifying bacterial essential genes using transposon insertion density derived from transposon insertion sequencing data. We implement a Markov chain Monte Carlo sampling procedure to estimate the posterior probability that any given gene is essential, and a Bayesian decision theory approach to selecting essential genes. We assess the effectiveness of our approach via analysis of both simulated data and three previously published Escherichia coli, Salmonella Typhimurium and Staphylococcus aureus datasets. These three bacteria have relatively well-characterized essential genes, which allows us to test our classification procedure using receiver operating characteristic curves and the areas under those curves. We compare the classification performance with that of Bio-Tradis, a standard tool for bacterial gene classification.
Results Our method classifies genes in the three datasets with areas under the curve between 0.967 and 0.983. Our simulated synthetic datasets show that the number of insertions and the extent to which insertions are tolerated in the distal regions of essential genes are both important in determining classification accuracy. Importantly, our method gives the user the option of classifying essential genes based on user-supplied costs of false discovery and false non-discovery.
Availability and implementation An R package that implements the method presented in this paper is available for download from https://github.com/Kevin-walters/insdens.
Supplementary information Supplementary data are available at Bioinformatics online.
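The decision step can be illustrated with the standard Bayes-rule threshold. Suppose calling a non-essential gene essential costs c_FD, and missing an essential gene costs c_FN. Then declaring a gene essential minimizes expected loss exactly when its posterior probability p exceeds c_FD / (c_FD + c_FN).

The minimal Python sketch below does two things:
- It applies that rule to posterior probabilities, for example the fraction of MCMC draws in which a gene is essential.
- It computes an AUC using the Mann-Whitney rank formulation.

It is an illustration, not the insdens package's API. The function names, costs and data are hypothetical.

```python
def decision_threshold(cost_fd: float, cost_fn: float) -> float:
    """Posterior-probability cut-off that minimizes expected loss.

    Calling a gene essential has expected loss (1 - p) * cost_fd; calling it
    non-essential has expected loss p * cost_fn. Essential wins when
    p > cost_fd / (cost_fd + cost_fn).
    """
    return cost_fd / (cost_fd + cost_fn)


def classify(post_probs: dict[str, float], cost_fd: float, cost_fn: float) -> set[str]:
    """Return the genes declared essential under the given (hypothetical) costs."""
    t = decision_threshold(cost_fd, cost_fn)
    return {gene for gene, p in post_probs.items() if p > t}


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical posterior probabilities of essentiality.
    post = {"geneA": 0.97, "geneB": 0.40, "geneC": 0.08, "geneD": 0.65}
    print(sorted(classify(post, cost_fd=1.0, cost_fn=1.0)))  # threshold 0.5
    print(sorted(classify(post, cost_fd=1.0, cost_fn=3.0)))  # threshold 0.25
    truth = {"geneA": 1, "geneB": 1, "geneC": 0, "geneD": 0}  # hypothetical labels
    genes = list(post)
    print(auc([post[g] for g in genes], [truth[g] for g in genes]))
```

Raising the cost of a false non-discovery lowers the threshold, so more genes are called essential.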
Affiliation(s)
- Valentine U Nlebedim
- School of Mathematics and Statistics, University of Sheffield, Sheffield, S10 2TN, United Kingdom
- Roy R Chaudhuri
- Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, United Kingdom
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, S10 2TN, United Kingdom
3
Walters K, Cox A, Yaacob H. The utility of the Laplace effect size prior distribution in Bayesian fine-mapping studies. Genet Epidemiol 2021; 45:386-401. [PMID: 33410201 DOI: 10.1002/gepi.22375] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/18/2020] [Revised: 11/28/2020] [Accepted: 12/16/2020] [Indexed: 11/10/2022]
Abstract
The Gaussian distribution is usually the default causal single-nucleotide polymorphism (SNP) effect size prior in Bayesian population-based fine-mapping association studies, but a recent study showed that the heavier-tailed Laplace prior distribution provided a better fit to breast cancer top hits identified in genome-wide association studies. We investigate the utility of the Laplace prior as an effect size prior in univariate fine-mapping studies. We consider ranking SNPs using Bayes factors and other summaries of the effect size posterior distribution. We also consider the effect of prior choice on the size of credible sets based on the posterior probability of causality, and on the noteworthiness of SNPs in univariate analyses. Across a wide range of fine-mapping scenarios, the Laplace prior generally leads to larger 90% credible sets than the Gaussian prior. These larger credible sets arise because the Laplace prior places relatively high mass around zero, which can give many noncausal SNPs relatively large Bayes factors. When conventional credible sets are used, the Gaussian prior generally yields a better trade-off between including the causal SNP with high probability and keeping the set size reasonable. Interestingly, when the less commonly used measure of noteworthiness is applied, the Laplace prior performs well: causal SNPs are declared noteworthy with high probability, while generally fewer than 5% of noncausal SNPs are declared noteworthy. In contrast, the Gaussian prior leads to the causal SNP being declared noteworthy with very low probability.
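Both summaries mentioned above can be computed from per-SNP Bayes factors, whichever effect-size prior produced them. The Python sketch below handles each summary as follows:
- **Credible set.** It assumes a single causal SNP with equal prior probabilities, so each SNP's posterior probability of causality is its Bayes factor divided by the sum. The 90% credible set is then the smallest set of top-ranked SNPs whose probabilities reach 0.9.
- **Noteworthiness.** It uses Wakefield's Bayesian false-discovery probability (BFDP), one common formulation. A SNP is noteworthy when BFDP < R / (1 + R), where R is the cost of a false non-discovery divided by the cost of a false discovery.

Whether this matches the paper's exact definition is an assumption. The prior probability pi1, the value of R and the Bayes factors are also hypothetical.

```python
def posterior_probs(bfs: list[float]) -> list[float]:
    """Posterior probability of causality, assuming one causal SNP and equal priors."""
    total = sum(bfs)
    return [bf / total for bf in bfs]


def credible_set(bfs: list[float], level: float = 0.9) -> list[int]:
    """Indices of the smallest top-ranked set whose probabilities reach `level`."""
    pps = posterior_probs(bfs)
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen: list[int] = []
    cum = 0.0
    for i in order:
        chosen.append(i)
        cum += pps[i]
        if cum >= level:
            break
    return chosen


def bfdp(bf10: float, pi1: float) -> float:
    """Wakefield's BFDP: posterior probability of no association for one SNP."""
    prior_odds_null = (1.0 - pi1) / pi1
    x = prior_odds_null / bf10  # BF01 * prior odds of the null
    return x / (x + 1.0)


def noteworthy(bf10: float, pi1: float, R: float) -> bool:
    """Noteworthy when BFDP < R / (1 + R); R = cost(false non-discovery) / cost(false discovery)."""
    return bfdp(bf10, pi1) < R / (1.0 + R)


if __name__ == "__main__":
    bfs = [40.0, 25.0, 3.0, 1.2, 0.8]  # hypothetical Bayes factors (causal vs null)
    print(credible_set(bfs))  # 90% credible set
    print([noteworthy(bf, pi1=0.05, R=1.0) for bf in bfs])
```

A prior that inflates many noncausal Bayes factors spreads the posterior probability more thinly, so more SNPs are needed to reach 0.9. That spreading is the mechanism the abstract describes for the Laplace prior.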
Affiliation(s)
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Angela Cox
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Hannuun Yaacob
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
4
Gorlov I, Xiao X, Mayes M, Gorlova O, Amos C. SNP eQTL status and eQTL density in the adjacent region of the SNP are associated with its statistical significance in GWA studies. BMC Genet 2019; 20:85. [PMID: 31718536 PMCID: PMC6852916 DOI: 10.1186/s12863-019-0786-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/18/2019] [Accepted: 10/18/2019] [Indexed: 01/05/2023]
Abstract
Background Over the relatively short history of genome-wide association studies (GWASs), hundreds of GWASs have been published and thousands of disease risk-associated SNPs have been identified. Summary statistics from completed GWASs are often available and can be used to identify SNP features associated with the level of GWAS statistical significance. Those features could be used to select SNPs from gray zones (SNPs that are nominally significant but do not reach the genome-wide level of significance) for targeted analyses.
Methods We used summary statistics from recently published breast cancer, lung cancer and scleroderma GWASs to explore the association between the level of GWAS statistical significance and the expression quantitative trait loci (eQTL) status of the SNP. Data from the Genotype-Tissue Expression Project (GTEx) were used to identify eQTL SNPs.
Results We found that SNPs reported as eQTLs were more significant in GWAS (higher -log10(p)) regardless of the tissue specificity of the eQTL. Pan-tissue eQTLs (those reported as eQTLs in multiple tissues) tended to be more significant in the GWAS than those reported as eQTLs in only one tissue type. eQTL density in the ±5 kb region adjacent to a given SNP was also positively associated with the level of GWAS statistical significance, regardless of the eQTL status of the SNP. SNPs located in regions of high eQTL density were more likely to be located in regulatory elements (transcription factor or miRNA binding sites). When SNPs were stratified by the level of statistical significance, the proportion of eQTLs was positively associated with the mean level of statistical significance in the group; the association curve reaches a plateau around -log10(p) ≈ 5. The observed associations suggest that quasi-significant SNPs (5 × 10⁻⁸ < p < 10⁻⁵) and SNPs at the genome-wide level of statistical significance (p < 5 × 10⁻⁸) may have similar proportions of risk-associated SNPs.
Conclusions The results of this study indicate that the SNP's eQTL status, as well as eQTL density in the adjacent region, are positively associated with the level of statistical significance of the SNP in GWAS.
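The two quantities the study relates can be sketched in Python:
- for each SNP, the number of eQTL SNPs within ±5 kb;
- its band of GWAS significance: genome-wide (p < 5 × 10⁻⁸) or quasi-significant (5 × 10⁻⁸ ≤ p < 10⁻⁵).

This is an illustration under stated assumptions, not the authors' pipeline:
- all positions are on one chromosome;
- a SNP is not counted in its own window;
- the boundary p = 5 × 10⁻⁸ is assigned to the quasi-significant band.

The positions, p-values and function names are hypothetical.

```python
import bisect
import math


def eqtl_density(snp_pos: list[int], eqtl_pos: list[int], half_window: int = 5000) -> list[int]:
    """Count eQTL SNPs within +/- half_window bp of each SNP, excluding the SNP itself."""
    eqtls = sorted(eqtl_pos)
    counts = []
    for pos in snp_pos:
        lo = bisect.bisect_left(eqtls, pos - half_window)
        hi = bisect.bisect_right(eqtls, pos + half_window)
        n = hi - lo
        # Assumption: do not count the SNP's own position if it is an eQTL.
        n -= bisect.bisect_right(eqtls, pos) - bisect.bisect_left(eqtls, pos)
        counts.append(n)
    return counts


def significance_band(p: float) -> str:
    """Classify a GWAS p-value into the bands used in the abstract."""
    if p < 5e-8:
        return "genome-wide"
    if p < 1e-5:
        return "quasi-significant"
    return "other"


if __name__ == "__main__":
    # Hypothetical SNPs: name -> (position in bp, GWAS p-value).
    snps = {"rs_a": (100_000, 3e-9), "rs_b": (104_000, 2e-6), "rs_c": (250_000, 0.2)}
    eqtls = [98_000, 100_000, 103_500, 109_500]  # hypothetical eQTL positions
    dens = eqtl_density([pos for pos, _ in snps.values()], eqtls)
    for (name, (pos, p)), d in zip(snps.items(), dens):
        print(name, d, significance_band(p), round(-math.log10(p), 2))
```

Sorting the eQTL positions once and using binary search keeps each window count at O(log n). This matters when the eQTL list covers a whole chromosome.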
Affiliation(s)
- Ivan Gorlov
- The Geisel School of Medicine, Department of Biomedical Data Science, Dartmouth College, HB7936, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, 03756, USA
- Xiangjun Xiao
- Department of Medicine, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Maureen Mayes
- Department of Internal Medicine, Division of Rheumatology, University of Texas McGovern Medical School, Houston, TX, USA
- Olga Gorlova
- The Geisel School of Medicine, Department of Biomedical Data Science, Dartmouth College, HB7936, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, 03756, USA
- Christopher Amos
- Department of Medicine, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
5
Alenazi AA, Cox A, Juarez M, Lin W, Walters K. Bayesian variable selection using partially observed categorical prior information in fine-mapping association studies. Genet Epidemiol 2019; 43:690-703. [DOI: 10.1002/gepi.22213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Abdulaziz A. Alenazi
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Department of Mathematics, Northern Border University, Arar, Saudi Arabia
- Angela Cox
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Miguel Juarez
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Wei-Yu Lin
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Northern Institute for Cancer Research, Medical School, University of Newcastle, Newcastle, UK
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
6
Walters K, Cox A, Yaacob H. Using GWAS top hits to inform priors in Bayesian fine-mapping association studies. Genet Epidemiol 2019; 43:675-689. [DOI: 10.1002/gepi.22212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/16/2019] [Revised: 04/04/2019] [Accepted: 05/07/2019] [Indexed: 11/07/2022]
Affiliation(s)
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
- Angela Cox
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Sheffield, UK
- Hannuun Yaacob
- School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
7
Pereira M, Thompson JR, Weichenberger CX, Thomas DC, Minelli C. Inclusion of biological knowledge in a Bayesian shrinkage model for joint estimation of SNP effects. Genet Epidemiol 2017; 41:320-331. [PMID: 28393391 DOI: 10.1002/gepi.22038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/02/2016] [Revised: 12/18/2016] [Accepted: 12/26/2016] [Indexed: 01/04/2023]
Abstract
With the aim of improving detection of novel single-nucleotide polymorphisms (SNPs) in genetic association studies, we propose a method of including prior biological information in a Bayesian shrinkage model that jointly estimates SNP effects. We assume that the SNP effects follow a normal distribution centered at zero, with variance controlled by a shrinkage hyperparameter. We use biological information to define the amount of shrinkage applied to the SNP effects distribution, so that the effects of SNPs with more biological support are shrunk less toward zero and are therefore more likely to be detected. The performance of the method was tested in a simulation study (1,000 datasets, 500 subjects with ∼200 SNPs in 10 linkage disequilibrium (LD) blocks) using a continuous and a binary outcome. It was further tested in an empirical example on body mass index (continuous) and overweight (binary) in a dataset of 1,829 subjects and 2,614 SNPs from 30 blocks. Biological knowledge was retrieved using the bioinformatics tool Dintor, which queried various databases. The joint Bayesian model with inclusion of prior information outperformed the standard analysis. In the simulation study, the mean ranking of the true LD block was 2.8 for the Bayesian model versus 3.6 for the standard analysis of individual SNPs. In the empirical example, the mean ranking of the six true blocks was 8.5 versus 9.3 in the standard analysis. These results suggest that our method is more powerful than the standard analysis. We expect its performance to improve further as more biological information about SNPs becomes available.
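Letting biological support control shrinkage can be shown with a deliberately simplified normal-normal model rather than the authors' joint model:
- Each SNP's estimated effect b_j (standard error s_j) is treated independently.
- Its prior is β_j ~ N(0, τ² w_j), where a larger hypothetical weight w_j encodes more biological support.
- The posterior mean is b_j · τ² w_j / (τ² w_j + s_j²), so better-supported SNPs are shrunk less toward zero.

The weights, τ² and data below are hypothetical. Treating SNPs independently, and so ignoring LD, is a simplification that the joint model described above does not make.

```python
import math


def posterior_normal(b: float, se: float, tau2: float, w: float) -> tuple[float, float]:
    """Posterior mean and SD of beta when b ~ N(beta, se^2) and beta ~ N(0, tau2 * w).

    w is a hypothetical biological-support weight: larger w means a wider
    prior and therefore less shrinkage toward zero.
    """
    prior_var = tau2 * w
    shrink = prior_var / (prior_var + se * se)  # fraction of b retained
    post_mean = shrink * b
    post_sd = math.sqrt(shrink * se * se)  # equals sqrt(1 / (1/prior_var + 1/se^2))
    return post_mean, post_sd


if __name__ == "__main__":
    tau2 = 0.01  # hypothetical global shrinkage hyperparameter
    # Same estimate and SE, different (hypothetical) biological support.
    for w in (0.25, 1.0, 4.0):
        mean, sd = posterior_normal(b=0.10, se=0.05, tau2=tau2, w=w)
        print(f"w={w}: posterior mean {mean:.4f}, posterior SD {sd:.4f}")
```

With identical data, the SNP given the largest weight keeps the most of its estimated effect. This is the behaviour the abstract relies on to make well-supported SNPs easier to detect.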
Affiliation(s)
- Miguel Pereira
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- John R Thompson
- Department of Health Sciences, University of Leicester, Leicester, United Kingdom
- Christian X Weichenberger
- Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), Bolzano, Italy, affiliated with the University of Lübeck, Lübeck, Germany
- Duncan C Thomas
- Biostatistics Division, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cosetta Minelli
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
8
Spencer AV, Cox A, Lin W, Easton DF, Michailidou K, Walters K. Incorporating Functional Genomic Information in Genetic Association Studies Using an Empirical Bayes Approach. Genet Epidemiol 2016; 40:176-87. [PMID: 26833494 PMCID: PMC4832271 DOI: 10.1002/gepi.21956] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/24/2015] [Revised: 12/04/2015] [Accepted: 12/14/2015] [Indexed: 01/01/2023]
Abstract
A large amount of functional genetic data is available that can be used to inform fine-mapping association studies in diseases with well-characterised disease pathways. Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires a prior distribution for the effect size. If the information needed to estimate a priori the probability density of the effect sizes for causal SNPs in a genomic region is inconsistent or unavailable, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance and a coherent approach to using SNP-level functional data to inform the prior probability of causal association. Through simulation we show that, when SNPs in a fine-mapping study are ranked by our empirical Bayes factor, the causal SNP rank is generally as high as or higher than the rank obtained using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNP ranks compared with ranking with identical prior probabilities of association. We demonstrate our methods by applying them to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer cases and 43,000 healthy controls.
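The two ingredients described above can be combined as in the Python sketch below:
- a Bayes factor that depends on a prior effect-size variance W;
- SNP-specific prior probabilities of causal association.

The sketch uses Wakefield's approximate Bayes factor. Assuming a single causal SNP in the region, it ranks SNPs by the posterior probability π_j·BF_j / Σ_k π_k·BF_k.

The paper's empirical estimate of W is not reproduced here; W is simply passed in. The functional prior probabilities, estimates and function names are hypothetical.

```python
import math


def wakefield_abf(b: float, V: float, W: float) -> float:
    """Approximate Bayes factor (causal vs null) for estimate b, variance V, prior N(0, W)."""
    z2 = b * b / V
    return math.sqrt(V / (V + W)) * math.exp(0.5 * z2 * W / (V + W))


def posterior_probs(bfs: list[float], priors: list[float]) -> list[float]:
    """P(SNP j causal | data) assuming exactly one causal SNP with prior probs `priors`."""
    weighted = [p * bf for p, bf in zip(priors, bfs)]
    total = sum(weighted)
    return [x / total for x in weighted]


def ranks(values: list[float]) -> list[int]:
    """Rank of each value, where rank 1 is the largest."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


if __name__ == "__main__":
    estimates = [0.11, 0.10, 0.04]  # hypothetical log-ORs for three SNPs
    V = 0.0009  # hypothetical sampling variance (SE = 0.03)
    W = 0.04  # prior variance; an empirically estimated value would go here
    bfs = [wakefield_abf(b, V, W) for b in estimates]
    uniform = [1 / 3] * 3
    functional = [0.2, 0.7, 0.1]  # hypothetical: SNP 2 lies in a regulatory element
    print(ranks(posterior_probs(bfs, uniform)))
    print(ranks(posterior_probs(bfs, functional)))
```

With identical priors, the SNP with the largest Bayes factor ranks first. Shifting prior probability toward a functionally annotated SNP can move it up the ranking when the Bayes factors are close.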
Affiliation(s)
- Amy V. Spencer
- Advanced Analytics Centre, Global Medicines Development, AstraZeneca, Alderley Park, Macclesfield, United Kingdom
- School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom
- Angela Cox
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Beech Hill Road, Sheffield, United Kingdom
- Wei-Yu Lin
- Department of Oncology, Sheffield Cancer Research Centre, University of Sheffield Medical School, Beech Hill Road, Sheffield, United Kingdom
- Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Douglas F. Easton
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Kyriaki Michailidou
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom
- Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom