1. Sajal IH, Biswas S. Bivariate quantitative Bayesian LASSO for detecting association of rare haplotypes with two correlated continuous phenotypes. Front Genet 2023; 14:1104727. [PMID: 36968609 PMCID: PMC10033866 DOI: 10.3389/fgene.2023.1104727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/21/2023] [Indexed: 03/12/2023] Open
Abstract
In genetic association studies, the multivariate analysis of correlated phenotypes offers statistical and biological advantages compared to analyzing one phenotype at a time. The joint analysis utilizes additional information contained in the correlation and avoids multiple testing. It also provides an opportunity to investigate and understand shared genetic mechanisms of multiple phenotypes. Bivariate logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotypes associated with two binary phenotypes or one binary and one continuous phenotype jointly. There is currently no haplotype association test available that can handle multiple continuous phenotypes. In this study, by employing the framework of bivariate LBL, we propose bivariate quantitative Bayesian LASSO (QBL) to detect rare haplotypes associated with two continuous phenotypes. Bivariate QBL removes unassociated haplotypes by regularizing the regression coefficients and utilizing a latent variable to model correlation between two phenotypes. We carry out extensive simulations to investigate the performance of bivariate QBL and compare it with that of a standard (univariate) haplotype association test, Haplo.score (applied twice to two phenotypes individually). Bivariate QBL performs better than Haplo.score in all simulations with varying degrees of power gain. We analyze Genetic Analysis Workshop 19 exome sequencing data on systolic and diastolic blood pressures and detect several rare haplotypes associated with the two phenotypes.
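The abstract above mentions a latent variable that models the correlation between the two phenotypes. The following is a minimal simulation sketch of that general idea, not the authors' model: it assumes each subject has one shared normal latent effect `u` added to both phenotypes, so the implied correlation is `sigma_u**2 / (sigma_u**2 + sigma_e**2)`. The function names, the 5% carrier frequency, and all parameter values are hypothetical.

```python
import math
import random


def simulate_two_phenotypes(n, beta1=0.0, beta2=0.0, sigma_u=1.0, sigma_e=1.0,
                            carrier_freq=0.05, seed=0):
    """Simulate (x, y1, y2) triples that share a subject-level latent effect u.

    x is a carrier indicator for a hypothetical rare haplotype; beta1 and
    beta2 are its assumed effects on the two continuous phenotypes.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < carrier_freq else 0
        u = rng.gauss(0.0, sigma_u)  # shared latent effect, induces correlation
        y1 = beta1 * x + u + rng.gauss(0.0, sigma_e)
        y2 = beta2 * x + u + rng.gauss(0.0, sigma_e)
        rows.append((x, y1, y2))
    return rows


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rows = simulate_two_phenotypes(20000)
r = pearson([row[1] for row in rows], [row[2] for row in rows])
implied = 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)  # sigma_u^2 / (sigma_u^2 + sigma_e^2)
print("sample correlation:", round(r, 3), "implied:", implied)
```

With equal loadings on `u`, this construction can only produce a positive correlation; a negative correlation would need opposite-signed loadings, which is one of several ways the published model may differ from this sketch.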
Affiliation(s)
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, United States
2. ordinalbayes: Fitting Ordinal Bayesian Regression Models to High-Dimensional Data Using R. STATS 2022; 5:371-384. [PMID: 35574500 PMCID: PMC9097970 DOI: 10.3390/stats5020021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, P > N, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection.
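The abstract above refers to cumulative logit models for an ordinal outcome. As a rough illustration of what such a model computes, the sketch below turns ordered thresholds and a linear predictor into category probabilities under the common parameterization P(Y <= k) = logistic(alpha_k - eta). The sign convention and the function name are assumptions; the ordinalbayes package may parameterize the model differently.

```python
import math


def ordinal_category_probs(thresholds, eta):
    """Category probabilities for a cumulative logit model.

    thresholds: increasing cut points alpha_1 < ... < alpha_{K-1}.
    eta: linear predictor x'beta for one subject.
    Assumes P(Y <= k) = logistic(alpha_k - eta); returns K probabilities.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    cumulative = [logistic(alpha - eta) for alpha in thresholds] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Four ordered categories (for example, four stages) need three thresholds.
low = ordinal_category_probs([-1.0, 0.5, 2.0], eta=0.0)
high = ordinal_category_probs([-1.0, 0.5, 2.0], eta=2.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Under this convention a larger linear predictor shifts probability toward the higher categories, which is why the second call puts more mass on the last category than the first.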
3. Impairments of Photoreceptor Outer Segments Renewal and Phototransduction Due to a Peripherin Rare Haplotype Variant: Insights from Molecular Modeling. Int J Mol Sci 2021; 22:ijms22073484. [PMID: 33801777 PMCID: PMC8036374 DOI: 10.3390/ijms22073484] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 03/12/2021] [Revised: 03/23/2021] [Accepted: 03/25/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Retinitis pigmentosa punctata albescens (RPA) is a particular form of retinitis pigmentosa characterized by childhood-onset night blindness and areas of peripheral retinal atrophy. We investigated the genetic cause of RPA in a family consisting of two affected Egyptian brothers with healthy consanguineous parents. METHODS Mutational analysis of four RPA causative genes was performed by Sanger sequencing on both probands, and detected variants were subsequently genotyped in their parents. The detected variants were then characterized in depth, both statistically and in silico, to determine their possible effects and association with RPA. RESULTS Both brothers carry three missense PRPH2 variants in a homozygous condition (c.910C > A, c.929G > A, and c.1013A > C) and two promoter variants in the RHO (c.-26A > G) and RLBP1 (c.-70G > A) genes, respectively. Haplotype analyses highlighted a rare PRPH2 haplotype variant (GAG) that may alter PRPH2 binding with melanoregulin and other outer segment proteins, followed by photoreceptor outer segment instability. Furthermore, an altered balance of transcription factor binding sites, due to the presence of the RHO and RLBP1 promoter variants, might cause a comprehensive downregulation of both genes, possibly altering the visual-related pathway they share with PRPH2. CONCLUSIONS Despite several limitations, the study might be a relevant step towards detection of novel scenarios in RPA etiopathogenesis.
4. Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including gain in power and better understanding of genetic etiology. However, when the phenotypes are of discordant types such as binary and continuous, the joint modeling is more challenging. Another research area of current interest is discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL), and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only and (2) strongly positively correlated and the target haplotype affects both phenotypes in the positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes, including a rare one.
Affiliation(s)
- Xiaochen Yuan: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
5. Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
6. Datta AS, Lin S, Biswas S. A Family-Based Rare Haplotype Association Method for Quantitative Traits. Hum Hered 2019; 83:175-195. [PMID: 30799419 DOI: 10.1159/000493543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/05/2018] [Accepted: 09/07/2018] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The variants identified in genome-wide association studies account for only a small fraction of disease heritability. A key to this "missing heritability" is believed to be rare variants. Specifically, we focus on rare haplotype variants (rHTVs). The existing methods for detecting rHTVs are mostly population-based and, as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect. METHODS We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare it with a standard family-based haplotype association test implemented in the FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION FamQBL can help uncover rHTVs associated with quantitative traits.
Affiliation(s)
- Ananda S Datta: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Shili Lin: Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
7. Geng Y, Zhao Z, Zhang X, Wang W, Cui X, Ye K, Xiao X, Wang J. An improved burden-test pipeline for identifying associations from rare germline and somatic variants. BMC Genomics 2017. [PMID: 29513197 PMCID: PMC5657102 DOI: 10.1186/s12864-017-4133-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identifying rare germline and somatic variants associated with cancer progression is an important research topic in cancer genomics. Although many approaches have been proposed for rare variant association studies, they are not well suited to cancer sequencing data because of several issues, such as over-reliance on pre-selection and overlooking interacting hotspots. RESULTS In this article, we propose an improved pipeline to identify germline variant and somatic mutation interactions influencing cancer susceptibility from pair-wise cancer sequencing data. The proposed pipeline, RareProb-C, performs an algorithmic selection on the given variants by incorporating the variant allelic frequencies. The interactions among the variants are considered within regions that are delimited by a four-gamete test. Then it filters singular cases according to the posterior probability at each site. Finally, it outputs the selected candidates that pass a collapse test. CONCLUSIONS We apply RareProb-C to a series of carefully constructed simulation cases, and it outperforms six existing genetic model-free approaches. We also test RareProb-C on 429 TCGA ovarian cancer cases, and RareProb-C successfully identifies the known highlighted variants that are considered to increase disease susceptibility.
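The abstract above says that regions are delimited by a four-gamete test. The sketch below shows the classical form of that test on phased biallelic haplotypes coded as strings of 0 and 1: if all four allele combinations occur at a pair of sites, the pair is taken as evidence of historical recombination. The greedy partition into regions is an assumption made for illustration and is not claimed to be how RareProb-C delimits its regions; both function names are hypothetical.

```python
def has_four_gametes(haplotypes, i, j):
    """Return True if sites i and j show all four gametes (00, 01, 10, 11)."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def four_gamete_regions(haplotypes):
    """Greedily split sites into consecutive regions with no four-gamete pair.

    A new region starts at site j as soon as j conflicts with any site
    already in the current region. Returns inclusive (start, end) indices.
    """
    n_sites = len(haplotypes[0])
    regions, start = [], 0
    for j in range(1, n_sites):
        if any(has_four_gametes(haplotypes, i, j) for i in range(start, j)):
            regions.append((start, j - 1))
            start = j
    regions.append((start, n_sites - 1))
    return regions


# Toy phased haplotypes over three sites (hypothetical data).
haps = ["000", "010", "110", "111", "001"]
print(four_gamete_regions(haps))
```

In the toy data, sites 0 and 2 show all four gametes, so the third site starts a new region.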
Affiliation(s)
- Yu Geng: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Jinzhou Medical University, Jinzhou, Liaoning, 121001, China
- Zhongmeng Zhao: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xuanping Zhang: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Wenke Wang: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xingjian Cui: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Kai Ye: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xiao Xiao: Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; State Key Laboratory of Cancer Biology, Xijing Hospital of Digestive Diseases, Xi'an, 710032, Shaanxi, China
- Jiayin Wang: School of Management, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China; Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
8. Zhang Y, Hofmann JN, Purdue MP, Lin S, Biswas S. Logistic Bayesian LASSO for genetic association analysis of data from complex sampling designs. J Hum Genet 2017; 62:819-829. [PMID: 28424482 PMCID: PMC5572548 DOI: 10.1038/jhg.2017.43] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/11/2016] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 01/20/2023]
Abstract
Detecting gene-environment interactions with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.
Affiliation(s)
- Yuan Zhang: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jonathan N Hofmann: Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Mark P Purdue: Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Shili Lin: Department of Statistics, The Ohio State University, Columbus, OH, USA
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
9. Datta AS, Zhang Y, Zhang L, Biswas S. Association of rare haplotypes on ULK4 and MAP4 genes with hypertension. BMC Proc 2016; 10:363-369. [PMID: 27980663 PMCID: PMC5133474 DOI: 10.1186/s12919-016-0057-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/11/2023] Open
Abstract
Several variants in the ULK4 and MAP4 genes on chromosome 3 have previously been implicated in hypertension. As a natural follow-up step, we explore association of haplotypes in those genes. We consider the Genetic Analysis Workshop 19 real data on unrelated individuals and analyze haplotype blocks of 5 single-nucleotide polymorphisms through a sliding window approach. We apply 4 haplotype association methods (haplo.score, haplo.glm, hapassoc, and logistic Bayesian LASSO [LBL]) and, for comparison, the sequence kernel association test (SKAT) and its variants. We find several rare haplotype blocks to be associated. To get an idea of the false-positive proportions, we also analyze the data after permuting the case-control status of individuals. We find that LBL, unlike the other methods, maintains low false-positive rates in the presence of rare haplotypes. Thus, we conclude that the haplotypes found to be associated by LBL are more likely to be true positives. SKAT and its variants do not find significance on either gene.
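The abstract above describes two generic steps: scanning blocks of 5 SNPs with a sliding window, and permuting case-control status to gauge false positives. A minimal sketch of both follows. The step size of one SNP and the function names are assumptions, since the abstract does not state them, and the association test itself is left out.

```python
import random


def sliding_windows(n_snps, width=5, step=1):
    """Return half-open (start, end) SNP index ranges for each window."""
    return [(s, s + width) for s in range(0, n_snps - width + 1, step)]


def permute_status(status, seed=None):
    """Return a shuffled copy of case-control labels.

    Shuffling breaks any true link between genotype and status while
    keeping the numbers of cases and controls unchanged, so associations
    found in permuted data can be read as false positives.
    """
    rng = random.Random(seed)
    shuffled = list(status)
    rng.shuffle(shuffled)
    return shuffled


windows = sliding_windows(n_snps=7)
print(windows)

status = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical labels: 1 = case, 0 = control
print(permute_status(status, seed=1))
```

Each window would then be passed, with either the observed or the permuted labels, to whichever haplotype association method is being evaluated.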
Affiliation(s)
- Ananda S. Datta: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX USA
- Yuan Zhang: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX USA
- Lei Zhang: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX USA
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX USA
10. Datta AS, Biswas S. Comparison of haplotype-based statistical tests for disease association with rare and common variants. Brief Bioinform 2015; 17:657-71. [PMID: 26338417 DOI: 10.1093/bib/bbv072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/26/2015] [Indexed: 01/26/2023] Open
Abstract
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to practitioners. We consider nine methods: Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types, individual haplotype-specific tests and global tests, depending on whether there is just one overall test for a haplotype region (global) or an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method, the Sequence Kernel Association Test (SKAT), and its two variants, SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.
11. Satten GA, Biswas S, Papachristou C, Turkmen A, König IR. Population-based association and gene by environment interactions in Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S49-56. [PMID: 25112188 DOI: 10.1002/gepi.21825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.
Affiliation(s)
- Glen A Satten: Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
12. Biswas S, Xia S, Lin S. Detecting rare haplotype-environment interaction with logistic Bayesian LASSO. Genet Epidemiol 2013; 38:31-41. [PMID: 24272913 DOI: 10.1002/gepi.21773] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/03/2013] [Revised: 09/13/2013] [Accepted: 10/15/2013] [Indexed: 11/09/2022]
Abstract
Two important contributors to missing heritability are believed to be rare variants and gene-environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated through single nucleotide variant (SNV) analysis. Further, rHTVs can tag associated rare SNVs and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case-control data. LBL shrinks the unassociated (especially common) haplotypes toward zero so that an associated rHTV can be identified with greater power. Here, we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case-control status. We apply the approach (LBL-GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age-related Macular Degeneration (AMD). LBL-GXE detects interaction of a specific rHTV in the CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL-GXE has reasonably good power for detecting interactions with rHTVs while keeping the type I error rates well controlled. Thus, we conclude that LBL-GXE is a useful tool for uncovering missing heritability.
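The abstract above states that LBL shrinks unassociated haplotype effects toward zero. In the Bayesian LASSO generally, this comes from a double-exponential (Laplace) prior on each regression coefficient. The sketch below compares that prior with a normal prior of the same variance, to show why it favours values near zero while still allowing large effects. The rate `lam = 1.0` and the function names are illustrative assumptions, not values from the paper, and the hyperprior on the rate is omitted.

```python
import math


def laplace_logpdf(beta, lam):
    """Log density of the double-exponential prior (lam / 2) * exp(-lam * |beta|)."""
    return math.log(lam / 2.0) - lam * abs(beta)


def normal_logpdf(beta, sd):
    """Log density of a zero-mean normal prior with standard deviation sd."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - beta ** 2 / (2.0 * sd ** 2)


lam = 1.0                          # hypothetical rate parameter
matched_sd = math.sqrt(2.0) / lam  # Laplace variance is 2 / lam^2

for beta in (0.0, 1.5, 5.0):
    print(beta,
          round(laplace_logpdf(beta, lam), 3),
          round(normal_logpdf(beta, matched_sd), 3))
```

At equal variance, the Laplace prior puts more density at zero and in the far tails than the normal prior and less in between, which matches the behaviour described in the abstract: most effects are pulled toward zero, while a truly associated haplotype can still have a sizeable estimated effect.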
Affiliation(s)
- Swati Biswas: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, United States of America