1
Sajal IH, Biswas S. Bivariate quantitative Bayesian LASSO for detecting association of rare haplotypes with two correlated continuous phenotypes. Front Genet 2023; 14:1104727. [PMID: 36968609 PMCID: PMC10033866 DOI: 10.3389/fgene.2023.1104727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/21/2023] [Indexed: 03/12/2023]
Abstract
In genetic association studies, the multivariate analysis of correlated phenotypes offers statistical and biological advantages compared to analyzing one phenotype at a time. The joint analysis utilizes additional information contained in the correlation and avoids multiple testing. It also provides an opportunity to investigate and understand shared genetic mechanisms of multiple phenotypes. Bivariate logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotypes associated with two binary phenotypes or one binary and one continuous phenotype jointly. There is currently no haplotype association test available that can handle multiple continuous phenotypes. In this study, by employing the framework of bivariate LBL, we propose bivariate quantitative Bayesian LASSO (QBL) to detect rare haplotypes associated with two continuous phenotypes. Bivariate QBL removes unassociated haplotypes by regularizing the regression coefficients and utilizing a latent variable to model correlation between two phenotypes. We carry out extensive simulations to investigate the performance of bivariate QBL and compare it with that of a standard (univariate) haplotype association test, Haplo.score (applied twice to two phenotypes individually). Bivariate QBL performs better than Haplo.score in all simulations with varying degrees of power gain. We analyze Genetic Analysis Workshop 19 exome sequencing data on systolic and diastolic blood pressures and detect several rare haplotypes associated with the two phenotypes.
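As a rough illustration of how a shared latent variable can induce correlation between two continuous phenotypes while the haplotype effects are shrunk, one possible formulation is sketched below. The additive form, the symbols (\alpha_k, \beta_k, u_i, \sigma_u, \sigma_k, \lambda), and the independence assumptions are all illustrative choices made here, not the specification given in the paper; only the use of a latent variable and of LASSO-type regularization comes from the abstract.

```latex
% Assumed illustrative model for subject i and phenotype k = 1, 2
% x_i: haplotype design vector; u_i: shared latent variable
y_{ik} = \alpha_k + \mathbf{x}_i^{\top}\boldsymbol{\beta}_k + u_i + \varepsilon_{ik},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ik} \sim N(0, \sigma_k^2) \ \text{(mutually independent)}

% The shared u_i is the only source of dependence between the two phenotypes
\operatorname{Cov}(y_{i1}, y_{i2} \mid \mathbf{x}_i) = \sigma_u^2,
\qquad
\operatorname{Corr}(y_{i1}, y_{i2} \mid \mathbf{x}_i)
  = \frac{\sigma_u^2}{\sqrt{(\sigma_u^2 + \sigma_1^2)(\sigma_u^2 + \sigma_2^2)}}

% Bayesian LASSO: double-exponential (Laplace) prior on each haplotype effect
\pi(\beta_{kj} \mid \lambda) = \frac{\lambda}{2} \exp\left(-\lambda \lvert \beta_{kj} \rvert\right)
```

In this simplified form the induced correlation is always non-negative; allowing a phenotype-specific coefficient on u_i would be one way to accommodate negative correlation.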
Affiliation(s)
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, United States
2
Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including gain in power and better understanding of genetic etiology. However, when the phenotypes are of discordant types such as binary and continuous, the joint modeling is more challenging. Another research area of current interest is discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL), and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only, and (2) strongly positively correlated and the target haplotype affects both phenotypes in the positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes, including a rare one.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
3
Zhou X, Wang M, Lin S. Detecting rare haplotypes associated with complex diseases using both population and family data: Combined logistic Bayesian Lasso. Stat Methods Med Res 2020; 29:3340-3350. [DOI: 10.1177/0962280220927728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022]
Abstract
Haplotype-based association methods have been developed to understand the genetic architecture of complex diseases. Compared to single-variant-based methods, haplotype methods are thought to be more biologically relevant, since there are typically multiple non-independent genetic variants involved in complex diseases, and the use of haplotypes implicitly accounts for non-independence caused by linkage disequilibrium. In recent years, with the focus moving from common to rare variants, haplotype-based methods have also evolved accordingly to uncover the roles of rare haplotypes. One particular approach is regularization-based, with the use of Bayesian least absolute shrinkage and selection operator (Lasso) as an example. Methods of this type have been developed for either case-control population data (the logistic Bayesian Lasso (LBL)) or family data (family-triad-based logistic Bayesian Lasso (famLBL)). In some situations, both family data and case-control data are available; therefore, it would be a waste of resources if only one of them could be analyzed. To make full use of available data to increase power, we propose a unified approach that can combine both case-control and family data (combined logistic Bayesian Lasso (cLBL)). Through simulations, we characterized the performance of cLBL and showed the advantage of cLBL over existing methods. We further applied cLBL to the Framingham Heart Study data to demonstrate its utility in real data applications.
Affiliation(s)
- Xiaofei Zhou
- Department of Statistics, The Ohio State University, Columbus, OH, USA
- Meng Wang
- Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, OH, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH, USA
4
Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
5
Datta AS, Lin S, Biswas S. A Family-Based Rare Haplotype Association Method for Quantitative Traits. Hum Hered 2019; 83:175-195. [PMID: 30799419 DOI: 10.1159/000493543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/05/2018] [Accepted: 09/07/2018] [Indexed: 12/15/2022]
Abstract
BACKGROUND The variants identified in genome-wide association studies account for only a small fraction of disease heritability. A key to this "missing heritability" is believed to be rare variants. Specifically, we focus on rare haplotype variants (rHTVs). The existing methods for detecting rHTVs are mostly population-based and, as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect. METHODS We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare it with a standard family-based haplotype association test implemented in FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION FamQBL can help uncover rHTVs associated with quantitative traits.
Affiliation(s)
- Ananda S Datta
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
6
Zhou X, Wang M, Zhang H, Stewart WCL, Lin S. Logistic Bayesian LASSO for detecting association combining family and case-control data. BMC Proc 2018; 12:54. [PMID: 30263052 PMCID: PMC6156907 DOI: 10.1186/s12919-018-0139-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
Because of the limited information from the GAW20 samples when only case-control or trio data are considered, we propose eLBL, an extension of the Logistic Bayesian LASSO (least absolute shrinkage and selection operator) methodology, so that both types of data can be analyzed jointly in the hope of obtaining increased statistical power, especially for detecting association between rare haplotypes and complex diseases. The methodology is further extended to account for familial correlation among the case-control individuals and the trios. A 2-step analysis strategy was taken. In Step 1, a genome-wide single-SNP (single-nucleotide polymorphism) search was performed using the Monte Carlo pedigree disequilibrium test (MCPDT) to determine interesting regions for the Adult Treatment Panel (ATP) binary trait. In Step 2, eLBL was applied to haplotype blocks covering the SNPs flagged in Step 1. Several significantly associated haplotypes were identified; most are in blocks contained in protein coding genes that appear to be relevant for metabolic syndrome. The results are further substantiated with a Type I error study and by an additional analysis using the triglyceride measurements directly as a quantitative trait.
Affiliation(s)
- Xiaofei Zhou
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210 USA
- Meng Wang
- Battelle Center for Mathematical Medicine, Nationwide Children's Hospital Research Institute, 700 Childrens Drive, Columbus, OH 43205 USA
- Han Zhang
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210 USA
- William C L Stewart
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210 USA; Battelle Center for Mathematical Medicine, Nationwide Children's Hospital Research Institute, 700 Childrens Drive, Columbus, OH 43205 USA
- Shili Lin
- Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210 USA
7
Zeng Y, Navarro P, Shirali M, Howard DM, Adams MJ, Hall LS, Clarke TK, Thomson PA, Smith BH, Murray A, Padmanabhan S, Hayward C, Boutin T, MacIntyre DJ, Lewis CM, Wray NR, Mehta D, Penninx BW, Milaneschi Y, Baune BT, Air T, Hottenga JJ, Mbarek H, Castelao E, Pistis G, Schulze TG, Streit F, Forstner AJ, Byrne EM, Martin NG, Breen G, Müller-Myhsok B, Lucae S, Kloiber S, Domenici E, Deary IJ, Porteous DJ, Haley CS, McIntosh AM. Genome-wide Regional Heritability Mapping Identifies a Locus Within the TOX2 Gene Associated With Major Depressive Disorder. Biol Psychiatry 2017; 82:312-321. [PMID: 28153336 PMCID: PMC5553996 DOI: 10.1016/j.biopsych.2016.12.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 08/03/2016] [Revised: 11/16/2016] [Accepted: 12/13/2016] [Indexed: 12/03/2022]
Abstract
BACKGROUND Major depressive disorder (MDD) is the second largest cause of global disease burden. It has an estimated heritability of 37%, but published genome-wide association studies have so far identified few risk loci. Haplotype-block-based regional heritability mapping (HRHM) estimates the localized genetic variance explained by common variants within haplotype blocks, integrating the effects of multiple variants, and may be more powerful for identifying MDD-associated genomic regions. METHODS We applied HRHM to Generation Scotland: The Scottish Family Health Study, a large family- and population-based Scottish cohort (N = 19,896). Single-single nucleotide polymorphism (SNP) and haplotype-based association tests were used to localize the association signal within the regions identified by HRHM. Functional prediction was used to investigate the effect of MDD-associated SNPs within the regions. RESULTS A haplotype block across a 24-kb region within the TOX2 gene reached genome-wide significance in HRHM. Single-SNP- and haplotype-based association tests demonstrated that five of nine genotyped SNPs and two haplotypes within this block were significantly associated with MDD. The expression of TOX2 and a brain-specific long noncoding RNA RP1-269M15.3 in frontal cortex and nucleus accumbens basal ganglia, respectively, were significantly regulated by MDD-associated SNPs within this region. Both the regional heritability and single-SNP associations within this block were replicated in the UK-Ireland group of the most recent release of the Psychiatric Genomics Consortium (PGC), the PGC2-MDD (Major Depression Dataset). The SNP association was also replicated in a depressive symptom sample that shares some individuals with the PGC2-MDD. CONCLUSIONS This study highlights the value of HRHM for MDD and provides an important target within TOX2 for further functional studies.
Affiliation(s)
- Yanni Zeng
- Division of Psychiatry, University of Edinburgh, Edinburgh
- Pau Navarro
- Medical Research Council Human Genetics Unit, University of Edinburgh, Edinburgh
- Masoud Shirali
- Medical Research Council Human Genetics Unit, University of Edinburgh, Edinburgh; Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh
- Mark J. Adams
- Division of Psychiatry, University of Edinburgh, Edinburgh
- Lynsey S. Hall
- Division of Psychiatry, University of Edinburgh, Edinburgh
- Pippa A. Thomson
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh; Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh
- Blair H. Smith
- Department of Psychology, University of Edinburgh, Edinburgh; Division of Population Health Sciences, University of Dundee, Dundee
- Alison Murray
- Division of Applied Health Sciences, University of Aberdeen, Aberdeen
- Sandosh Padmanabhan
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh; Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow
- Caroline Hayward
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh
- Thibaud Boutin
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh
- Cathryn M. Lewis
- MRC Social, Genetic, and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
- Naomi R. Wray
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland
- Divya Mehta
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland
- Yuri Milaneschi
- Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands
- Bernhard T. Baune
- Discipline of Psychiatry, University of Adelaide, Adelaide, Australia
- Tracy Air
- Discipline of Psychiatry, University of Adelaide, Adelaide, Australia
- Jouke-Jan Hottenga
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
- Hamdi Mbarek
- Department of Biological Psychology, VU University, Amsterdam, The Netherlands
- Enrique Castelao
- Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland
- Giorgio Pistis
- Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland
- Thomas G. Schulze
- Institute of Psychiatric Phenomics and Genomics, Ludwig-Maximilians-University, Munich Cluster for Systems Neurology, Munich; Department of Psychiatry and Psychotherapy, University Medical Center, Georg-August-University, Göttingen; Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Heidelberg
- Fabian Streit
- Department of Genetic Epidemiology in Psychiatry, Medical Faculty Mannheim, Central Institute of Mental Health, University of Heidelberg, Mannheim
- Andreas J. Forstner
- Institute of Human Genetics, Life and Brain Center, University of Bonn, Bonn, Germany; Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany
- Enda M. Byrne
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland
- Gerome Breen
- MRC Social, Genetic, and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
- Susanne Lucae
- Max Planck Institute of Psychiatry, Munich Cluster for Systems Neurology, Munich
- Stefan Kloiber
- Max Planck Institute of Psychiatry, Munich Cluster for Systems Neurology, Munich
- Enrico Domenici
- Laboratory of Neurogenomic Biomarkers, Centre for Integrative Biology, University of Trento, Trento, Italy
- Ian J. Deary
- Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh; Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh; Department of Psychology, University of Edinburgh, Edinburgh
- David J. Porteous
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh; Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh; Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh
- Chris S. Haley
- Medical Research Council Human Genetics Unit, University of Edinburgh, Edinburgh; The Roslin Institute and Royal (Dick) School of Veterinary Sciences, University of Edinburgh, Edinburgh
- Andrew M. McIntosh
- Division of Psychiatry, University of Edinburgh, Edinburgh; Generation Scotland, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh; Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh
8
Režen T, Ogris I, Sever M, Merzel F, Golic Grdadolnik S, Rozman D. Evaluation of Selected CYP51A1 Polymorphisms in View of Interactions with Substrate and Redox Partner. Front Pharmacol 2017; 8:417. [PMID: 28713270 PMCID: PMC5492350 DOI: 10.3389/fphar.2017.00417] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/15/2017] [Accepted: 06/13/2017] [Indexed: 01/16/2023]
Abstract
Cholesterol is essential for development, growth, and maintenance of organisms. Mutations in cholesterol biosynthetic genes are embryonic lethal and few polymorphisms have been so far associated with pathologies in humans. Previous analyses show that lanosterol 14α-demethylase (CYP51A1) from the late part of cholesterol biosynthesis has only a few missense mutations with low minor allele frequencies and low association with pathologies in humans. The aim of this study is to evaluate the role of amino acid changes in the natural missense mutations of the hCYP51A1 protein. We searched SNP databases for existing polymorphisms of CYP51A1 and evaluated their effect on protein function. We found rare variants causing detrimental missense mutations of CYP51A1. Some missense variants were also associated with a phenotype in humans. Two missense variants have been prepared for testing enzymatic activity in vitro but failed to produce a P450 spectrum. We performed molecular modeling of three selected missense variants to evaluate the effect of the amino acid substitution on potential interaction with its substrate and the obligatory redox partner POR. We show that two of the variants, R277L and especially D152G, have possibly lower binding potential toward obligatory redox partner POR. D152G and R431H have also potentially lower affinity toward the substrate lanosterol. We evaluated the potential effect of damaging variants also using data from other in vitro CYP51A1 mutants. In conclusion, we propose to include damaging CYP51A1 variants into personalized diagnostics to improve genetic counseling for certain rare disease phenotypes.
Affiliation(s)
- Tadeja Režen
- Faculty of Medicine, Centre for Functional Genomics and Bio-Chips, Institute of Biochemistry, University of Ljubljana, Ljubljana, Slovenia
- Iza Ogris
- Faculty of Medicine, Centre for Functional Genomics and Bio-Chips, Institute of Biochemistry, University of Ljubljana, Ljubljana, Slovenia
- Marko Sever
- Department of Biomolecular Structure, National Institute of Chemistry, Ljubljana, Slovenia
- Franci Merzel
- Department of Biomolecular Structure, National Institute of Chemistry, Ljubljana, Slovenia
- Damjana Rozman
- Faculty of Medicine, Centre for Functional Genomics and Bio-Chips, Institute of Biochemistry, University of Ljubljana, Ljubljana, Slovenia
9
Zhang Y, Hofmann JN, Purdue MP, Lin S, Biswas S. Logistic Bayesian LASSO for genetic association analysis of data from complex sampling designs. J Hum Genet 2017; 62:819-829. [PMID: 28424482 PMCID: PMC5572548 DOI: 10.1038/jhg.2017.43] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/11/2016] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 01/20/2023]
Abstract
Detecting gene-environment interactions with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.
Affiliation(s)
- Yuan Zhang
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jonathan N Hofmann
- Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Mark P Purdue
- Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
10
Zhang Y, Lin S, Biswas S. Detecting rare and common haplotype-environment interaction under uncertainty of gene-environment independence assumption. Biometrics 2016; 73:344-355. [PMID: 27478935 DOI: 10.1111/biom.12567] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/01/2015] [Revised: 05/01/2016] [Accepted: 06/01/2016] [Indexed: 11/28/2022]
Abstract
Finding rare variants and gene-environment interactions (GXE) is critical in dissecting complex diseases. We consider the problem of detecting GXE where G is a rare haplotype and E is a nongenetic factor. Such methods typically assume G-E independence, which may not hold in many applications. A pertinent example is lung cancer: there is evidence that variants on chromosome 15q25.1 interact with smoking to affect the risk. However, these variants are associated with smoking behavior, rendering the assumption of G-E independence inappropriate. With the motivation of detecting GXE under G-E dependence, we extend an existing approach, logistic Bayesian LASSO, which assumes G-E independence (LBL-GXE-I), by modeling G-E dependence through a multinomial logistic regression (referred to as LBL-GXE-D). Unlike LBL-GXE-I, LBL-GXE-D controls type I error rates in all situations; however, it has reduced power when G-E independence holds. To control type I error without sacrificing power, we further propose a unified approach, LBL-GXE, to incorporate uncertainty in the G-E independence assumption by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that LBL-GXE has power similar to that of LBL-GXE-I when G-E independence holds, yet has well-controlled type I errors in all situations. To illustrate the utility of LBL-GXE, we analyzed a lung cancer dataset and found several significant interactions in the 15q25.1 region, including one between a specific rare haplotype and smoking.
Affiliation(s)
- Yuan Zhang
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
11
You XP, Zou QL, Li JL, Zhou JY. Likelihood Ratio Test for Excess Homozygosity at Marker Loci on X Chromosome. PLoS One 2015; 10:e0145032. [PMID: 26671781 PMCID: PMC4684405 DOI: 10.1371/journal.pone.0145032] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/11/2015] [Accepted: 11/28/2015] [Indexed: 11/20/2022]
Abstract
The assumption of Hardy-Weinberg equilibrium (HWE) is generally required for association analysis using a case-control design on autosomes; otherwise, the size may be inflated. There has been increasing interest in exploring the association between diseases and markers on the X chromosome and the effect of departure from HWE on association analysis on the X chromosome. Note that there are two hypotheses of interest regarding the X chromosome: (i) the frequencies of the same allele at a locus in males and females are equal and (ii) the inbreeding coefficient in females is zero (without excess homozygosity). Thus, excess homozygosity and significantly different minor allele frequencies between males and females are used to filter X-linked variants. There are two existing methods to test for (i) and (ii), respectively. However, their size and power have not yet been studied. Further, there is currently no method that tests both hypotheses simultaneously. Therefore, in this article, we propose a novel likelihood ratio test for both (i) and (ii) on the X chromosome. To further investigate the underlying reason why the null hypothesis is statistically rejected, we also develop two likelihood ratio tests for detecting (i) and (ii), respectively. Moreover, we explore the effect of population stratification on the proposed tests. From our simulation study, the size of the test for (i) is close to the nominal significance level. However, the size of the excess homozygosity test and the test for both (i) and (ii) is conservative. So, we propose parametric bootstrap techniques to evaluate their validity and performance. Simulation results show that the proposed methods with bootstrap techniques control the size well under the respective null hypothesis. Power comparison demonstrates that the methods with bootstrap techniques are more powerful than those without the bootstrap procedure and the existing methods. The application of the proposed methods to a rheumatoid arthritis dataset indicates their utility.
Affiliation(s)
- Xiao-Ping You, State Key Laboratory of Organ Failure Research and Guangdong Provincial Key Laboratory of Tropical Research, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China
- Qi-Lei Zou, State Key Laboratory of Organ Failure Research and Guangdong Provincial Key Laboratory of Tropical Research, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China
- Jian-Long Li, State Key Laboratory of Organ Failure Research and Guangdong Provincial Key Laboratory of Tropical Research, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China
- Ji-Yuan Zhou, State Key Laboratory of Organ Failure Research and Guangdong Provincial Key Laboratory of Tropical Research, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China
|
12
|
Kullback-Leibler divergence for detection of rare haplotype common disease association. Eur J Hum Genet 2015; 23:1558-65. [PMID: 25735482 DOI: 10.1038/ejhg.2015.25] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/23/2014] [Revised: 11/16/2014] [Accepted: 01/28/2015] [Indexed: 12/12/2022]
Abstract
Rare haplotypes may tag rare causal variants of common diseases; hence, detection of such rare haplotypes may also contribute to our understanding of complex disease etiology. Because rare haplotypes frequently result from common single-nucleotide polymorphisms (SNPs), focusing on rare haplotypes is much more economical compared with using rare single-nucleotide variants (SNVs) from sequencing, as SNPs are available and 'free' from already amassed genome-wide studies. Further, associated haplotypes may shed light on the underlying disease causal mechanism, a feat unmatched by SNV-based collapsing methods. In recent years, data mining approaches have been adapted to detect rare haplotype association. However, as they rely on an assumed underlying disease model and require the specification of a null haplotype, results can be erroneous if such assumptions are violated. In this paper, we present a haplotype association method based on Kullback-Leibler divergence (hapKL) for case-control samples. The idea is to compare haplotype frequencies for the cases versus the controls by computing symmetrical divergence measures. An important property of such measures is that both the frequencies and logarithms of the frequencies contribute in parallel, thus balancing the contributions from rare and common, and accommodating both deleterious and protective, haplotypes. A simulation study under various scenarios shows that hapKL has well-controlled type I error rates and good power compared with existing data mining methods. Application of hapKL to age-related macular degeneration (AMD) shows a strong association of the complement factor H (CFH) gene with AMD, identifying several individual rare haplotypes with strong signals.
|
13
|
Wang M, Lin S. Detecting associations of rare variants with common diseases: collapsing or haplotyping? Brief Bioinform 2015; 16:759-68. [PMID: 25596401 DOI: 10.1093/bib/bbu050] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/26/2014] [Indexed: 01/11/2023]
Abstract
In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as 'collapsing' or 'haplotyping' based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts.
|