1
Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis including gain in power and better understanding of genetic etiology. However, when the phenotypes are of discordant types such as binary and continuous, the joint modeling is more challenging. Another research area of current interest is discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL) and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only and (2) strongly positively correlated and the target haplotype affects both phenotypes in positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes including a rare one.
Affiliation(s)
- Xiaochen Yuan, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
2
Zhang Y, Hofmann JN, Purdue MP, Lin S, Biswas S. Logistic Bayesian LASSO for genetic association analysis of data from complex sampling designs. J Hum Genet 2017; 62:819-829. [PMID: 28424482 PMCID: PMC5572548 DOI: 10.1038/jhg.2017.43] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/11/2016] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 01/20/2023]
Abstract
Detecting gene-environment interactions with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.
Affiliation(s)
- Yuan Zhang, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jonathan N Hofmann, Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Mark P Purdue, Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Shili Lin, Department of Statistics, The Ohio State University, Columbus, OH, USA
- Swati Biswas, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
3
Datta AS, Zhang Y, Zhang L, Biswas S. Association of rare haplotypes on ULK4 and MAP4 genes with hypertension. BMC Proc 2016; 10:363-369. [PMID: 27980663 PMCID: PMC5133474 DOI: 10.1186/s12919-016-0057-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/11/2023] Open
Abstract
Several variants have been implicated earlier on ULK4 and MAP4 genes on chromosome 3 to be associated with hypertension. As a natural follow-up step, we explore association of haplotypes in those genes. We consider the Genetic Analysis Workshop 19 real data on unrelated individuals and analyze haplotype blocks of 5 single-nucleotide polymorphisms through a sliding window approach. We apply 4 haplotype association methods (haplo.score, haplo.glm, hapassoc, and logistic Bayesian LASSO [LBL]) and, for comparison, sequence kernel association test (SKAT) and its variants. We find several rare haplotype blocks to be associated. To get an idea about the false-positive proportions, we also analyzed the data after permuting the case-control status of individuals. We found that LBL, unlike the other methods, maintains low false-positive rates in presence of rare haplotypes. Thus, we conclude that the haplotypes found to be associated by LBL are more likely to be true positive. SKAT and its variants did not find significance on either gene.
Affiliation(s)
- Ananda S. Datta, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Yuan Zhang, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Lei Zhang, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Swati Biswas, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
4
Vujkovic M, Aplenc R, Alonzo TA, Gamis AS, Li Y. Comparing Analytic Methods for Longitudinal GWAS and a Case-Study Evaluating Chemotherapy Course Length in Pediatric AML. A Report from the Children's Oncology Group. Front Genet 2016; 7:139. [PMID: 27547214 PMCID: PMC4974249 DOI: 10.3389/fgene.2016.00139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2016] [Accepted: 07/19/2016] [Indexed: 12/11/2022] Open
Abstract
Regression analysis is commonly used in genome-wide association studies (GWAS) to test genotype-phenotype associations but restricts the phenotype to a single observation for each individual. There is an increasing need for analytic methods for longitudinally collected phenotype data. Several methods have been proposed to perform longitudinal GWAS for family-based studies but few methods are described for unrelated populations. We compared the performance of three statistical approaches for longitudinal GWAS in unrelated subjects: (1) principal component-based generalized estimating equations (PC-GEE); (2) principal component-based linear mixed effects model (PC-LMEM); (3) kinship coefficient matrix-based linear mixed effects model (KIN-LMEM), in a study of single-nucleotide polymorphisms (SNPs) on the duration of 4 courses of chemotherapy in 624 unrelated children with de novo acute myeloid leukemia (AML) genotyped on the Illumina 2.5 M OmniQuad from the COG studies AAML0531 and AAML1031. In this study we observed an exaggerated type I error with PC-GEE in SNPs with minor allele frequencies < 0.05, whereas KIN-LMEM produces more than expected type II errors. PC-LMEM showed balanced type I and type II errors for the observed vs. expected P-values in comparison to competing approaches. In general, a strong concordance was observed between the P-values with the different approaches, in particular among P < 0.01 where the between-method AUCs exceed 99%. PC-LMEM accounts for genetic relatedness and correlations among repeated phenotype measures, shows minimal genome-wide inflation of type I errors, and yields high power. We therefore recommend PC-LMEM as a robust analytic approach for GWAS of longitudinal data in unrelated populations.
Affiliation(s)
- Marijana Vujkovic, Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Richard Aplenc, Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Todd A Alonzo, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Alan S Gamis, Division of Hematology, Oncology Bone Marrow Transplantation, Children's Mercy Hospitals and Clinics, Kansas City, MO, USA
- Yimei Li, Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
5
Zhang Y, Lin S, Biswas S. Detecting rare and common haplotype-environment interaction under uncertainty of gene-environment independence assumption. Biometrics 2016; 73:344-355. [PMID: 27478935 DOI: 10.1111/biom.12567] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/01/2015] [Revised: 05/01/2016] [Accepted: 06/01/2016] [Indexed: 11/28/2022]
Abstract
Finding rare variants and gene-environment interactions (GXE) is critical in dissecting complex diseases. We consider the problem of detecting GXE where G is a rare haplotype and E is a nongenetic factor. Such methods typically assume G-E independence, which may not hold in many applications. A pertinent example is lung cancer: there is evidence that variants on Chromosome 15q25.1 interact with smoking to affect the risk. However, these variants are associated with smoking behavior, rendering the assumption of G-E independence inappropriate. With the motivation of detecting GXE under G-E dependence, we extend an existing approach, logistic Bayesian LASSO, which assumes G-E independence (LBL-GXE-I) by modeling G-E dependence through a multinomial logistic regression (referred to as LBL-GXE-D). Unlike LBL-GXE-I, LBL-GXE-D controls type I error rates in all situations; however, it has reduced power when G-E independence holds. To control type I error without sacrificing power, we further propose a unified approach, LBL-GXE, to incorporate uncertainty in the G-E independence assumption by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that LBL-GXE has power similar to that of LBL-GXE-I when G-E independence holds, yet has well-controlled type I errors in all situations. To illustrate the utility of LBL-GXE, we analyzed a lung cancer dataset and found several significant interactions in the 15q25.1 region, including one between a specific rare haplotype and smoking.
Affiliation(s)
- Yuan Zhang, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
- Shili Lin, Department of Statistics, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Swati Biswas, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
6
Beyene J, Hamid JS. Longitudinal data analysis in genome-wide association studies. Genet Epidemiol 2014; 38 Suppl 1:S68-73. [PMID: 25112192 DOI: 10.1002/gepi.21828] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies have led to the discovery of thousands of susceptibility genetic variants (typically single-nucleotide polymorphisms [SNPs]) for a wide range of complex diseases and traits commonly measured at a single point in time. Although many novel genotype-phenotype associations have been identified and successfully replicated using cross-sectionally measured phenotypes, there is growing interest in the study of longitudinally measured phenotypes because these allow for the study of the natural trajectory of traits and disease progression. However, there are several challenges with analysis and interpretation of longitudinal data. Here, we summarize the methods and strategies proposed and applied in genome-wide association studies of blood pressure related phenotypes made available through Genetic Analysis Workshop 18 (GAW18). The investigators considered methods that incorporated correlation across time points and familial relatedness among the individuals into their studies and compared their approaches with single-time-point analysis using baseline data. Some of the studies used unrelated individuals; some also used the simulated data provided by the GAW18 organizers to assess type I error and power of their approach in detecting true associations.
Affiliation(s)
- Joseph Beyene, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada; Child Health Evaluative Sciences, The Hospital for Sick Children Research Institute, Toronto, ON, Canada
7
Biswas S, Xia S, Lin S. Detecting rare haplotype-environment interaction with logistic Bayesian LASSO. Genet Epidemiol 2013; 38:31-41. [PMID: 24272913 DOI: 10.1002/gepi.21773] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/03/2013] [Revised: 09/13/2013] [Accepted: 10/15/2013] [Indexed: 11/09/2022]
Abstract
Two important contributors to missing heritability are believed to be rare variants and gene-environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated to be associated through single nucleotide variants (SNV) analysis. Further, rHTV can tag associated rare SNV and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case-control data. LBL shrinks the unassociated (especially common) haplotypes toward zero so that an associated rHTV can be identified with greater power. Here, we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case-control status. We apply the approach (LBL-GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age-related Macular Degeneration (AMD). LBL-GXE detects interaction of a specific rHTV in CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL-GXE has reasonably good powers for detecting interactions with rHTV while keeping the type I error rates well controlled. Thus, we conclude that LBL-GXE is a useful tool for uncovering missing heritability.
Affiliation(s)
- Swati Biswas, Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, United States of America