1
Shen L, Amei A, Liu B, Xu G, Liu Y, Oh EC, Zhou X, Wang Z. Marginal interaction test for detecting interactions between genetic marker sets and environment in genome-wide studies. G3 (Bethesda) 2025; 15:jkae263. [PMID: 39538414 PMCID: PMC11708225 DOI: 10.1093/g3journal/jkae263] [Received: 08/30/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
As human complex diseases are influenced by the interaction between genetics and the environment, identifying gene-environment interactions (G×E) is crucial for understanding disease mechanisms and predicting risk. Developing robust quantitative tools for G×E analysis can enhance the study of complex diseases. However, many existing methods that explore G×E focus on the interplay between an environmental factor and genetic variants, exclusively for common or rare variants. In this study, we developed MAGEIT_RAN and MAGEIT_FIX to identify interactions between an environmental factor and a set of genetic markers, including both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random and fixed effects, respectively. Simulation studies showed that both tests had type I error under control, with MAGEIT_RAN being the most powerful test. Applying MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension and seated systolic blood pressure in the Multiethnic Study of Atherosclerosis revealed genes like EIF2AK2, CCNDBP1, and EPB42 influencing blood pressure through alcohol interaction. Pathway analysis identified 1 apoptosis and survival pathway involving PKR and 2 signal transduction pathways associated with hypertension and alcohol intake, demonstrating MAGEIT_RAN's ability to detect biologically relevant gene-environment interactions.
Affiliation(s)
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Bowen Liu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
  - Division of Computing, Analysis, and Mathematics, University of Missouri, Kansas City, MO 64108, USA
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Edwin C Oh
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
  - Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas, NV 89154, USA
- Xin Zhou
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
2
Herrera-Luis E, Benke K, Volk H, Ladd-Acosta C, Wojcik GL. Gene-environment interactions in human health. Nat Rev Genet 2024; 25:768-784. [PMID: 38806721 DOI: 10.1038/s41576-024-00731-z] [Accepted: 04/03/2024] [Indexed: 05/30/2024]
Abstract
Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
Affiliation(s)
- Esther Herrera-Luis
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kelly Benke
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Heather Volk
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Christine Ladd-Acosta
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Genevieve L Wojcik
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
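The "deviation from expectation" idea in the Herrera-Luis et al. abstract is commonly written as an interaction term in a regression model. The following is a minimal sketch, assuming a single genotype G, a single exposure E, a quantitative trait Y, and an additive scale. The symbols are illustrative assumptions and are not taken from the review itself.

```latex
Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\,(G \times E) + \varepsilon,
\qquad H_0:\ \beta_{GE} = 0
```

Under this sketch, the model without the product term is the "expectation" from genetics and exposure alone, and a nonzero \beta_{GE} is the deviation that a G × E test looks for.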
3
Jin X, Shi G. Cauchy combination methods for the detection of gene-environment interactions for rare variants related to quantitative phenotypes. Heredity (Edinb) 2023; 131:241-252. [PMID: 37481617 PMCID: PMC10539363 DOI: 10.1038/s41437-023-00640-7] [Received: 09/28/2022] [Revised: 07/09/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023]
Abstract
The characterization of gene-environment interactions (GEIs) can provide detailed insights into the biological mechanisms underlying complex diseases. Despite recent interest in GEIs for rare variants, published GEI tests are underpowered for an extremely small proportion of causal rare variants in a gene or a region. By extending the aggregated Cauchy association test (ACAT), we propose three GEI tests to address this issue: a Cauchy combination GEI test with fixed main effects (CCGEI-F), a Cauchy combination GEI test with random main effects (CCGEI-R), and an omnibus Cauchy combination GEI test (CCGEI-O). ACAT was applied to combine p values of single-variant GEI analyses to obtain CCGEI-F and CCGEI-R and p values of multiple GEI tests were combined in CCGEI-O. Through numerical simulations, for small numbers of causal variants, CCGEI-F, CCGEI-R and CCGEI-O provided approximately 5% higher power than the existing GEI tests INT-FIX and INT-RAN; however, they had slightly higher power than the existing GEI test TOW-GE. For large numbers of causal variants, although CCGEI-F and CCGEI-R exhibited comparable or slightly lower power values than the competing tests, the results were still satisfactory. Among all simulation conditions evaluated, CCGEI-O provided significantly higher power than that of competing GEI tests. We further applied our GEI tests in genome-wide analyses of systolic blood pressure or diastolic blood pressure to detect gene-body mass index (BMI) interactions, using whole-exome sequencing data from UK Biobank. At a suggestive significance level of 1.0 × 10⁻⁴, KCNC4, GAR1, FAM120AOS and NT5C3B showed interactions with BMI by our GEI tests.
Affiliation(s)
- Xiaoqin Jin
  - State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Gang Shi
  - State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
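The Jin and Shi abstract describes combining single-variant GEI p values with the Cauchy combination rule that underlies ACAT. As a rough illustration only, the sketch below implements the generic Cauchy combination formula (a weighted sum of tan((0.5 - p)π) terms mapped back through the Cauchy distribution). The function name, the equal default weights, and the example p values are assumptions made for illustration; this is not the authors' CCGEI code.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination rule.

    Hypothetical helper for illustration; weights default to equal and
    are normalized to sum to 1. Each p value must lie strictly in (0, 1).
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("each p value must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    # Each p value becomes a standard Cauchy variate; the weighted sum
    # is again standard Cauchy under the null.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Illustrative (made-up) single-variant p values for one gene.
combined = cauchy_combination([0.001, 0.8, 0.9])
print(f"combined p value: {combined:.4f}")
```

One small p value dominates the combined result even when the other variants show no signal, which is why this rule suits a sparse set of causal variants.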
4
Shen L, Amei A, Liu B, Liu Y, Xu G, Oh EC, Wang Z. Detection of interactions between genetic marker sets and environment in a genome-wide study of hypertension. bioRxiv 2023:2023.05.28.542666. [PMID: 37398075 PMCID: PMC10312472 DOI: 10.1101/2023.05.28.542666] [Indexed: 07/04/2023]
Abstract
As human complex diseases are influenced by the interplay of genes and environment, detecting gene-environment interactions (G × E) can shed light on biological mechanisms of diseases and play an important role in disease risk prediction. Development of powerful quantitative tools to incorporate G × E in complex diseases has the potential to facilitate the accurate curation and analysis of large genetic epidemiological studies. However, most existing methods that interrogate G × E focus on the interaction effects of an environmental factor and genetic variants, exclusively for common or rare variants. In this study, we proposed two tests, MAGEIT_RAN and MAGEIT_FIX, to detect interaction effects of an environmental factor and a set of genetic markers containing both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random and fixed, respectively. Through simulation studies, we illustrated that both tests had type I error under control and MAGEIT_RAN was overall the most powerful test. We applied MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension in the Multi-Ethnic Study of Atherosclerosis. We detected two genes, CCNDBP1 and EPB42, that interact with alcohol usage to influence blood pressure. Pathway analysis identified sixteen significant pathways related to signal transduction and development that were associated with hypertension, and several of them were reported to have an interactive effect with alcohol intake. Our results demonstrated that MAGEIT can detect biologically relevant genes that interact with environmental factors to influence complex traits.
Affiliation(s)
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Bowen Liu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
  - Department of Biostatistics, Yale School of Public Health
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
5
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND: Gene-based association tests provide a useful alternative and complement to the usual single marker association tests, especially in genome-wide association studies (GWAS). The way variants in a gene are weighted plays an important role in boosting the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On the other hand, effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology of gene-based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary data are publicly and freely available.
RESULTS: To resolve these limitations, we propose a combination test named OWC, which is based on summary statistics from GWAS data. Several traditional methods including burden test, weighted sum of squared score test [SSU], weighted sum statistic [WSS], SNP-set Kernel Association Test [SKAT], and the score test are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package
CONCLUSIONS: We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to normal statistic Z) and includes several popular methods as its special cases. Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by appropriately weighting genetic variants and combining well-known gene-based tests.
Affiliation(s)
- Jianjun Zhang
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201, USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyu Liang
  - Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824, USA (GRID: grid.17088.36; ISNI: 0000 0001 2150 1785)
- Samantha Gonzales
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201, USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Jianguo Liu
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201, USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyi Raymond Gao
  - Department of Ophthalmology and Visual Science, Department of Biomedical Informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212, USA (GRID: grid.261331.4; ISNI: 0000 0001 2285 7943)
- Xuexia Wang
  - Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th Street, Miami, FL 33174, USA (GRID: grid.65456.34; ISNI: 0000 0001 2110 1845)
6
Hecker J, Prokopenko D, Moll M, Lee S, Kim W, Qiao D, Voorhies K, Kim W, Vansteelandt S, Hobbs BD, Cho MH, Silverman EK, Lutz SM, DeMeo DL, Weiss ST, Lange C. A robust and adaptive framework for interaction testing in quantitative traits between multiple genetic loci and exposure variables. PLoS Genet 2022; 18:e1010464. [PMID: 36383614 PMCID: PMC9668174 DOI: 10.1371/journal.pgen.1010464] [Received: 01/12/2022] [Accepted: 10/04/2022] [Indexed: 11/17/2022]
Abstract
The identification and understanding of gene-environment interactions can provide insights into the pathways and mechanisms underlying complex diseases. However, testing for gene-environment interaction remains a challenge since (a) statistical power is often limited and (b) modeling of environmental effects is nontrivial and such model misspecifications can lead to false positive interaction findings. To address the lack of statistical power, recent methods aim to identify interactions on an aggregated level using, for example, polygenic risk scores. While this strategy can increase the power to detect interactions, identifying contributing genes and pathways is difficult based on these relatively global results. Here, we propose RITSS (Robust Interaction Testing using Sample Splitting), a gene-environment interaction testing framework for quantitative traits that is based on sample splitting and robust test statistics. RITSS can incorporate sets of genetic variants and/or multiple environmental factors. Based on the user's choice of statistical/machine learning approaches, a screening step selects and combines potential interactions into scores with improved interpretability. In the testing step, the application of robust statistics minimizes the susceptibility to main effect misspecifications. Using extensive simulation studies, we demonstrate that RITSS controls the type 1 error rate in a wide range of scenarios, and we show how the screening strategy influences statistical power. In an application to lung function phenotypes and human height in the UK Biobank, RITSS identified highly significant interactions based on subcomponents of genetic risk scores. While the contributing single variant interaction signals are weak, our results indicate interaction patterns that result in strong aggregated effects, providing potential insights into underlying gene-environment interaction mechanisms.
Affiliation(s)
- Julian Hecker
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Dmitry Prokopenko
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Matthew Moll
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Sanghun Lee
  - Department of Medical Consilience, Division of Medicine, Graduate School, Dankook University, Yongin, South Korea
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Wonji Kim
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Dandi Qiao
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Kirsten Voorhies
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Population Medicine, PRecisiOn Medicine Translational Research (PROMoTeR) Center, Harvard Pilgrim Health Care, Boston, Massachusetts, United States of America
- Woori Kim
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stijn Vansteelandt
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Brian D. Hobbs
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Michael H. Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Edwin K. Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Sharon M. Lutz
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Population Medicine, PRecisiOn Medicine Translational Research (PROMoTeR) Center, Harvard Pilgrim Health Care, Boston, Massachusetts, United States of America
- Dawn L. DeMeo
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Scott T. Weiss
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Christoph Lange
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
7
Chi JT, Ipsen ICF, Hsiao TH, Lin CH, Wang LS, Lee WP, Lu TP, Tzeng JY. SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based Gene-Environment Interaction Tests in Biobank Data. Front Genet 2021; 12:710055. [PMID: 34795690 PMCID: PMC8593472 DOI: 10.3389/fgene.2021.710055] [Received: 05/15/2021] [Accepted: 09/13/2021] [Indexed: 11/13/2022]
Abstract
The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10⁵, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.
Affiliation(s)
- Jocelyn T. Chi
  - Department of Statistics, North Carolina State University, Raleigh, NC, United States
- Ilse C. F. Ipsen
  - Department of Mathematics, North Carolina State University, Raleigh, NC, United States
- Tzu-Hung Hsiao
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Ching-Heng Lin
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Li-San Wang
  - Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wan-Ping Lee
  - Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Tzu-Pin Lu
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, NC, United States
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
8
Jin X, Shi G. Variance-component-based meta-analysis of gene-environment interactions for rare variants. G3 (Bethesda) 2021; 11:6298593. [PMID: 34544119 PMCID: PMC8661424 DOI: 10.1093/g3journal/jkab203] [Received: 02/10/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]
Abstract
Complex diseases are often caused by interplay between genetic and environmental factors. Existing gene-environment interaction (G × E) tests for rare variants largely focus on detecting gene-based G × E effects in a single study; thus, their statistical power is limited by the sample size of the study. Meta-analysis methods that synthesize summary statistics of G × E effects from multiple studies for rare variants are still limited. Based on variance component models, we propose four meta-analysis methods of testing G × E effects for rare variants: HOM-INT-FIX, HET-INT-FIX, HOM-INT-RAN, and HET-INT-RAN. Our methods consider homogeneous or heterogeneous G × E effects across studies and treat the main genetic effect as either fixed or random. Through simulations, we show that the empirical distributions of the four meta-statistics under the null hypothesis align with their expected theoretical distributions. When the interaction effect is homogeneous across studies, HOM-INT-FIX and HOM-INT-RAN have as much statistical power as a pooled analysis conducted on a single interaction test with individual-level data from all studies. When the interaction effect is heterogeneous across studies, HET-INT-FIX and HET-INT-RAN provide higher power than pooled analysis. Our methods are further validated via testing 12 candidate gene-age interactions in blood pressure traits using whole-exome sequencing data from UK Biobank.
Affiliation(s)
- Xiaoqin Jin
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
- Gang Shi
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China
9
Uncovering Evidence for Endocrine-Disrupting Chemicals That Elicit Differential Susceptibility through Gene-Environment Interactions. Toxics 2021; 9:toxics9040077. [PMID: 33917455 PMCID: PMC8067468 DOI: 10.3390/toxics9040077] [Received: 03/03/2021] [Revised: 03/27/2021] [Accepted: 04/02/2021] [Indexed: 12/17/2022]
Abstract
Exposure to endocrine-disrupting chemicals (EDCs) is linked to myriad disorders, characterized by the disruption of the complex endocrine signaling pathways that govern development, physiology, and even behavior across the entire body. The mechanisms of endocrine disruption involve a complex system of pathways that communicate across the body to stimulate specific receptors that bind DNA and regulate the expression of a suite of genes. These mechanisms, including gene regulation, DNA binding, and protein binding, can be tied to differences in individual susceptibility across a genetically diverse population. In this review, we posit that EDCs causing such differential responses may be identified by looking for a signal of population variability after exposure. We begin by summarizing how the biology of EDCs has implications for genetically diverse populations. We then describe how gene-environment interactions (GxE) across the complex pathways of endocrine signaling could lead to differences in susceptibility. We survey examples in the literature of individual susceptibility differences to EDCs, pointing to a need for research in this area, especially regarding the exceedingly complex thyroid pathway. Following a discussion of experimental designs to better identify and study GxE across EDCs, we present a case study of a high-throughput screening signal of putative GxE within known endocrine disruptors. We conclude with a call for further, deeper analysis of the EDCs, particularly the thyroid disruptors, to identify if these chemicals participate in GxE leading to differences in susceptibility.
10
Shafquat A, Crystal RG, Mezey JG. Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes. BMC Bioinformatics 2020; 21:178. [PMID: 32381021 PMCID: PMC7204256 DOI: 10.1186/s12859-020-3387-z] [Received: 06/06/2019] [Accepted: 01/24/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND: Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact discovery of disease loci. While well appreciated, almost all analyses of GWAS data consider reported disease phenotype values as is without accounting for potential misclassification.
RESULTS: Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset. PheLEx consists of a hierarchical Bayesian latent variable model, where inference of differential misclassification is accomplished using filtered genotypes while implementing a full mixed model to account for population structure and genetic relatedness in study populations. Through simulations, we show that the PheLEx framework dramatically improves recovery of the correct disease state when considering realistic allele effect sizes compared to existing methodologies designed for Bayesian recovery of disease phenotypes. We also demonstrate the potential of PheLEx for extracting new potential loci from existing GWAS data by analyzing bipolar disorder and epilepsy phenotypes available from the UK Biobank. From the PheLEx analysis of these data, we identified new candidate disease loci not previously reported for these datasets that have value for supplemental hypothesis generation.
CONCLUSION: PheLEx shows promise in reanalyzing GWAS datasets to provide supplemental candidate loci that are ignored by traditional GWAS analysis methodologies.
Affiliation(s)
- Afrah Shafquat
- Department of Computational Biology, Cornell University, Ithaca, NY USA
- Ronald G. Crystal
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY USA
- Department of Medicine, Weill Cornell Medicine, New York, NY USA
- Jason G. Mezey
- Department of Computational Biology, Cornell University, Ithaca, NY USA
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY USA
11
Lim E, Chen H, Dupuis J, Liu CT. A unified method for rare variant analysis of gene-environment interactions. Stat Med 2020; 39:801-813. [PMID: 31799744 PMCID: PMC7261513 DOI: 10.1002/sim.8446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/12/2018] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 01/17/2023]
Abstract
Advanced technology in whole-genome sequencing has offered the opportunity to comprehensively investigate the genetic contribution, particularly of rare variants, to complex traits. Several region-based tests have been developed to jointly model the marginal effect of rare variants, but methods to detect gene-environment (GE) interactions are underdeveloped. Identifying the modification effects of environmental factors on genetic risk poses a considerable challenge. To tackle this challenge, we develop a method to detect GE interactions for rare variants using a generalized linear mixed effect model. The proposed method can accommodate either binary or continuous traits in related or unrelated samples. Under this model, genetic main effects, GE interactions, and sample relatedness are modeled as random effects. We adopt a kernel-based method to leverage the joint information across rare variants and implement variance component score tests to reduce the computational burden. Our simulation studies of continuous and binary traits show that the proposed method maintains correct type I error rates and appropriate power under various scenarios, such as genotype main effects and GE interaction effects in opposite directions and varying proportions of causal variants in the model. We apply our method in the Framingham Heart Study to test GE interaction of smoking on body mass index or overweight status and replicate the Cholinergic Receptor Nicotinic Beta 4 gene association reported in a previous large consortium meta-analysis of single nucleotide polymorphism-smoking interaction. Our proposed set-based GE test is computationally efficient and is applicable to both binary and continuous phenotypes, while appropriately accounting for familial or cryptic relatedness.
Affiliation(s)
- Elise Lim
- Department of Biostatistics, Boston University, Boston, Massachusetts
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Josée Dupuis
- Department of Biostatistics, Boston University, Boston, Massachusetts
- Ching-Ti Liu
- Department of Biostatistics, Boston University, Boston, Massachusetts
12
Zhao Y, Zhu H, Lu Z, Knickmeyer RC, Zou F. Structured Genome-Wide Association Studies with Bayesian Hierarchical Variable Selection. Genetics 2019; 212:397-415. [PMID: 31010934 PMCID: PMC6553832 DOI: 10.1534/genetics.119.301906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2019] [Accepted: 04/08/2019] [Indexed: 02/04/2023]
Abstract
It is increasingly important to use genome-wide association studies (GWAS) to select genetic information associated with qualitative or quantitative traits. The discovery of biological associations among SNPs has motivated various strategies that construct SNP-sets along the genome and incorporate this set information into the selection procedure, improving selection power and yielding more biologically meaningful results. The aim of this paper is to propose a novel Bayesian framework for hierarchical variable selection at both the SNP-set (group) level and the SNP (within-group) level. We overcome a key limitation of the posterior updating schemes in most Bayesian variable selection methods by proposing a novel sampling scheme that explicitly accommodates the ultrahigh dimensionality of genetic data. Specifically, by constructing an auxiliary variable selection model at the SNP-set level, the new procedure uses the posterior samples of the auxiliary model to guide the posterior inference for the targeted hierarchical selection model. We apply the proposed method to a variety of simulation studies and show that our method is computationally efficient and achieves substantially better performance than competing approaches in both SNP-set and SNP selection. Applying the method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, we identify biologically meaningful genetic factors under several neuroimaging volumetric phenotypes. Our method is general and can readily be applied to a wide range of biomedical studies.
Affiliation(s)
- Yize Zhao
- Department of Healthcare Policy and Research, Cornell University Weill Cornell, New York, New York 10065
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Zhaohua Lu
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Rebecca C Knickmeyer
- Department of Pediatrics and Human Development, Michigan State University, East Lansing, Michigan 48824
- Fei Zou
- Department of Biostatistics, University of Florida, Gainesville, Florida 32611
13
Oxytocin Receptor Gene (OXTR) and Deviant Peer Affiliation: A Gene-Environment Interaction in Adolescent Antisocial Behavior. J Youth Adolesc 2018; 48:86-101. [PMID: 30315439 DOI: 10.1007/s10964-018-0939-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/30/2018] [Accepted: 09/24/2018] [Indexed: 12/24/2022]
Abstract
Although the oxytocin receptor gene (OXTR) is involved in aggression and social affiliation, it has not been examined in gene-environment interaction studies. This longitudinal study examined the effect of genetic variants in OXTR and its gene-environment interaction with perceived deviant peer affiliation in the trajectories of antisocial behavior in 323 adolescents (182 males) from 13 to 18 years. Annual assessments of reactive and proactive aggression, delinquency, and friends' delinquency, as well as DNA at age 17 were collected. Gene-based tests yielded no main effect of OXTR, but revealed a significant gene-environment interaction in proactive aggression and delinquency. Variation in the OXTR might affect the influence of deviant peer affiliation on antisocial behavior, contributing to a better understanding of individual differences in antisocial behavior.
14
Calafato MS, Thygesen JH, Ranlund S, Zartaloudi E, Cahn W, Crespo-Facorro B, Díez-Revuelta Á, Di Forti M, Hall MH, Iyegbe C, Jablensky A, Kahn R, Kalaydjieva L, Kravariti E, Lin K, McDonald C, McIntosh AM, McQuillin A, Picchioni M, Rujescu D, Shaikh M, Toulopoulou T, Os JV, Vassos E, Walshe M, Powell J, Lewis CM, Murray RM, Bramon E. Use of schizophrenia and bipolar disorder polygenic risk scores to identify psychotic disorders. Br J Psychiatry 2018; 213:535-541. [PMID: 30113282 PMCID: PMC6130805 DOI: 10.1192/bjp.2018.89] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 02/04/2023]
Abstract
BACKGROUND There is increasing evidence for shared genetic susceptibility between schizophrenia and bipolar disorder. Although genetic variants only convey subtle increases in risk individually, their combination into a polygenic risk score constitutes a strong disease predictor. AIMS To investigate whether schizophrenia and bipolar disorder polygenic risk scores can distinguish people with broadly defined psychosis and their unaffected relatives from controls. METHOD Using the latest Psychiatric Genomics Consortium data, we calculated schizophrenia and bipolar disorder polygenic risk scores for 1168 people with psychosis, 552 unaffected relatives and 1472 controls. RESULTS Patients with broadly defined psychosis had dramatic increases in schizophrenia and bipolar polygenic risk scores, as did their relatives, albeit to a lesser degree. However, the accuracy of predictive models was modest. CONCLUSIONS Although polygenic risk scores are not ready for clinical use, it is hoped that as they are refined they could help towards risk reduction advice and early interventions for psychosis. DECLARATION OF INTEREST R.M.M. has received honoraria for lectures from Janssen, Lundbeck, Lilly, Otsuka and Sunovian.
Affiliation(s)
- Maria Stella Calafato
- Division of Psychiatry, University College London, UK. Correspondence: Maria Stella Calafato, Mental Health Neuroscience Research Department, Division of Psychiatry, University College London, 149 Tottenham Court Rd, London W1T 7NF, UK.
- Siri Ranlund
- Division of Psychiatry, University College London, UK
- Eirini Zartaloudi
- Division of Psychiatry, University College London and Institute of Psychiatry, Psychology and Neuroscience at King's College London and South London and Maudsley NHS Foundation Trust, UK
- Wiepke Cahn
- Department of Psychiatry, Brain Centre Rudolf Magnus, University Medical Center Utrecht, the Netherlands
- Benedicto Crespo-Facorro
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid and Department of Psychiatry, University Hospital Marqués de Valdecilla, School of Medicine, University of Cantabria–IDIVAL, Spain
- Álvaro Díez-Revuelta
- Division of Psychiatry, University College London, London, UK and Laboratory of Cognitive and Computational Neuroscience − Centre for Biomedical Technology (CTB), Complutense University and Technical University of Madrid, Spain
- Marta Di Forti
- Institute of Psychiatry, Psychology and Neuroscience at King's College London and South London and Maudsley NHS Foundation Trust, UK
- Mei-Hua Hall
- Psychosis Neurobiology Laboratory, Harvard Medical School, McLean Hospital, USA
- Conrad Iyegbe
- Institute of Psychiatry, Psychology and Neuroscience at King's College London and South London and Maudsley NHS Foundation Trust, UK
- Assen Jablensky
- Centre for Clinical Research in Neuropsychiatry, The University of Western Australia, Australia
- Rene Kahn
- Department of Psychiatry, Brain Centre Rudolf Magnus, University Medical Center Utrecht, the Netherlands
- Luba Kalaydjieva
- Harry Perkins Institute of Medical Research and Centre for Medical Research, The University of Western Australia, Australia
- Eugenia Kravariti
- Institute of Psychiatry, Psychology and Neuroscience at King's College London and South London and Maudsley NHS Foundation Trust, UK
- Kuang Lin
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust and Nuffield Department of Population Health, University of Oxford, UK
- Colm McDonald
- The Centre for Neuroimaging & Cognitive Genomics (NICOG) and NCBES Galway Neuroscience Centre, National University of Ireland Galway, Ireland
- Andrew M. McIntosh
- Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital and Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, UK
- Marco Picchioni
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- Dan Rujescu
- Department of Psychiatry, Ludwig-Maximilians University of Munich and Department of Psychiatry, Psychotherapy and Psychosomatics, University of Halle Wittenberg, Germany
- Madiha Shaikh
- North East London Foundation Trust and Research Department of Clinical, Educational and Health Psychology, University College London, UK
- Timothea Toulopoulou
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK and Department of Psychology, Bilkent University, Turkey
- Jim Van Os
- Institute of Psychiatry Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK and Department of Psychiatry and Psychology, Maastricht University Medical Centre, EURON, the Netherlands
- Evangelos Vassos
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- Muriel Walshe
- Division of Psychiatry, University College London and Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- John Powell
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- Cathryn M. Lewis
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- Robin M. Murray
- Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
- Elvira Bramon
- Division of Psychiatry and Institute of Cognitive Neuroscience, University College London and Institute of Psychiatry, Psychology and Neuroscience, King's College London and South London and Maudsley NHS Foundation Trust, UK
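The polygenic risk scores in the Calafato et al. abstract above combine many small per-variant effects into one number per person. A minimal sketch of that generic idea follows, assuming the common definition of a score as a weighted sum of risk-allele dosages. The function names, the toy numbers, and the choice to standardize against controls are illustrative assumptions, not the study's actual pipeline (which the abstract does not describe).

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages, one score per individual.

    dosages: (n_individuals, n_variants) allele counts in [0, 2]
    weights: (n_variants,) per-variant effect sizes, e.g. log odds ratios
             taken from an external discovery GWAS (assumed input).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n, m) and weights must be (m,)")
    return dosages @ weights


def standardize_to_controls(scores, is_control):
    """Express scores in units of the control group's standard deviation."""
    scores = np.asarray(scores, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    controls = scores[is_control]
    return (scores - controls.mean()) / controls.std()


# Toy example: two individuals, three variants (hypothetical values).
toy_scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, 0.2, -0.05])
```

Standardizing against controls makes group differences readable as shifts in control standard deviations, which is one common way such comparisons are reported.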
15
Wang J, Liu Q, Pierce BL, Huo D, Olopade OI, Ahsan H, Chen LS. A meta-analysis approach with filtering for identifying gene-level gene-environment interactions. Genet Epidemiol 2018; 42:434-446. [PMID: 29430690 PMCID: PMC6013347 DOI: 10.1002/gepi.22115] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/15/2017] [Revised: 12/13/2017] [Accepted: 01/02/2018] [Indexed: 02/02/2023]
Abstract
There is a growing recognition that gene-environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases/traits generated from association and sequencing studies, detecting G × E via genome-wide analysis remains challenging due to power issues. In genome-wide G × E studies, a common strategy to improve power is to first conduct a filtering test and retain only the genetic variants that pass the filtering step for subsequent G × E analyses. Two-stage, multistage, and unified tests have been proposed to jointly consider the filtering statistics in G × E tests. However, such G × E tests based on data from a single study may still be underpowered. Meanwhile, large-scale consortia have been formed to borrow strength across studies and populations. In this work, motivated by existing single-study G × E tests with filtering and the need for meta-analysis G × E approaches based on consortia data, we propose a meta-analysis framework for detecting gene-based G × E effects, and introduce meta-analysis-based filtering statistics in the gene-level G × E tests. Simulations demonstrate the advantages of the proposed method, the ofGEM test. We apply the proposed tests to existing data from two breast cancer consortia to identify the genes harboring genetic variants with age-dependent penetrance (i.e., gene-age interaction effects). We develop an R software package ofGEM for the proposed meta-analysis tests.
Affiliation(s)
- Jiebiao Wang
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Brandon L. Pierce
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Dezheng Huo
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Olufunmilayo I. Olopade
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Center for Clinical Cancer Genetics & Global Health, The University of Chicago Medical Center, Chicago, Illinois, USA
- Habibul Ahsan
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA
- Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Lin S. Chen
- Department of Public Health Sciences, The University of Chicago, Chicago, Illinois, USA
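The filter-then-test strategy described in the Wang et al. abstract above can be illustrated with a single-study, single-variant sketch. This is not the ofGEM method (which is gene-level and meta-analytic); it only shows the generic two-stage idea under stated assumptions: a continuous trait, a marginal genotype-trait association as the filter, an ordinary least squares interaction test in stage 2, and a Bonferroni correction over the retained variants only. All function names and thresholds are hypothetical, and the sketch assumes the filter statistic is approximately independent of the interaction statistic under the null.

```python
import math
import numpy as np


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def ols_z(X, y, col):
    """z statistic for coefficient `col` in a least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[col] / math.sqrt(cov[col, col])


def two_stage_gxe(G, e, y, alpha_filter=0.05, alpha=0.05):
    """Return (variants passing the filter, variants with significant G x E)."""
    G = np.asarray(G, dtype=float)
    e = np.asarray(e, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = G.shape
    ones = np.ones(n)

    # Stage 1: keep variants with a marginal genotype-trait association.
    keep = []
    for j in range(m):
        X = np.column_stack([ones, G[:, j]])
        if two_sided_p(ols_z(X, y, 1)) < alpha_filter:
            keep.append(j)

    # Stage 2: test the interaction term, correcting only for retained variants.
    hits = []
    for j in keep:
        g = G[:, j]
        X = np.column_stack([ones, g, e, g * e])
        if two_sided_p(ols_z(X, y, 3)) < alpha / len(keep):
            hits.append(j)
    return keep, hits
```

The power gain comes from the smaller multiple-testing burden in stage 2: the correction divides by the number of retained variants rather than by all variants tested.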
16
Luo Y, Maity A, Wu MC, Smith C, Duan Q, Li Y, Tzeng JY. On the substructure controls in rare variant analysis: Principal components or variance components? Genet Epidemiol 2017; 42:276-287. [PMID: 29280188 DOI: 10.1002/gepi.22102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/10/2017] [Revised: 10/07/2017] [Accepted: 10/19/2017] [Indexed: 11/09/2022]
Abstract
Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may be affected more severely by PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC) or variance components (VC) based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require equal or fewer sufficient PCs and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct the VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best-performing method and the variants to use depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.
Affiliation(s)
- Yiwen Luo
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Michael C Wu
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Chris Smith
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Qing Duan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yun Li
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan; Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
17
McAllister K, Mechanic LE, Amos C, Aschard H, Blair IA, Chatterjee N, Conti D, Gauderman WJ, Hsu L, Hutter CM, Jankowska MM, Kerr J, Kraft P, Montgomery SB, Mukherjee B, Papanicolaou GJ, Patel CJ, Ritchie MD, Ritz BR, Thomas DC, Wei P, Witte JS. Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 2017; 186:753-761. [PMID: 28978193 PMCID: PMC5860428 DOI: 10.1093/aje/kwx227] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 11/07/2016] [Revised: 03/14/2017] [Accepted: 03/16/2017] [Indexed: 12/25/2022]
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
Affiliation(s)
- Leah E. Mechanic
- Correspondence to Dr. Leah E. Mechanic, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, 9609 Medical Center Drive, Room 4E104, MSC 9763, Bethesda, MD 20892
18
Huang C, Thompson P, Wang Y, Yu Y, Zhang J, Kong D, Colen RR, Knickmeyer RC, Zhu H. FGWAS: Functional genome wide association analysis. Neuroimage 2017; 159:107-121. [PMID: 28735012 PMCID: PMC5984052 DOI: 10.1016/j.neuroimage.2017.07.030] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 01/30/2017] [Revised: 07/12/2017] [Accepted: 07/14/2017] [Indexed: 12/11/2022]
Abstract
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs.
Affiliation(s)
- Chao Huang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Paul Thompson
- Imaging Genetics Center, Stevens Institute for Neuroimaging and Informatics, University of Southern California, Marina del Rey, CA, USA
- Yalin Wang
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
- Yang Yu
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jingwen Zhang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dehan Kong
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Rivka R Colen
- Department of Diagnostic Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rebecca C Knickmeyer
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
19
Gauderman WJ, Mukherjee B, Aschard H, Hsu L, Lewinger JP, Patel CJ, Witte JS, Amos C, Tai CG, Conti D, Torgerson DG, Lee S, Chatterjee N. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. Am J Epidemiol 2017; 186:762-770. [PMID: 28978192 PMCID: PMC5859988 DOI: 10.1093/aje/kwx228] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 11/07/2016] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 12/14/2022]
Abstract
The analysis of gene-environment interaction (G×E) may hold the key for further understanding the etiology of many complex traits. The current availability of high-volume genetic data, the wide range in types of environmental data that can be measured, and the formation of consortiums of multiple studies provide new opportunities to identify G×E but also new analytical challenges. In this article, we summarize several statistical approaches that can be used to test for G×E in a genome-wide association study. These include traditional models of G×E in a case-control or quantitative trait study as well as alternative approaches that can provide substantially greater power. The latest methods for analyzing G×E with gene sets and with data in a consortium setting are summarized, as are issues that arise due to the complexity of environmental data. We provide some speculation on why detecting G×E in a genome-wide association study has thus far been difficult. We conclude with a description of software programs that can be used to implement most of the methods described in the paper.
Affiliation(s)
- W. James Gauderman
- Correspondence to Dr. W. James Gauderman, Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, 2001 North Soto Street, 202-K, Los Angeles, CA 90032
20
Jadhav S, Tong X, Lu Q. A functional U-statistic method for association analysis of sequencing data. Genet Epidemiol 2017; 41:636-643. [PMID: 28850771 DOI: 10.1002/gepi.22063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2017] [Revised: 06/06/2017] [Accepted: 07/10/2017] [Indexed: 11/08/2022]
Abstract
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence.
Affiliation(s)
- Sneha Jadhav
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America
- Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
21
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 01/03/2023]
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
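The kernel-to-kernel association described in the abstract above can be illustrated with a small sketch. It assumes linear kernels and an RV-style coefficient, tr(KL)/sqrt(tr(KK) tr(LL)), on double-centered kernel matrices; the abstract does not state the exact coefficient, so this form is an assumption. The sketch also substitutes a permutation P-value for the analytical P-value that DKAT uses. The function names and toy data are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_association(K, L):
    """RV-style coefficient between two kernels (assumed form, see lead-in)."""
    Kc, Lc = center_kernel(K), center_kernel(L)
    return np.trace(Kc @ Lc) / np.sqrt(np.trace(Kc @ Kc) * np.trace(Lc @ Lc))

def permutation_pvalue(K, L, n_perm=199, seed=0):
    """Permutation P-value: a stand-in for DKAT's analytical calculation."""
    rng = np.random.default_rng(seed)
    observed = kernel_association(K, L)
    n = K.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute subjects in one kernel only, breaking any real association.
        if kernel_association(K[np.ix_(p, p)], L) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 60 subjects, 10 variants coded 0/1/2, 4 traits driven by 3 variants.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(60, 10)).astype(float)
Y = G[:, :3] @ rng.normal(size=(3, 4)) + rng.normal(size=(60, 4))

K_geno = G @ G.T    # genotypic similarity (linear kernel)
K_pheno = Y @ Y.T   # phenotypic similarity (linear kernel)
stat, pval = permutation_pvalue(K_pheno, K_geno)
print(f"coefficient={stat:.3f}, permutation P={pval:.3f}")
```

Because both kernels are n-by-n regardless of how many traits or variants enter them, the cost of the statistic depends on the sample size rather than on the trait or variant dimension, which is the sense in which kernels accommodate dimensionality.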
|
22
|
Casale FP, Horta D, Rakitsch B, Stegle O. Joint genetic analysis using variant sets reveals polygenic gene-context interactions. PLoS Genet 2017; 13:e1006693. [PMID: 28426829 PMCID: PMC5398484 DOI: 10.1371/journal.pgen.1006693] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2016] [Accepted: 03/15/2017] [Indexed: 01/28/2023] Open
Abstract
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

Genetic effects on phenotypes can depend on external contexts, including environment. Statistical tests for identifying such interactions are important to understand how individual genetic variants may act in different contexts. Interaction effects can either be studied using measurements of a given phenotype in different contexts, under the same genetic backgrounds, or by stratifying a population into subgroups. Here, we derive a method based on linear mixed models that can be applied to both of these designs. iSet enables testing for interactions between context and sets of variants, and accounts for polygenic effects. We validate our model using simulations, before applying it to the genetic analysis of gene expression studies and genome-wide association studies of human blood lipid levels. We find that modeling interactions with variant sets offers increased power, thereby uncovering interactions that cannot be detected by alternative methods.
Affiliation(s)
- Francesco Paolo Casale
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
  - * E-mail: (FPC); (OS)
- Danilo Horta
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
- Barbara Rakitsch
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
  - * E-mail: (FPC); (OS)
|
23
|
He Z, Zhang M, Lee S, Smith JA, Kardia SLR, Diez Roux AV, Mukherjee B. Set-Based Tests for the Gene-Environment Interaction in Longitudinal Studies. J Am Stat Assoc 2016; 112:966-978. [PMID: 29780190 PMCID: PMC5954413 DOI: 10.1080/01621459.2016.1252266] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2016] [Accepted: 10/01/2016] [Indexed: 01/09/2023]
Abstract
We propose a generalized score type test for set-based inference for gene-environment interaction with longitudinally measured quantitative traits. The test is robust to misspecification of within subject correlation structure and has enhanced power compared to existing alternatives. Unlike tests for marginal genetic association, set-based tests for gene-environment interaction face the challenges of a potentially misspecified and high-dimensional main effect model under the null hypothesis. We show that our proposed test is robust to main effect misspecification of environmental exposure and genetic factors under the gene-environment independence condition. When genetic and environmental factors are dependent, the method of sieves is further proposed to eliminate potential bias due to a misspecified main effect of a continuous environmental exposure. A weighted principal component analysis approach is developed to perform dimension reduction when the number of genetic variants in the set is large relative to the sample size. The methods are motivated by an example from the Multi-Ethnic Study of Atherosclerosis (MESA), investigating interaction between measures of neighborhood environment and genetic regions on longitudinal measures of blood pressure over a study period of about seven years with 4 exams.
Affiliation(s)
- Zihuai He
  - Department of Biostatistics, University of Michigan
- Min Zhang
  - Department of Biostatistics, University of Michigan
|
24
|
Su YR, Di CZ, Hsu L. A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics 2016; 18:119-131. [PMID: 27474101 DOI: 10.1093/biostatistics/kxw034] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 11/13/2022] Open
Abstract
The development of next-generation sequencing technologies has allowed researchers to study comprehensively the contribution of genetic variation, particularly rare variants, to complex diseases. To date, many sequencing analyses of rare variants have focused on marginal genetic effects and have not explored the potential role environmental factors play in modifying genetic risk. Analysis of gene-environment interaction (GxE) for rare variants poses considerable challenges because of variant rarity and the paucity of subjects who carry the variants while being exposed. To tackle this challenge, we propose a hierarchical model to jointly assess the GxE effects of a set of rare variants, for example in a gene or regulatory region, leveraging the information across the variants. Under this model, GxE is modeled by two components. The first component incorporates variant functional information as weights to calculate the weighted burden of variant alleles across variants, and then assesses their GxE interaction with the environmental factor. Since this information is known a priori, this component is modeled as fixed effects. The second component involves residual GxE effects that have not been accounted for by the fixed effects. In this component, the residual GxE effects are postulated to follow an unspecified distribution with mean 0 and variance [Formula: see text]. We develop a novel testing procedure by deriving two independent score statistics for the fixed effects and the variance component separately. We propose two data-adaptive approaches for combining these two score statistics and establish the asymptotic distributions. An extensive simulation study shows that the proposed approaches maintain the correct type I error and that the power is comparable to or better than existing methods under a wide range of scenarios. Finally, we illustrate the proposed methods with an exome-wide GxE analysis with NSAIDs use in colorectal cancer.
Affiliation(s)
- Yu-Ru Su
  - Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
- Chong-Zhi Di
  - Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
- Li Hsu
  - Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
|
25
|
Jeng XJ, Daye ZJ, Lu W, Tzeng JY. Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level. PLoS Comput Biol 2016; 12:e1004993. [PMID: 27355347 PMCID: PMC4927097 DOI: 10.1371/journal.pcbi.1004993] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2015] [Accepted: 05/21/2016] [Indexed: 11/24/2022] Open
Abstract
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare-variant association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.

Next-generation sequencing technologies have allowed genetic association studies of complex traits at the single base-pair resolution, where most genetic variants have extremely low mutation frequencies. These rare variants have been the focus of modern statistical-computational genomics due to their potential to explain missing disease heritability. The identification of individual rare variants associated with diseases can provide new biological insights and enable the precise delineation of disease mechanisms. However, due to the extreme rarity of mutations and large numbers of variants, the significance of causative variants tends to be mixed inseparably with that of a few noncausative ones, and standard multiple testing procedures controlling for false positives fail to provide a meaningful way to include a large proportion of the causative variants. To address the challenge of detecting weak biological signals, we propose a novel statistical procedure, based on false-negative control, to provide a practical approach for variant inclusion in large-scale sequencing studies. By determining those variants that can be confidently dispatched as noncausative, the proposed procedure offers an objective selection of a modest number of potentially causative variants at the single-locus level. Results can be further prioritized or used to infer disease-associated genes with annotation information.
Affiliation(s)
- Xinge Jessie Jeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Zhongyin John Daye
  - Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona, United States of America
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
|
26
|
Powerful and Adaptive Testing for Multi-trait and Multi-SNP Associations with GWAS and Sequencing Data. Genetics 2016; 203:715-31. [PMID: 27075728 DOI: 10.1534/genetics.115.186502] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2015] [Accepted: 04/02/2016] [Indexed: 11/18/2022] Open
Abstract
Testing for genetic association with multiple traits has become increasingly important, not only because of its potential to boost statistical power, but also for its direct relevance to applications. For example, there is accumulating evidence showing that some complex neurodegenerative and psychiatric diseases like Alzheimer's disease are due to disrupted brain networks, for which it would be natural to identify genetic variants associated with a disrupted brain network, represented as a set of multiple traits, one for each of multiple brain regions of interest. In spite of its promise, testing for multivariate trait associations is challenging: if not appropriately used, its power can be much lower than testing on each univariate trait separately (with a proper control for multiple testing). Furthermore, differing from most existing methods for single-SNP-multiple-trait associations, we consider SNP set-based association testing to decipher complicated joint effects of multiple SNPs on multiple traits. Because the power of a test critically depends on several unknown factors such as the proportions of associated SNPs and of traits, we propose a highly adaptive test at both the SNP and trait levels, giving higher weights to those likely associated SNPs and traits, to yield high power across a wide spectrum of situations. We illuminate relationships among the proposed and some existing tests, showing that the proposed test covers several existing tests as special cases. We compare the performance of the new test with that of several existing tests, using both simulated and real data. The methods were applied to structural magnetic resonance imaging data drawn from the Alzheimer's Disease Neuroimaging Initiative to identify genes associated with gray matter atrophy in the human brain default mode network (DMN). 
For genome-wide association studies (GWAS), genes AMOTL1 on chromosome 11 and APOE on chromosome 19 were discovered by the new test to be significantly associated with the DMN. Notably, gene AMOTL1 was not detected by single SNP-based analyses. To our knowledge, AMOTL1 has not been highlighted in other Alzheimer's disease studies before, although it was indicated to be related to cognitive impairment. The proposed method is also applicable to rare variants in sequencing data and can be extended to pathway analysis.
|
27
|
Lu ZH, Zhu H, Knickmeyer RC, Sullivan PF, Williams SN, Zou F. Multiple SNP Set Analysis for Genome-Wide Association Studies Through Bayesian Latent Variable Selection. Genet Epidemiol 2015; 39:664-77. [PMID: 26515609 DOI: 10.1002/gepi.21932] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2015] [Revised: 07/23/2015] [Accepted: 08/18/2015] [Indexed: 11/07/2022]
Abstract
The power of genome-wide association studies (GWAS) for mapping complex traits with single-SNP analysis (where SNP is single-nucleotide polymorphism) may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint association mapping between a large number of SNP sets and complex traits. Compared with single SNP set analysis, such joint association mapping not only accounts for the correlation among SNP sets but also is capable of detecting causal SNP sets that are marginally uncorrelated with traits. The spike-and-slab prior assigned to the effects of SNP sets can greatly reduce the dimension of effective SNP sets, while speeding up computation. An efficient Markov chain Monte Carlo algorithm is developed. Simulations demonstrate that BLVS outperforms several competing variable selection methods in some important scenarios.
Affiliation(s)
- Zhao-Hua Lu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America
  - Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Rebecca C Knickmeyer
  - Department of Psychiatry, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Patrick F Sullivan
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Stephanie N Williams
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America
|
28
|
Tzeng JY, Magnusson PKE, Sullivan PF, Szatkiewicz JP. A New Method for Detecting Associations with Rare Copy-Number Variants. PLoS Genet 2015; 11:e1005403. [PMID: 26431523 PMCID: PMC4592002 DOI: 10.1371/journal.pgen.1005403] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Accepted: 06/30/2015] [Indexed: 01/31/2023] Open
Abstract
Copy number variants (CNVs) play an important role in the etiology of many diseases such as cancers and psychiatric disorders. Because individual CNVs tend to be rare or to have modest marginal effect sizes, collapsing rare CNVs and evaluating them jointly is a key approach to assessing their collective effect on disease risk. While a plethora of powerful collapsing methods are available for sequence variants (e.g., SNPs) in association analysis, these methods cannot be directly applied to rare CNVs due to the CNV-specific challenges, i.e., the multi-faceted nature of CNV polymorphisms (e.g., CNVs vary in size, type, dosage, and details of gene disruption), and etiological heterogeneity (e.g., heterogeneous effects of duplications and deletions that occur within a locus or in different loci). Existing CNV collapsing analysis methods (a.k.a. the burden test) tend to have suboptimal performance because they often ignore heterogeneity and evaluate only the marginal effects of a CNV feature. We introduce CCRET, a random effects test for collapsing rare CNVs when searching for disease associations. CCRET is applicable to variants measured on a multi-categorical scale, collectively modeling the effects of multiple CNV features, and is robust to etiological heterogeneity. Multiple confounders can be simultaneously corrected. To evaluate the performance of CCRET, we conducted extensive simulations and analyzed large-scale schizophrenia datasets. We show that CCRET has powerful and robust performance under multiple types of etiological heterogeneity, and has performance comparable to or better than existing methods when there is no heterogeneity.
Affiliation(s)
- Jung-Ying Tzeng
  - Department of Statistics and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Patrik K. E. Magnusson
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Patrick F. Sullivan
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Jin P. Szatkiewicz
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
|
29
|
Huang M, Nichols T, Huang C, Yang Y, Lu Z, Feng Q, Knickmeyer RC, Zhu H. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data. Neuroimage 2015; 118:613-27. [PMID: 26025292 PMCID: PMC4554832 DOI: 10.1016/j.neuroimage.2015.05.043] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2015] [Revised: 04/09/2015] [Accepted: 05/16/2015] [Indexed: 01/17/2023] Open
Abstract
More and more large-scale imaging genetic studies are being widely conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide ($N_C > 12$ million known variants) associations with signals at millions of locations ($N_V \sim 10^6$) in the brain from thousands of subjects ($n \sim 10^3$). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is $O(n N_V N_C)$ for voxelwise genome wide association analysis (VGWAS) method compared with $O((N_C + N_V) n^2)$ for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing.
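To make the two complexity orders in the abstract above concrete, the following back-of-the-envelope calculation plugs in the ADNI dimensions quoted there (708 subjects, 193,275 voxels, 501,584 SNPs). It compares only the leading-order operation counts and ignores constants and lower-order terms, so the ratio is an illustration of scale, not a predicted speedup. The variable names are illustrative.

```python
# Dimensions quoted in the abstract for the ADNI analysis.
n = 708              # subjects
n_voxels = 193_275   # N_V, voxels in the RAVENS maps
n_snps = 501_584     # N_C, SNPs

# Leading-order operation counts from the stated complexity orders.
vgwas_ops = n * n_voxels * n_snps            # O(n * N_V * N_C)
fvgwas_ops = (n_snps + n_voxels) * n ** 2    # O((N_C + N_V) * n^2)

ratio = vgwas_ops / fvgwas_ops
print(f"VGWAS order:  {vgwas_ops:.3e}")
print(f"FVGWAS order: {fvgwas_ops:.3e}")
print(f"ratio:        {ratio:.1f}")
```

The ratio simplifies to N_V N_C / ((N_C + N_V) n), so the gap widens as the voxel and SNP counts grow relative to the sample size.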
Affiliation(s)
- Meiyan Huang
  - School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Thomas Nichols
  - Department of Statistics, University of Warwick, Coventry, UK
- Chao Huang
  - Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yu Yang
  - Department of Statistics and Operation Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Zhaohua Lu
  - Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Qianjing Feng
  - School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Rebecca C Knickmeyer
  - Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
  - Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
30
|
Wang C, Kao WH, Hsiao CK. Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies. PLoS One 2015; 10:e0135918. [PMID: 26302001 PMCID: PMC4547758 DOI: 10.1371/journal.pone.0135918] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2014] [Accepted: 07/28/2015] [Indexed: 11/27/2022] Open
Abstract
The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. As compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps narrow down the genetic regions harboring susceptibility markers.
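As a minimal sketch of the distance measure described in the abstract above, the code below computes pairwise Hamming distances between SNP genotype strings and groups SNPs whose distance falls under a cutoff. The grouping step is a simple single-linkage thresholding chosen for illustration; it is an assumption and not the authors' clustering criterion or the HDAT test. The SNP names, genotype strings, and threshold are made up.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions (individuals) at which two genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must cover the same individuals")
    return sum(x != y for x, y in zip(a, b))

def threshold_clusters(names, dist, threshold):
    """Single-linkage grouping: merge SNPs joined by any pair within threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: each string holds one SNP's genotypes (0/1/2 minor-allele
# counts) across the same ten individuals.
snps = {
    "snp1": "0012010210",
    "snp2": "0012010211",
    "snp3": "2200121002",
    "snp4": "2200121012",
}

dist = {(a, b): hamming(snps[a], snps[b]) for a, b in combinations(snps, 2)}
clusters = threshold_clusters(list(snps), dist, threshold=2)
print(clusters)
```

Only genotype codes enter the calculation, which mirrors the abstract's point that no haplotype inference or physical proximity of the SNPs is needed.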
Affiliation(s)
- Charlotte Wang
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
- Wen-Hsin Kao
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
- Chuhsing Kate Hsiao
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
  - Bioinformatics and Biostatistics Core, Division of Genomic Medicine, Research Center for Medical Excellence, National Taiwan University, Taipei, 100, Taiwan
  - Department of Public Health, National Taiwan University, Taipei, 100, Taiwan
|
31
|
Xu Z, Pan W. Approximate score-based testing with application to multivariate trait association analysis. Genet Epidemiol 2015. [PMID: 26198454 DOI: 10.1002/gepi.21911] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
For genome-wide association studies and DNA sequencing studies, several powerful score-based tests, such as kernel machine regression and sum of powered score tests, have been proposed in the last few years. However, extensions of these score-based tests to more complex models, such as mixed-effects models for analysis of multiple and correlated traits, have been hindered by the unavailability of the score vector, either because statistical software does not output it or because no closed-form solution exists. We propose a simple and general method to asymptotically approximate the score vector based on an asymptotically normal and consistent estimate of a parameter vector to be tested and its (consistent) covariance matrix. The proposed method is applicable to both maximum-likelihood estimation and estimating function-based approaches. We use the derived approximate score vector to extend several score-based tests to mixed-effects models. We demonstrate the feasibility and possible power gains of these tests in association analysis of multiple and correlated quantitative or binary traits with both real and simulated data. The proposed method is easy to implement and widely applicable.
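One way to read the approximation described in the abstract above is sketched here. The formulas are an assumption based on the standard asymptotic relation between an estimator and the score, not a quotation from the paper, and the symbols $\hat\beta$, $\hat V$, and $U_a$ are introduced only for illustration. Let $\hat\beta$ be a consistent, asymptotically normal estimate of the tested parameter vector $\beta$, with consistent covariance estimate $\hat V$. Under $H_0\colon \beta = 0$, an approximate score vector and its covariance can be taken as

\[
U_a = \hat V^{-1}\hat\beta, \qquad \widehat{\operatorname{Cov}}(U_a) = \hat V^{-1}\,\hat V\,\hat V^{-1} = \hat V^{-1}.
\]

As a consistency check, the score-type quadratic form built from these quantities reduces to the Wald statistic:

\[
U_a^{\top}\,\widehat{\operatorname{Cov}}(U_a)^{-1}\,U_a
= \hat\beta^{\top}\hat V^{-1}\,\hat V\,\hat V^{-1}\hat\beta
= \hat\beta^{\top}\hat V^{-1}\hat\beta .
\]

Under this reading, any test that takes a score vector and its covariance as input can be run on $U_a$ and $\hat V^{-1}$, which need only the estimate and covariance matrix that fitting software typically reports.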
Affiliation(s)
- Zhiyuan Xu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
32
|
Marceau R, Lu W, Holloway S, Sale MM, Worrall BB, Williams SR, Hsu FC, Tzeng JY. A Fast Multiple-Kernel Method With Applications to Detect Gene-Environment Interaction. Genet Epidemiol 2015; 39:456-68. [PMID: 26139508 DOI: 10.1002/gepi.21909] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Revised: 05/10/2015] [Accepted: 05/20/2015] [Indexed: 01/27/2023]
Abstract
Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the "expectation-maximization (EM)" algorithm, have a large computational cost and are not scalable to large sample sizes needed for rare variant analysis. Therefore, under the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level.
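The low-rank idea mentioned in the abstract above can be sketched as follows. This is not the fastKM algorithm itself: it assumes a truncated eigendecomposition as the low-rank approximation, and the function name, rank, and toy genotype matrix are hypothetical. The point is only that a kernel matrix K can be replaced by a thin factor Z with K approximately equal to Z Z^T.

```python
import numpy as np

def low_rank_factor(K, rank):
    """Return Z (n x rank) such that Z @ Z.T is a rank-`rank` approximation of K.

    Uses the top eigenpairs of the symmetric matrix K; negative eigenvalues
    arising from round-off are clipped to zero.
    """
    eigvals, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:rank]      # indices of the largest ones
    vals = np.clip(eigvals[idx], 0.0, None)
    return eigvecs[:, idx] * np.sqrt(vals)      # scale each eigenvector column

# Toy nuisance kernel: linear kernel of 5 variants on 200 subjects (rank <= 5).
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 5)).astype(float)
K = G @ G.T

Z = low_rank_factor(K, rank=5)
rel_err = np.linalg.norm(K - Z @ Z.T) / np.linalg.norm(K)
print(Z.shape, f"relative error = {rel_err:.2e}")
```

In a sketch like this, the columns of Z could be passed to single-kernel software as ordinary covariates standing in for the nuisance kernel, so that only the kernel of interest remains to be tested; whether fastKM proceeds in exactly this way is not stated in the abstract.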
Affiliation(s)
- Rachel Marceau
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Shannon Holloway
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Michèle M Sale
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Medicine, University of Virginia, Charlottesville, Virginia, United States of America; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Bradford B Worrall
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Neurology, University of Virginia, Charlottesville, Virginia, United States of America
- Stephen R Williams
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, United States of America
- Fang-Chi Hsu
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
|
33
|
Pan W, Kwak IY, Wei P. A Powerful Pathway-Based Adaptive Test for Genetic Association with Common or Rare Variants. Am J Hum Genet 2015; 97:86-98. [PMID: 26119817 DOI: 10.1016/j.ajhg.2015.05.018] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 02/14/2015] [Accepted: 05/21/2015] [Indexed: 12/11/2022]
Abstract
In spite of the success of genome-wide association studies (GWASs), only a small proportion of heritability for each complex trait has been explained by identified genetic variants, mainly SNPs. Likely reasons include genetic heterogeneity (i.e., multiple causal genetic variants) and small effect sizes of causal variants, for which pathway analysis has been proposed as a promising alternative to the standard single-SNP-based analysis. A pathway contains a set of functionally related genes, each of which includes multiple SNPs. Here we propose a pathway-based test that is adaptive at both the gene and SNP levels, thus maintaining high power across a wide range of situations with varying numbers of the genes and SNPs associated with a trait. The proposed method is applicable to both common variants and rare variants and can incorporate biological knowledge on SNPs and genes to boost statistical power. We use extensively simulated data and a WTCCC GWAS dataset to compare our proposal with several existing pathway-based and SNP-set-based tests, demonstrating its promising performance and its potential use in practice.
|
34
|
Jiao S, Peters U, Berndt S, Bézieau S, Brenner H, Campbell PT, Chan AT, Chang-Claude J, Lemire M, Newcomb PA, Potter JD, Slattery ML, Woods MO, Hsu L. Powerful Set-Based Gene-Environment Interaction Testing Framework for Complex Diseases. Genet Epidemiol 2015; 39:609-18. [PMID: 26095235 DOI: 10.1002/gepi.21908] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/21/2015] [Revised: 04/20/2015] [Accepted: 05/06/2015] [Indexed: 01/15/2023]
Abstract
Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. Based on our previously developed Set Based gene EnviRonment InterAction test (SBERIA), in this paper we propose a powerful framework for enhanced set-based G × E testing (eSBERIA). The major challenge of signal aggregation within a set is how to tell signals from noise. eSBERIA tackles this challenge by adaptively aggregating the interaction signals within a set weighted by the strength of the marginal and correlation screening signals. eSBERIA then combines the screening-informed aggregate test with a variance component test to account for the residual signals. Additionally, we develop a case-only extension for eSBERIA (coSBERIA) and an existing set-based method, which boosts the power not only by exploiting the G-E independence assumption but also by avoiding the need to specify main effects for a large number of variants in the set. Through extensive simulation, we show that coSBERIA and eSBERIA are considerably more powerful than existing methods within the case-only and the case-control method categories across a wide range of scenarios. We conduct a genome-wide G × E search by applying our methods to Illumina HumanExome Beadchip data of 10,446 colorectal cancer cases and 10,191 controls and identify two novel interactions between nonsteroidal anti-inflammatory drugs (NSAIDs) and MINK1 and PTCHD3.
Affiliation(s)
- Shuo Jiao
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Sonja Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
- Peter T Campbell
- Epidemiology Research Program, American Cancer Society, Atlanta, Georgia, United States of America
- Andrew T Chan
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg, Germany
- Polly A Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America; School of Public Health, University of Washington, Seattle, Washington, United States of America
- John D Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America; Ontario Institute for Cancer Research, Toronto, Canada; Centre for Public Health Research, Massey University, Wellington, New Zealand
- Martha L Slattery
- Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah, United States of America
- Michael O Woods
- Discipline of Genetics, Memorial University of Newfoundland, St. John's, NL, Canada
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
|
35
|
Broadaway KA, Duncan R, Conneely KN, Almli LM, Bradley B, Ressler KJ, Epstein MP. Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits. Genet Epidemiol 2015; 39:366-75. [PMID: 25885490 DOI: 10.1002/gepi.21901] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/19/2014] [Revised: 02/16/2015] [Accepted: 02/27/2015] [Indexed: 12/29/2022]
Abstract
The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the presence of an environmental factor. One can perform such an analysis using a joint test of gene and gene-environment interaction. An optimal joint test would be one that remains powerful under a variety of models ranging from those of strong gene-environment interaction effect to those of little or no gene-environment interaction effect. To fill this demand, we have extended a kernel machine based approach for association mapping of multiple SNPs to consider joint tests of gene and gene-environment interaction. The kernel-based approach for joint testing is promising, because it incorporates linkage disequilibrium information from multiple SNPs simultaneously in analysis and permits flexible modeling of interaction effects. Using simulated data, we show that our kernel machine approach typically outperforms the traditional joint test under strong gene-environment interaction models and further outperforms the traditional main-effect association test under models of weak or no gene-environment interaction effects. We illustrate our test using genome-wide association data from the Grady Trauma Project, a cohort of highly traumatized, at-risk individuals, which has previously been investigated for interaction effects.
Affiliation(s)
- K Alaine Broadaway
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
- Richard Duncan
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
- Karen N Conneely
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
- Lynn M Almli
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America
- Bekh Bradley
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America; Department of Veterans Affairs, Atlanta VA Medical Center, Atlanta, Georgia, United States of America
- Kerry J Ressler
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America
- Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
|
36
|
Wang Z, Maity A, Hsiao CK, Voora D, Kaddurah-Daouk R, Tzeng JY. Module-based association analysis for omics data with network structure. PLoS One 2015; 10:e0122309. [PMID: 25822417 PMCID: PMC4378989 DOI: 10.1371/journal.pone.0122309] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/07/2014] [Accepted: 02/20/2015] [Indexed: 02/06/2023]
Abstract
Module-based analysis (MBA) aims to evaluate the effect of a group of biological elements sharing common features, such as SNPs in the same gene or metabolites in the same pathways, and has become an attractive alternative to traditional single bio-element approaches. Because bio-elements regulate and interact with each other as part of a network, incorporating network structure information can more precisely model the biological effects, enhance the ability to detect true associations, and facilitate our understanding of the underlying biological mechanisms. However, most MBA methods ignore the network structure information, which depicts the interaction and regulation relationships among basic functional units in a biological system. We construct the connectivity kernel and the topology kernel to capture the relationships among bio-elements in a module, and use a kernel machine framework to evaluate the joint effect of bio-elements. Our proposed kernel machine approach directly incorporates network structure so as to enhance study efficiency; it can assess interactions among modules, account for covariates, and is computationally efficient. Through simulation studies and real data application, we demonstrate that the proposed network-based methods can have markedly better power than the approaches ignoring network information under a range of scenarios.
Affiliation(s)
- Zhi Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, 27695, United States of America
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, United States of America
- Chuhsing Kate Hsiao
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Deepak Voora
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
- Rima Kaddurah-Daouk
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, United States of America
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, 27695, United States of America
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, United States of America
- Department of Statistics, National Cheng-Kung University, Taiwan, R.O.C
|
37
|
Wang Z, Maity A, Luo Y, Neely ML, Tzeng JY. Complete effect-profile assessment in association studies with multiple genetic and multiple environmental factors. Genet Epidemiol 2015; 39:122-33. [PMID: 25538034 PMCID: PMC4314365 DOI: 10.1002/gepi.21877] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/07/2014] [Revised: 10/20/2014] [Accepted: 11/05/2014] [Indexed: 01/28/2023]
Abstract
Studying complex diseases in the post genome-wide association studies (GWAS) era has led to developing methods that consider factor-sets rather than individual genetic/environmental factors (i.e., Multi-G-Multi-E studies), and mining for potential gene-environment (G×E) interactions has proven to be an invaluable aid in both discovery and deciphering underlying biological mechanisms. Current approaches for examining effect profiles in Multi-G-Multi-E analyses are either underpowered due to large degrees of freedom, ill-suited for detecting G×E interactions due to imprecise modeling of the G and E effects, or lack the capacity for modeling interactions between two factor-sets (e.g., existing methods focus primarily on a single E factor). In this work, we illustrate the issues encountered in constructing kernels for investigating interactions between two factor-sets, and propose a simple yet intuitive solution to construct the G×E kernel that retains the ease-of-interpretation of classic regression. We also construct a series of kernel machine (KM) score tests to evaluate the complete effect profile (i.e., the G, E, and G×E effects individually or in combination). We show, via simulations and a data application, that the proposed KM methods outperform the classic and PC regressions across a range of scenarios, including varying effect size, effect structure, and interaction complexity. The largest power gain was observed when the underlying effect structure involved complex G×E interactions; however, the proposed methods have consistent, powerful performance when the effect profile is simple or complex, suggesting that the proposed method could be a useful tool for exploratory or confirmatory G×E analysis.
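The abstract does not spell out how the G×E kernel is built. As a hedged illustration only, the sketch below uses one common construction, the element-wise (Hadamard) product of linear G and E kernels, and checks that it equals the linear kernel computed on all pairwise G×E product terms, which is what gives such a kernel a regression-style reading. The function names and toy data are hypothetical and are not taken from the paper.

```python
def linear_kernel(X):
    """Linear kernel K[i][j] = <x_i, x_j> for a list of feature rows."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def hadamard(A, B):
    """Element-wise product of two equally sized square matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def interaction_features(G, E):
    """All pairwise products g*e per subject (the classic regression G×E terms)."""
    return [[g * e for g in gi for e in ei] for gi, ei in zip(G, E)]


# Toy data (hypothetical): 3 subjects, 2 SNPs coded 0/1/2, 2 environmental factors.
G = [[0, 1], [1, 2], [2, 0]]
E = [[1.0, 0.5], [0.0, 2.0], [1.5, 1.0]]

K_ge_product = hadamard(linear_kernel(G), linear_kernel(E))
K_ge_direct = linear_kernel(interaction_features(G, E))

# The two matrices agree entry by entry (up to floating-point rounding).
for row_p, row_d in zip(K_ge_product, K_ge_direct):
    print(row_p, row_d)
```

Building the interaction kernel from the two marginal kernels avoids forming the full table of product terms, which grows as (number of SNPs) × (number of environmental factors).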
Affiliation(s)
- Zhi Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
- Yiwen Luo
- Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
- Megan L. Neely
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, 27705, USA
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
- Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
|
38
|
Assessing gene-environment interactions for common and rare variants with binary traits using gene-trait similarity regression. Genetics 2015; 199:695-710. [PMID: 25585620 DOI: 10.1534/genetics.114.171686] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022]
Abstract
Accounting for gene-environment (G×E) interactions in complex trait association studies can facilitate our understanding of genetic heterogeneity under different environmental exposures, improve the ability to discover susceptible genes that exhibit little marginal effect, provide insight into the biological mechanisms of complex diseases, help to identify high-risk subgroups in the population, and uncover hidden heritability. However, significant G×E interactions can be difficult to find. The sample sizes required for sufficient power to detect association are much larger than those needed for genetic main effects, and interactions are sensitive to misspecification of the main-effects model. These issues are exacerbated when working with binary phenotypes and rare variants, which bear less information on association. In this work, we present a similarity-based regression method for evaluating G×E interactions for rare variants with binary traits. The proposed model aggregates the genetic and G×E information across markers, using genetic similarity, thus increasing the ability to detect G×E signals. The model has a random effects interpretation, which leads to robustness against main-effect misspecifications when evaluating G×E interactions. We construct score tests to examine G×E interactions and a computationally efficient EM algorithm to estimate the nuisance variance components. Using simulations and data applications, we show that the proposed method is a flexible and powerful tool to study the G×E effect in common or rare variant studies with binary traits.
|
39
|
Almli LM, Duncan R, Feng H, Ghosh D, Binder EB, Bradley B, Ressler KJ, Conneely KN, Epstein MP. Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder. JAMA Psychiatry 2014; 71:1392-9. [PMID: 25354142 PMCID: PMC4293022 DOI: 10.1001/jamapsychiatry.2014.1339] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
IMPORTANCE Genetic association studies of psychiatric outcomes often consider interactions with environmental exposures and, in particular, apply tests that jointly consider gene and gene-environment interaction effects for analysis. Using a genome-wide association study (GWAS) of posttraumatic stress disorder (PTSD), we report that heteroscedasticity (defined as variability in outcome that differs by the value of the environmental exposure) can invalidate traditional joint tests of gene and gene-environment interaction. OBJECTIVES To identify the cause of bias in traditional joint tests of gene and gene-environment interaction in a PTSD GWAS and determine whether proposed robust joint tests are insensitive to this problem. DESIGN, SETTING, AND PARTICIPANTS The PTSD GWAS data set consisted of 3359 individuals (978 men and 2381 women) from the Grady Trauma Project (GTP), a cohort study from Atlanta, Georgia. The GTP performed genome-wide genotyping of participants and collected environmental exposures using the Childhood Trauma Questionnaire and Trauma Experiences Inventory. MAIN OUTCOMES AND MEASURES We performed joint interaction testing of the Beck Depression Inventory and modified PTSD Symptom Scale in the GTP GWAS. We assessed systematic bias in our interaction analyses using quantile-quantile plots and genome-wide inflation factors. RESULTS Application of the traditional joint interaction test to the GTP GWAS yielded systematic inflation across different outcomes and environmental exposures (inflation-factor estimates ranging from 1.07 to 1.21), whereas application of the robust joint test to the same data set yielded no such inflation (inflation-factor estimates ranging from 1.01 to 1.02). Simulated data further revealed that the robust joint test is valid in different heteroscedasticity models, whereas the traditional joint test is invalid. The robust joint test also has power similar to the traditional joint test when heteroscedasticity is not an issue. 
CONCLUSIONS AND RELEVANCE We believe the robust joint test should be used in candidate-gene studies and GWASs of psychiatric outcomes that consider environmental interactions. To make the procedure useful for applied investigators, we created a software tool that can be called from the popular PLINK package for analysis.
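Genome-wide inflation factors like those quoted above are conventionally computed as the median observed test statistic divided by its expected median under the null. The sketch below shows that convention, assuming p-values are mapped to 1-degree-of-freedom chi-square statistics; the joint tests in the study have more degrees of freedom, and the authors' exact calculation is not given in the abstract. Function names are hypothetical.

```python
from statistics import NormalDist, median


def chi2_1df_from_p(p):
    """Upper-tail 1-df chi-square quantile for a p-value in (0, 1).

    Uses the identity chi2_1 = Z**2, so the quantile is the squared
    two-sided standard normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Median observed statistic over the null median (about 0.455 for 1 df)."""
    observed = [chi2_1df_from_p(p) for p in p_values]
    return median(observed) / chi2_1df_from_p(0.5)


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated test.
    n = 1001
    calibrated = [(i + 0.5) / n for i in range(n)]
    print(inflation_factor(calibrated))
    # Squaring the p-values makes them too small, mimicking an inflated test.
    print(inflation_factor([p * p for p in calibrated]))
```

A value near 1 indicates statistics that follow the null distribution in bulk; values above 1 indicate systematic inflation.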
Affiliation(s)
- Lynn M. Almli
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia
- Richard Duncan
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia
- Hao Feng
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia
- Debashis Ghosh
- Department of Statistics, Pennsylvania State University, State College
- Elisabeth B. Binder
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia; Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Bekh Bradley
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia; Mental Health Service Line, Department of Veterans Affairs Medical Center, Atlanta, Georgia
- Kerry J. Ressler
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, Georgia
- Karen N. Conneely
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia
- Michael P. Epstein
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia
|
40
|
Chen H, Meigs JB, Dupuis J. Incorporating gene-environment interaction in testing for association with rare genetic variants. Hum Hered 2014; 78:81-90. [PMID: 25060534 PMCID: PMC4169076 DOI: 10.1159/000363347] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 02/04/2014] [Accepted: 05/03/2014] [Indexed: 11/19/2022]
Abstract
OBJECTIVES The incorporation of gene-environment interactions could improve the ability to detect genetic associations with complex traits. For common genetic variants, single-marker interaction tests and joint tests of genetic main effects and gene-environment interaction have been well-established and used to identify novel association loci for complex diseases and continuous traits. For rare genetic variants, however, single-marker tests are severely underpowered due to the low minor allele frequency, and only a few gene-environment interaction tests have been developed. We aimed at developing powerful and computationally efficient tests for gene-environment interaction with rare variants. METHODS In this paper, we propose interaction and joint tests for testing gene-environment interaction of rare genetic variants. Our approach is a generalization of existing gene-environment interaction tests for multiple genetic variants under certain conditions. RESULTS We show in our simulation studies that our interaction and joint tests have correct type I errors, and that the joint test is a powerful approach for testing genetic association, allowing for gene-environment interaction. We also illustrate our approach in a real data example from the Framingham Heart Study. CONCLUSION Our approach can be applied to both binary and continuous traits; it is powerful and computationally efficient.
Affiliation(s)
- Han Chen
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- James B Meigs
- General Medicine Division, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA, USA
|
41
|
Wang X, Epstein MP, Tzeng JY. Analysis of gene-gene interactions using gene-trait similarity regression. Hum Hered 2014; 78:17-26. [PMID: 24969398 DOI: 10.1159/000360161] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/20/2013] [Accepted: 01/30/2014] [Indexed: 12/14/2022]
Abstract
OBJECTIVE Gene-gene interactions (G×G) are important to study because of their extensiveness in biological systems and their potential in explaining missing heritability of complex traits. In this work, we propose a new similarity-based test to assess G×G at the gene level, which permits the study of epistasis at biologically functional units with amplified interaction signals. METHODS Under the framework of gene-trait similarity regression (SimReg), we propose a gene-based test for detecting G×G. SimReg uses a regression model to correlate trait similarity with genotypic similarity across a gene. Unlike existing gene-level methods based on leading principal components (PCs), SimReg summarizes all information on genotypic variation within a gene and can be used to assess the joint/interactive effects of two genes as well as the effect of one gene conditional on another. RESULTS Using simulations and a real data application to the Warfarin study, we show that the SimReg G×G tests have satisfactory power and robustness under different genetic architecture when compared to existing gene-based interaction tests such as PC analysis or partial least squares. A genome-wide association study with approx. 20,000 genes may be completed on a parallel computing system in 2 weeks.
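The core idea described above, regressing pairwise trait similarity on pairwise genotypic similarity, can be sketched in a few lines. This is a simplified illustration under stated assumptions (unweighted identity-by-state similarity, centered trait cross-products, ordinary least squares, no covariates and no second gene), not the authors' SimReg implementation or their G×G test. All names are hypothetical.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Average identity-by-state sharing (0 to 1) for genotypes coded 0/1/2."""
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def similarity_regression_slope(traits, genotypes):
    """OLS slope of trait similarity on genotypic similarity over subject pairs.

    Trait similarity for a pair is the cross-product of mean-centered traits.
    Assumes genotypic similarity varies across pairs; otherwise the slope is
    undefined and a ZeroDivisionError is raised.
    """
    mean_y = sum(traits) / len(traits)
    pairs = list(combinations(range(len(traits)), 2))
    z = [(traits[i] - mean_y) * (traits[j] - mean_y) for i, j in pairs]
    s = [ibs_similarity(genotypes[i], genotypes[j]) for i, j in pairs]
    s_bar = sum(s) / len(s)
    z_bar = sum(z) / len(z)
    s_xx = sum((v - s_bar) ** 2 for v in s)
    s_xz = sum((a - s_bar) * (b - z_bar) for a, b in zip(s, z))
    return s_xz / s_xx


if __name__ == "__main__":
    # Toy data (hypothetical): subjects with identical genotypes share trait values.
    genotypes = [[0, 0], [0, 0], [2, 2], [2, 2]]
    traits = [0.0, 0.0, 1.0, 1.0]
    print(similarity_regression_slope(traits, genotypes))
```

A positive slope means that genetically similar pairs also tend to have similar trait values, which is the signal a similarity-based test looks for.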
Affiliation(s)
- Xin Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh, N.C., USA
|
42
|
Tzeng JY, Lu W, Hsu FC. GENE-LEVEL PHARMACOGENETIC ANALYSIS ON SURVIVAL OUTCOMES USING GENE-TRAIT SIMILARITY REGRESSION. Ann Appl Stat 2014; 8:1232-1255. [PMID: 25018788 DOI: 10.1214/14-aoas735] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
Gene/pathway-based methods are drawing significant attention due to their usefulness in detecting rare and common variants that affect disease susceptibility. The biological mechanism of drug responses indicates that a gene-based analysis has even greater potential in pharmacogenetics. Motivated by a study from the Vitamin Intervention for Stroke Prevention (VISP) trial, we develop a gene-trait similarity regression for survival analysis to assess the effect of a gene or pathway on time-to-event outcomes. The similarity regression has a general framework that covers a range of survival models, such as the proportional hazards model and the proportional odds model. The inference procedure developed under the proportional hazards model is robust against model misspecification. We derive the equivalence between the similarity survival regression and a random effects model, which further unifies the current variance-component based methods. We demonstrate the effectiveness of the proposed method through simulation studies. In addition, we apply the method to the VISP trial data to identify the genes that exhibit an association with the risk of a recurrent stroke. TCN2 gene was found to be associated with the recurrent stroke risk in the low-dose arm. This gene may impact recurrent stroke risk in response to cofactor therapy.
Affiliation(s)
- Jung-Ying Tzeng
- North Carolina State University; National Cheng-Kung University
|
43
|
Li M, He Z, Zhang M, Zhan X, Wei C, Elston RC, Lu Q. A generalized genetic random field method for the genetic association analysis of sequencing data. Genet Epidemiol 2014; 38:242-53. [PMID: 24482034 PMCID: PMC5241166 DOI: 10.1002/gepi.21790] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/10/2013] [Revised: 11/28/2013] [Accepted: 12/21/2013] [Indexed: 01/23/2023]
Abstract
With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.
Affiliation(s)
- Ming Li
- Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Zihuai He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Min Zhang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Xiaowei Zhan
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Changshuai Wei
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
|
44
|
Wang X, Lee S, Zhu X, Redline S, Lin X. GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genet Epidemiol 2013; 37:778-86. [PMID: 24166731 PMCID: PMC4007511 DOI: 10.1002/gepi.21763] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 05/16/2013] [Revised: 08/17/2013] [Accepted: 09/10/2013] [Indexed: 12/17/2022]
Abstract
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies of common variants are single-marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEE) based kernel association test, a variance-component-based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small-sample-size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single-marker-based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
Affiliation(s)
- Xuefeng Wang: Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Seunggeun Lee: Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Xiaofeng Zhu: Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA 44106
- Susan Redline: Department of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Xihong Lin: Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
|
45
|
Byrnes AE, Wu MC, Wright FA, Li M, Li Y. The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 2013; 37:666-74. [PMID: 23836599 PMCID: PMC4083762 DOI: 10.1002/gepi.21747] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/25/2013] [Revised: 05/20/2013] [Accepted: 06/03/2013] [Indexed: 11/06/2022]
Abstract
In the past few years, a plethora of methods for rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend variable selection methods (which can be viewed as "statistical annotation") on top of regions implicated by a phenotype-independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high coverage targeted sequencing study of 1,898 individuals.
Affiliation(s)
- Andrea E. Byrnes: Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Michael C. Wu: Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Fred A. Wright: Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Mingyao Li: Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
- Yun Li: Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599; Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599; Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599
|
46
|
|
47
|
Winham SJ, Biernacka JM. Gene-environment interactions in genome-wide association studies: current approaches and new directions. J Child Psychol Psychiatry 2013; 54:1120-34. [PMID: 23808649 PMCID: PMC3829379 DOI: 10.1111/jcpp.12114] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Accepted: 05/03/2013] [Indexed: 01/20/2023]
Abstract
BACKGROUND: Complex psychiatric traits have long been thought to be the result of a combination of genetic and environmental factors, and gene-environment interactions are thought to play a crucial role in behavioral phenotypes and the susceptibility and progression of psychiatric disorders. Candidate gene studies to investigate hypothesized gene-environment interactions are now fairly common in human genetic research, and with the shift toward genome-wide association studies, genome-wide gene-environment interaction studies are beginning to emerge.
METHODS: We summarize the basic ideas behind gene-environment interaction, and provide an overview of possible study designs and traditional analysis methods in the context of genome-wide analysis. We then discuss novel approaches beyond the traditional strategy of analyzing the interaction between the environmental factor and each polymorphism individually.
RESULTS: Two-step filtering approaches that reduce the number of polymorphisms tested for interactions can substantially increase the power of genome-wide gene-environment studies. New analytical methods including data-mining approaches, and gene-level and pathway-level analyses, also have the capacity to improve our understanding of how complex genetic and environmental factors interact to influence psychologic and psychiatric traits. Such methods, however, have not yet been utilized much in behavioral and mental health research.
CONCLUSIONS: Although methods to investigate gene-environment interactions are available, there is a need for further development and extension of these methods to identify gene-environment interactions in the context of genome-wide association studies. These novel approaches need to be applied in studies of psychology and psychiatry.
Affiliation(s)
- Stacey J Winham: Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905
- Joanna M. Biernacka: Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905; Department of Psychiatry and Psychology, Mayo Clinic, Rochester MN 55905
|
48
|
Lin X, Lee S, Christiani DC, Lin X. Test for interactions between a genetic marker set and environment in generalized linear models. Biostatistics 2013; 14:667-81. [PMID: 23462021 PMCID: PMC3769996 DOI: 10.1093/biostatistics/kxt006] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 08/30/2012] [Revised: 01/21/2013] [Accepted: 01/28/2013] [Indexed: 11/13/2022]
Abstract
We consider in this paper testing for interactions between a genetic marker set and an environmental variable. A common practice in studying gene-environment (GE) interactions is to analyze one single-nucleotide polymorphism (SNP) at a time. It is of significant interest to analyze SNPs in a biologically defined set simultaneously, e.g. a gene or pathway. In this paper, we first show that if the main effects of multiple SNPs in a set are associated with a disease/trait, the classical single SNP-GE interaction analysis can be biased. We derive the asymptotic bias and study the conditions under which the classical single SNP-GE interaction analysis is unbiased. We further show that the simple minimum p-value-based SNP-set GE analysis can be biased and have an inflated Type 1 error rate. To overcome these difficulties, we propose a computationally efficient and powerful gene-environment set association test (GESAT) in generalized linear models. Our method tests for SNP-set by environment interactions using a variance component test, and estimates the main SNP effects under the null hypothesis using ridge regression. We evaluate the performance of GESAT using simulation studies, and apply GESAT to data from the Harvard lung cancer genetic study to investigate GE interactions between the SNPs in the 15q24-25.1 region and smoking on lung cancer risk.
Affiliation(s)
- Xinyi Lin: Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
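The abstract of this entry describes two ingredients: main SNP effects estimated under the null with ridge regression, and a variance component test for the SNP-set by environment interaction. The following is a minimal sketch of that idea for a continuous trait only, not the GESAT implementation. The function name `gxe_set_score_statistic`, the fixed `ridge_lambda`, and the linear-model setting are assumptions made for illustration. The sketch returns only the score-type statistic Q; the null distribution needed for a P-value is not computed here.

```python
import numpy as np


def gxe_set_score_statistic(y, X, E, G, ridge_lambda=1.0):
    """Score-type statistic for a SNP-set x environment interaction.

    Hypothetical illustration for a continuous trait:
      1. Fit the null model y ~ 1 + X + E + G, penalizing only the
         genetic main effects (columns of G) with a ridge penalty.
      2. Form Q = r' S S' r, where r is the null-model residual vector
         and S holds the interaction terms G[:, j] * E.
    """
    y = np.asarray(y, dtype=float)
    E = np.asarray(E, dtype=float)
    G = np.asarray(G, dtype=float)
    n = y.shape[0]

    # Unpenalized part of the null design: intercept, covariates, environment.
    Z = np.column_stack([np.ones(n), X, E])
    W = np.column_stack([Z, G])

    # Ridge penalty applies to the genetic main effects only.
    penalty = np.zeros(W.shape[1])
    penalty[Z.shape[1]:] = ridge_lambda

    # Penalized least squares for the null model.
    coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    resid = y - W @ coef

    # Interaction design: each SNP multiplied elementwise by E.
    S = G * E[:, None]
    score = S.T @ resid
    return float(score @ score)
```

A larger Q indicates that the null-model residuals covary more strongly with the interaction terms. In this sketch the penalty is a fixed user-supplied value, which is a simplification.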
|
49
|
Quantitative trait analysis in sequencing studies under trait-dependent sampling. Proc Natl Acad Sci U S A 2013; 110:12247-52. [PMID: 23847208 DOI: 10.1073/pnas.1221713110] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 12/13/2022]
Abstract
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies.
|
50
|
Jiao S, Hsu L, Bézieau S, Brenner H, Chan AT, Chang-Claude J, Le Marchand L, Lemire M, Newcomb PA, Slattery ML, Peters U. SBERIA: set-based gene-environment interaction test for rare and common variants in complex diseases. Genet Epidemiol 2013; 37:452-64. [PMID: 23720162 PMCID: PMC3713231 DOI: 10.1002/gepi.21735] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 03/05/2013] [Revised: 04/04/2013] [Accepted: 04/30/2013] [Indexed: 01/28/2023]
Abstract
Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. However, partially due to the lack of power, there have been very few replicated G × E findings compared to the success in marginal association studies. The existing G × E testing methods mainly focus on improving the power for individual markers. In this paper, we took a different strategy and proposed a set-based gene-environment interaction test (SBERIA), which can improve the power by reducing the multiple testing burdens and aggregating signals within a set. The major challenge of the signal aggregation within a set is how to tell signals from noise and how to determine the direction of the signals. SBERIA takes advantage of the established correlation screening for G × E to guide the aggregation of genotypes within a marker set. The correlation screening has been shown to be an efficient way of selecting potential G × E candidate SNPs in case-control studies for complex diseases. Importantly, the correlation screening in case-control combined samples is independent of the interaction test. With this desirable feature, SBERIA maintains the correct type I error level and can be easily implemented in a regular logistic regression setting. We showed that SBERIA had higher power than benchmark methods in various simulation scenarios, both for common and rare variants. We also applied SBERIA to real genome-wide association studies (GWAS) data of 10,729 colorectal cancer cases and 13,328 controls and found evidence of interaction between the set of known colorectal cancer susceptibility loci and smoking.
Affiliation(s)
- Shuo Jiao: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
|