1
Bao F, Deng Y, Du M, Ren Z, Zhang Q, Zhao Y, Suo J, Zhang Z, Wang M, Dai Q. Probabilistic natural mapping of gene-level tests for genome-wide association studies. Brief Bioinform 2018; 19:545-553. [PMID: 28200018 DOI: 10.1093/bib/bbx002] [Received: 10/10/2016]
Abstract
Genome-wide association studies (GWASs) generally focus on single markers, which limits the elucidation of the genetic architecture of complex traits. Herein, we present a new computational framework, termed probabilistic natural mapping (PALM), for performing gene-level association tests. PALM robustly reveals the inherent genomic structure of genes and generates feature representations that can be seamlessly incorporated into conventional statistical tests. Our approach substantially improves the detection of associations arising from a subgroup of variants with weak effects, a known challenge for existing methods. We applied PALM to a gastric cancer GWAS and identified two additional gastric cancer susceptibility genes, NOC3L and RUNDC2A. These robust susceptibility discoveries are widely supported by existing studies from other biological perspectives. PALM will be useful for future GWAS analytical strategies that use gene-level analyses.
Affiliation(s)
- Feng Bao
- Department of Automation, Tsinghua University, China
- Yue Deng
- School of Pharmacy, University of California, San Francisco, USA
- Mulong Du
- Department of Environmental Genomics, Nanjing Medical University, China
- Zhiquan Ren
- Department of Automation, Tsinghua University, China
- Qingzhao Zhang
- School of Economics and Wang Yanan Institute for Studies in Economics, Xiamen University, China
- Yanyu Zhao
- Department of Biomedical Engineering, Boston University, USA
- Jinli Suo
- Department of Automation, Tsinghua University, China
- Zhengdong Zhang
- Department of Genetic Toxicology, Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, and a PI in Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center For Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- Meilin Wang
- Department of Genetic Toxicology, Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, and a PI in Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center For Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- Qionghai Dai
- Department of Automation, Tsinghua University, China
2
Abstract
BACKGROUND A large amount of research has been devoted to detecting and investigating epistatic interactions in genome-wide association studies (GWASs). Most of the literature focuses on low-order interactions between single-nucleotide polymorphisms (SNPs) with significant main effects. RESULTS In this paper we propose an original approach for detecting epistasis at the gene level, without systematically filtering on significant genes. We first compute interaction variables for each gene pair by finding its Eigen-Epistasis component, defined as the linear combination of gene SNPs having the highest correlation with the phenotype. Significant effects are then selected with a penalized regression method based on the group lasso that controls the false discovery rate. CONCLUSION The method is tested against two recent alternative proposals from the literature using synthetic data and performs well in different settings. We demonstrate the power of our approach by detecting new gene-gene interactions in three genome-wide association studies.
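As a rough illustration of an Eigen-Epistasis-style component, the sketch below assumes the combination runs over the SNP-by-SNP product terms between two genes. Among all linear combinations of those (centered) terms, the one most correlated with the phenotype is the ordinary least-squares fitted value, so the sketch solves the normal equations directly. The least-squares choice, the requirement of more samples than product terms, and every function name here are assumptions for illustration. The paper's exact construction and its group-lasso selection step are not reproduced.

```python
import math
from itertools import product


def center(v):
    """Subtract the mean from a list of numbers."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: product variables are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def eigen_epistasis_component(gene1, gene2, y):
    """gene1, gene2: lists of SNP columns (genotype codes per sample); y: phenotype.

    Returns, per sample, the linear combination of centered SNP x SNP products
    with maximal correlation with y (the least-squares fitted values).
    """
    terms = [center([a * b for a, b in zip(s1, s2)]) for s1, s2 in product(gene1, gene2)]
    yc = center(y)
    k = len(terms)
    xtx = [[sum(u * v for u, v in zip(terms[i], terms[j])) for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(terms[i], yc)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(beta[i] * terms[i][s] for i in range(k)) for s in range(len(y))]


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    u, v = center(u), center(v)
    return sum(a * b for a, b in zip(u, v)) / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
```

Because the fitted values maximize correlation over all combinations, the component is always at least as correlated with the phenotype as any single product term. In practice, with many SNPs per gene, some form of regularization or dimension reduction would be needed instead of plain least squares.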
3
Wang MH, Sun R, Guo J, Weng H, Lee J, Hu I, Sham PC, Zee BCY. A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Res 2016; 44:e115. [PMID: 27112568 PMCID: PMC4937324 DOI: 10.1093/nar/gkw347] [Received: 11/08/2015] [Revised: 04/14/2016] [Accepted: 04/15/2016]
Abstract
Epistasis plays an essential role in the development of complex diseases. Interaction methods face the common challenge of balancing persistent power, model complexity, computational efficiency, and the validity of identified biomarkers. We introduce a novel W-test to identify pairwise epistasis effects, which measures the distributional difference between cases and controls through a combined log odds ratio. The test is model-free and fast, and follows a chi-squared distribution with data-adaptive degrees of freedom, so no permutation is needed to obtain P-values. Simulation studies demonstrated that the W-test is more powerful for low-frequency variants than alternative methods, namely the chi-squared test, logistic regression and multifactor dimensionality reduction (MDR). In two independent real bipolar disorder genome-wide association study (GWAS) datasets, the W-test identified significant interaction pairs that could be replicated, including SLIT3-CENPN, SLIT3-TMEM132D, CNTNAP2-NDST4 and CNTNAP2-RTN4R. The genes in these pairs play central roles in neurotransmission and synapse formation. A majority of the identified loci are undiscoverable by main-effect analysis and are low-frequency variants. The proposed method offers a powerful alternative tool for mapping the genetic puzzle underlying complex disorders.
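The idea of combining log odds ratios across genotype combinations can be sketched as follows. This is a simplified stand-in, not the published W-test. For each cell of the two-SNP genotype table (9 cells for two biallelic SNPs), it assumes a log odds ratio comparing that cell with all other cells and adds a 0.5 continuity correction. It then sums the squared standardized log odds ratios. The paper's scaling factor and data-adaptive degrees of freedom, which give the statistic its chi-squared reference distribution, are omitted. The cell-versus-rest comparison and the function name are assumptions.

```python
import math


def combined_log_or_statistic(cases, controls, correction=0.5):
    """Sum of squared standardized log odds ratios, one per genotype-combination cell.

    cases, controls: counts per cell (e.g. 9 cells for a pair of SNPs).
    Each cell is compared with all remaining cells (a simplifying assumption).
    """
    if len(cases) != len(controls):
        raise ValueError("cases and controls must have the same number of cells")
    n_case, n_ctrl = sum(cases), sum(controls)
    total = 0.0
    for a, b in zip(cases, controls):
        c, d = n_case - a, n_ctrl - b  # cases and controls in all other cells
        a, b, c, d = (x + correction for x in (a, b, c, d))
        log_or = math.log((a * d) / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        total += (log_or / se) ** 2
    return total
```

When cases and controls are distributed identically over the cells, every log odds ratio is zero and so is the statistic. Any distributional difference pushes it upward.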
Affiliation(s)
- Maggie Haitian Wang
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; CUHK Shenzhen Research Institute, Shenzhen, China
- Rui Sun
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; CUHK Shenzhen Research Institute, Shenzhen, China
- Junfeng Guo
- The Australian National University, Canberra, Australia
- Haoyi Weng
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; CUHK Shenzhen Research Institute, Shenzhen, China
- Jack Lee
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Inchi Hu
- ISOM Department and Biomedical Engineering Division, the Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
- Pak Chung Sham
- Department of Psychiatry; Centre for Genomic Sciences, the University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Benny Chung-Ying Zee
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; CUHK Shenzhen Research Institute, Shenzhen, China
4
Wang Y, Li D, Wei P. Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions. Cancer Inform 2015; 14:209-18. [PMID: 26064040 PMCID: PMC4459566 DOI: 10.4137/cin.s17305] [Received: 12/04/2014] [Revised: 04/20/2015] [Accepted: 04/28/2015]
Abstract
Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene–gene (G × G) and gene–environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey’s one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey’s 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey’s 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn’s disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case–control study of pancreatic cancer.
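Tukey's one degree-of-freedom idea is easiest to see in its classic two-way-table form. The only interaction allowed is proportional to the product of the row and column effects, so non-additivity is tested with a single parameter instead of (a-1)(b-1). The sketch below computes the 1-df sum of squares and the full interaction (residual) sum of squares for a table with one value per cell. In the classic test, the former is compared with the remaining residual sum of squares on (a-1)(b-1)-1 degrees of freedom. This does not reproduce the paper's score test on leading principal components, and the function name is hypothetical.

```python
def tukey_one_df(table):
    """Classic Tukey 1-df non-additivity for an a x b table with one value per cell.

    Returns (ss_nonadditivity, ss_interaction), where ss_interaction is the
    residual sum of squares after fitting row and column main effects.
    """
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    r = [sum(row) / b - grand for row in table]                             # row effects
    c = [sum(table[i][j] for i in range(a)) / a - grand for j in range(b)]  # column effects
    num = sum(r[i] * c[j] * table[i][j] for i in range(a) for j in range(b)) ** 2
    den = sum(x * x for x in r) * sum(x * x for x in c)
    ss_n = num / den
    ss_int = sum((table[i][j] - r[i] - c[j] - grand) ** 2 for i in range(a) for j in range(b))
    return ss_n, ss_int
```

For a purely additive table both quantities are zero. For the multiplicative 2 x 2 table `[[1, 2], [2, 4]]`, the single interaction degree of freedom is captured entirely by the product term, so both values are 0.25.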
Affiliation(s)
- Yaping Wang
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center
- Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center
- Peng Wei
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center; Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, TX, USA
5
Ma L, Keinan A, Clark AG. Biological knowledge-driven analysis of epistasis in human GWAS with application to lipid traits. Methods Mol Biol 2015; 1253:35-45. [PMID: 25403526 DOI: 10.1007/978-1-4939-2155-3_3]
Abstract
While the importance of epistasis is well established, specific gene-gene interactions have rarely been identified in human genome-wide association studies (GWAS), mainly due to low power associated with such interaction tests. In this chapter, we integrate biological knowledge and human GWAS data to reveal epistatic interactions underlying quantitative lipid traits, which are major risk factors for coronary artery disease. To increase power to detect interactions, we only tested pairs of SNPs filtered by prior biological knowledge, including GWAS results, protein-protein interactions (PPIs), and pathway information. Using published GWAS and 9,713 European Americans (EA) from the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and LIPC affecting high-density lipoprotein cholesterol (HDL-C) levels. We then validated this interaction in additional multiethnic cohorts from ARIC, the Framingham Heart Study, and the Multi-Ethnic Study of Atherosclerosis. Both HMGCR and LIPC are involved in the metabolism of lipids and lipoproteins, and LIPC itself has been marginally associated with HDL-C. Furthermore, no significant interaction was detected using PPI and pathway information, mainly due to the stringent significance level required after correcting for the large number of tests conducted. These results suggest the potential of biological knowledge-driven approaches to detect epistatic interactions in human GWAS, which may hold the key to exploring the role gene-gene interactions play in connecting genotypes and complex phenotypes in future GWAS.
Affiliation(s)
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, Bldg 142, College Park, MD 20742, USA
6
Wang X, Zhang D, Tzeng JY. Pathway-guided identification of gene-gene interactions. Ann Hum Genet 2014; 78:478-91. [PMID: 25227508 PMCID: PMC4363308 DOI: 10.1111/ahg.12080] [Received: 02/18/2014] [Accepted: 07/03/2014]
Abstract
Assessing gene-gene interactions (GxG) at the gene level permits examination of epistasis at biologically functional units, with interaction signals amplified from marker-marker pairs. While current gene-based GxG methods tend to be designed for two or a few genes, complex traits commonly come with a list of many candidate genes to explore for GxG. We propose a regression model with pathway-guided regularization for detecting interactions among genes. Specifically, we use principal components to summarize the SNP-SNP interactions between a gene pair, and use an L1 penalty that incorporates adaptive weights based on biological guidance and trait supervision to identify important main and interaction effects. Our approach aims to combine biological guidance and data adaptiveness, and yields credible findings that may help formulate biological hypotheses for further molecular studies. The proposed approach can be used to explore GxG among many candidate genes and is applicable even when the sample size is smaller than the number of predictors studied. We evaluate the utility of the proposed method using simulation and real data analysis. The results suggest improved performance over methods that do not use pathway and trait guidance.
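An adaptively weighted L1 penalty can be sketched with coordinate descent for the objective (1/(2n))||y - X beta||^2 + lam * sum_j w_j |beta_j|. A larger weight w_j shrinks coefficient j harder. In this sketch the weights are supplied by hand, and the columns of X and the response are assumed centered (no intercept). How the paper derives the weights from pathway information and trait supervision, and how it builds principal-component summaries of SNP-SNP interactions, are not reproduced. The function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=1000, tol=1e-10):
    """Coordinate descent for a lasso with per-coefficient penalty weights.

    X: list of sample rows (centered columns), y: centered response,
    lam: overall penalty, weights: one non-negative weight per column.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta starting at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # correlation of column j with the partial residual that excludes beta_j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_ss[j]
            if new != beta[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta
```

With an orthogonal design, a heavily weighted term is zeroed out while a lightly weighted one is only shrunk. Setting all weights to zero recovers ordinary least squares.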
Affiliation(s)
- Xin Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
7
Identifying interacting genetic variations by fish-swarm logic regression. Biomed Res Int 2014; 2013:574735. [PMID: 23984382 PMCID: PMC3747618 DOI: 10.1155/2013/574735] [Received: 04/02/2013] [Revised: 06/08/2013] [Accepted: 07/02/2013]
Abstract
Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is identifying sets of interacting genetic variants that might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool, and several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate in terms of accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing the search from becoming trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared with three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster.
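Logic regression builds predictors from Boolean combinations of binary variables, such as indicators that a sample carries the minor allele at a SNP. The sketch below shows only how such a logic tree might be represented and scored by misclassification against a binary trait, assuming 0/1 SNP coding. The fish-swarm search that FSLR uses to explore trees, and the regression model wrapped around the Boolean term, are not reproduced. The tuple encoding and function names are hypothetical.

```python
def evaluate(tree, x):
    """Evaluate a logic tree on one sample.

    tree: ("var", j) | ("not", t) | ("and", t1, t2) | ("or", t1, t2)
    x: list of 0/1 SNP indicators for the sample.
    """
    op = tree[0]
    if op == "var":
        return bool(x[tree[1]])
    if op == "not":
        return not evaluate(tree[1], x)
    if op == "and":
        return evaluate(tree[1], x) and evaluate(tree[2], x)
    if op == "or":
        return evaluate(tree[1], x) or evaluate(tree[2], x)
    raise ValueError(f"unknown operator {op!r}")


def misclassification(tree, X, y):
    """Fraction of samples where the Boolean term disagrees with the 0/1 trait."""
    return sum(evaluate(tree, x) != bool(t) for x, t in zip(X, y)) / len(y)
```

For example, `("or", ("and", ("var", 0), ("not", ("var", 2))), ("var", 4))` encodes (X0 AND NOT X2) OR X4. A search procedure, swarm-based or otherwise, would propose modified trees and keep those that lower the score.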
8
Winham SJ, Biernacka JM. Gene-environment interactions in genome-wide association studies: current approaches and new directions. J Child Psychol Psychiatry 2013; 54:1120-34. [PMID: 23808649 PMCID: PMC3829379 DOI: 10.1111/jcpp.12114] [Accepted: 05/03/2013]
Abstract
BACKGROUND Complex psychiatric traits have long been thought to result from a combination of genetic and environmental factors, and gene-environment interactions are thought to play a crucial role in behavioral phenotypes and in the susceptibility to and progression of psychiatric disorders. Candidate gene studies of hypothesized gene-environment interactions are now fairly common in human genetic research, and with the shift toward genome-wide association studies, genome-wide gene-environment interaction studies are beginning to emerge. METHODS We summarize the basic ideas behind gene-environment interaction and provide an overview of possible study designs and traditional analysis methods in the context of genome-wide analysis. We then discuss novel approaches beyond the traditional strategy of analyzing the interaction between the environmental factor and each polymorphism individually. RESULTS Two-step filtering approaches that reduce the number of polymorphisms tested for interactions can substantially increase the power of genome-wide gene-environment studies. New analytical methods, including data-mining approaches and gene-level and pathway-level analyses, also have the capacity to improve our understanding of how complex genetic and environmental factors interact to influence psychological and psychiatric traits. Such methods, however, have not yet been widely used in behavioral and mental health research. CONCLUSIONS Although methods to investigate gene-environment interactions are available, they need further development and extension to identify gene-environment interactions in the context of genome-wide association studies. These novel approaches need to be applied in studies of psychology and psychiatry.
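Two-step filtering can be sketched as follows, assuming each SNP already has a screening p-value (step 1) and a G x E interaction p-value (step 2). Only SNPs that pass the screen are tested in step 2, so the Bonferroni correction runs over the kept set rather than over every SNP. For this to control type I error, the screening statistic must be independent of the interaction test under the null. This is why published two-step designs screen on statistics such as marginal association or G-E correlation in the pooled sample. That choice, the threshold, and the function name are assumptions here.

```python
def two_step_gxe(snps, alpha=0.05, screen_threshold=1e-3):
    """Two-step G x E filtering with Bonferroni correction over the screened set.

    snps: dict mapping SNP name -> (screening_p, interaction_p).
    Returns (sorted list of significant SNPs, per-test cutoff or None).
    """
    # Step 1: keep SNPs whose screening p-value passes the threshold.
    kept = [name for name, (p_screen, _) in snps.items() if p_screen <= screen_threshold]
    if not kept:
        return [], None
    # Step 2: test interactions only for kept SNPs, correcting for len(kept) tests.
    cutoff = alpha / len(kept)
    hits = sorted(name for name in kept if snps[name][1] <= cutoff)
    return hits, cutoff
```

A SNP with a very small interaction p-value can still be missed if it fails the screen. That is the price paid for a much lighter multiple-testing burden on the SNPs that pass.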
Affiliation(s)
- Stacey J Winham
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905
- Joanna M. Biernacka
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN 55905
9
Li F, Zhao J, Yuan Z, Zhang X, Ji J, Xue F. A powerful latent variable method for detecting and characterizing gene-based gene-gene interaction on multiple quantitative traits. BMC Genet 2013; 14:89. [PMID: 24059907 PMCID: PMC3848962 DOI: 10.1186/1471-2156-14-89] [Received: 05/22/2013] [Accepted: 09/17/2013]
Abstract
Background When complex diseases are considered quantitatively, there are at least three statistical strategies for analyzing gene-gene interaction: SNP-by-SNP interaction on a single trait, gene-gene interaction (each gene possibly involving multiple SNPs) on a single trait, and gene-gene interaction on multiple traits. The third is the most general for dissecting the genetic mechanisms of complex diseases that underpin multiple quantitative traits. In this paper, we developed a novel statistic for this strategy, called the mPLSPM statistic, by modifying Partial Least Squares Path Modeling (PLSPM). Results Simulation studies indicated that the mPLSPM statistic was powerful and outperformed the principal component analysis (PCA)-based linear regression method. Application to real data from the EPIC-Norfolk GWAS sub-cohort showed suggestive interaction (γ) between the TMEM18 and BDNF genes on two composite body shape scores (γ = 0.047, P = 0.021 and γ = 0.058, P = 0.005) and on BMI (γ = 0.043, P = 0.034). This suggests that these scores (synthetic latent traits) are more suitable than a single trait for capturing the obesity-related genetic interaction effect between genes. Conclusions The proposed mPLSPM statistic is a valid and powerful gene-based method for detecting gene-gene interaction on multiple quantitative phenotypes.
Affiliation(s)
- Fangyu Li
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan 250012, China.
10
Larson NB, Schaid DJ. A kernel regression approach to gene-gene interaction detection for case-control studies. Genet Epidemiol 2013; 37:695-703. [PMID: 23868214 DOI: 10.1002/gepi.21749] [Received: 01/18/2013] [Revised: 05/07/2013] [Accepted: 06/12/2013]
Abstract
Gene-gene interactions are increasingly being addressed as a potentially important contributor to the variability of complex traits. Consequently, attention has moved beyond single-locus analysis of association to more complex genetic models. Although several single-marker approaches to interaction analysis have been developed, such methods suffer from very high testing dimensionality and do not take advantage of existing information, notably the definition of genes as functional units. Here, we propose a comprehensive family of gene-level score tests for identifying genetic elements of disease risk, in particular pairwise gene-gene interactions. Using kernel machine methods, we devise score-based variance component tests under a generalized linear mixed model framework. We conducted simulations based on coalescent genetic models to evaluate the performance of our approach under a variety of disease models. These simulations indicate that our methods generally have higher power than alternative gene-level approaches and are at worst competitive with exhaustive SNP-level (where SNP is single-nucleotide polymorphism) analyses. Furthermore, we observe that simulated epistatic effects resulted in significant marginal testing results for the involved genes regardless of whether true main effects were present. We detail the benefits of our methods and discuss potential genome-wide analysis strategies for gene-gene interaction analysis in a case-control study design.
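A score-based variance component test of this kind can be illustrated with the statistic Q = (y - mu)^T K (y - mu). Here mu is the fitted mean under the null model and K is a kernel matrix over samples. The sketch assumes a linear kernel for each gene and forms the gene-gene interaction kernel as their element-wise (Hadamard) product. This is a common construction, and the Schur product theorem guarantees it is positive semidefinite. It is not necessarily the kernel family used by Larson and Schaid. A proper interaction test would also put both genes' main effects into the null model. The p-value, a mixture of chi-squares often computed with Davies' method, is not computed here. The function names are hypothetical.

```python
def linear_kernel(genotypes):
    """genotypes: samples x SNPs matrix (list of rows); returns K = G G^T."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in genotypes] for gi in genotypes]


def hadamard(k1, k2):
    """Element-wise product of two equal-size kernel matrices."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def score_statistic(y, kernel, mu):
    """Q = (y - mu)^T K (y - mu) for a fitted null mean vector mu."""
    r = [yi - mi for yi, mi in zip(y, mu)]
    n = len(r)
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))
```

For a case-control trait with an intercept-only null model, mu is simply the case proportion repeated for every sample, as in `score_statistic(y, hadamard(linear_kernel(G1), linear_kernel(G2)), [sum(y) / len(y)] * len(y))`.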
Affiliation(s)
- Nicholas B Larson
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
11
Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet 2013; 9:e1003321. [PMID: 23468652 PMCID: PMC3585009 DOI: 10.1371/journal.pgen.1003321] [Received: 08/03/2012] [Accepted: 12/31/2012]
Abstract
Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods use individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel gene-based interaction tests for quantitative traits that offer advantages in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests rely on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations show correct type I error rates for all tests, and show that the two truncated tests are more powerful than the others when the markers involved in the underlying interaction are not directly genotyped and when there are multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels, using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
Epistasis is likely to play a significant role in complex diseases or traits and is one of the many possible explanations for “missing heritability.” However, epistatic interactions have been difficult to detect in genome-wide association studies (GWAS) due to the limited power caused by the multiple-testing correction from the large number of tests conducted. Gene-based gene–gene interaction (GGG) tests might hold the key to relaxing the multiple-testing correction burden and increasing the power for identifying epistatic interactions in GWAS. Here, we developed GGG tests of quantitative traits by extending four P value combining methods and evaluated their type I error rates and power using extensive simulations. All four GGG tests are more powerful than a principal component-based test. We also applied our GGG tests to data from the Atherosclerosis Risk in Communities study and found five gene-level interactions associated with the levels of total cholesterol and high-density lipoprotein cholesterol (HDL-C). One interaction between SMAD3 and NEDD9 on HDL-C was further replicated in an independent sample from the Multi-Ethnic Study of Atherosclerosis.
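Three of the P value combining rules named above can be sketched in their plain forms, which assume independent tests. These are the Bonferroni-adjusted minimum P value, Simes' procedure, and the truncated product statistic (the product of the P values at or below a threshold tau). The GGG tests instead account for the correlation that linkage disequilibrium induces between marker-pair interaction tests, using the authors' analytical formula. That adjustment, the extended form of Simes, the null distribution of the truncated product, and truncated tail strength are not reproduced here. tau = 0.05 is an assumed default, and the function names are hypothetical.

```python
import math


def min_p_bonferroni(pvals):
    """Bonferroni-adjusted minimum P value (independent tests assumed)."""
    return min(1.0, len(pvals) * min(pvals))


def simes(pvals):
    """Simes combined P value: min over k of m * p_(k) / k, capped at 1."""
    m = len(pvals)
    ranked = sorted(pvals)
    return min(1.0, min(m * p / (k + 1) for k, p in enumerate(ranked)))


def truncated_product(pvals, tau=0.05):
    """Truncated product statistic W: product of P values <= tau (1.0 if none pass).

    Small W is evidence against the global null; turning W into a P value
    requires its null distribution, which is not computed here.
    """
    kept = [p for p in pvals if p <= tau]
    return math.prod(kept) if kept else 1.0
```

With three equal P values of 0.02, the Bonferroni minimum gives 0.06 while Simes gives 0.02. Simes rewards several moderately small P values, which is the situation the abstract describes for interactions spread across multiple marker pairs.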
12
Holzinger ER, Dudek SM, Frase AT, Krauss RM, Medina MW, Ritchie MD. ATHENA: a tool for meta-dimensional analysis applied to genotypes and gene expression data to predict HDL cholesterol levels. Pac Symp Biocomput 2013:385-396. [PMID: 23424143 PMCID: PMC3587764]
Abstract
Technology is driving the field of human genetics research with advances in techniques that generate high-throughput data interrogating various levels of biological regulation. This massive amount of data brings the important task of using powerful bioinformatics techniques to sift through the noise and find true signals that predict human traits. A popular analytical method thus far has been the genome-wide association study (GWAS), which assesses the association of single nucleotide polymorphisms (SNPs) with the trait of interest. Unfortunately, GWAS has not been able to explain a substantial proportion of the estimated heritability for most complex traits. Given the inherently complex nature of biology, this shortfall could be a consequence of the simplistic study design. A more powerful analysis may be a systems biology approach that integrates different types of data, or a meta-dimensional analysis. For this study we used the Analysis Tool for Heritable and Environmental Network Associations (ATHENA) to integrate high-throughput SNPs and gene expression variables (EVs) to predict high-density lipoprotein cholesterol (HDL-C) levels. We generated multivariable models that consisted of SNPs only, EVs only, and SNPs + EVs, with testing r-squared values of 0.16, 0.11, and 0.18, respectively. Additionally, using just the SNPs and EVs from the best models, we generated a model with a testing r-squared of 0.32. A linear regression model with the same variables resulted in an adjusted r-squared of 0.23. With this systems biology approach, we were able to integrate different types of high-throughput data to generate meta-dimensional models that are predictive of HDL-C in our data set. Additionally, our modeling method captured more of the HDL-C variation than a linear regression model that included the same variables.
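The abstract compares testing r-squared values from ATHENA models with an adjusted r-squared from linear regression. For reference, the standard formulas are sketched below. That ATHENA's testing r-squared is computed as 1 - SS_res/SS_tot on held-out samples is an assumption, as are the function names.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust an in-sample r-squared for p predictors (excluding the intercept) and n samples."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

The adjustment penalizes each added predictor. With r2 = 0.5, n = 11 and p = 2, the adjusted value drops to 0.375.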
Affiliation(s)
- Scott M. Dudek
- Center for Systems Genomics, Pennsylvania State University, University Park, PA 16803, USA
- Alex T. Frase
- Center for Systems Genomics, Pennsylvania State University, University Park, PA 16803, USA
- Ronald M. Krauss
- Children’s Hospital Oakland Research Institute, Oakland, CA 94609, USA
- Marisa W. Medina
- Children’s Hospital Oakland Research Institute, Oakland, CA 94609, USA
- Marylyn D. Ritchie
- Center for Systems Genomics, Pennsylvania State University, University Park, PA 16803, USA
13
Li S, Cui Y. Gene-centric gene–gene interaction: A model-based kernel machine method. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas545]
14
Knowledge-driven analysis identifies a gene-gene interaction affecting high-density lipoprotein cholesterol levels in multi-ethnic populations. PLoS Genet 2012; 8:e1002714. [PMID: 22654671 PMCID: PMC3359971 DOI: 10.1371/journal.pgen.1002714] [Received: 01/11/2012] [Accepted: 03/30/2012]
Abstract
Total cholesterol, low-density lipoprotein cholesterol, triglyceride, and high-density lipoprotein cholesterol (HDL-C) levels are among the most important risk factors for coronary artery disease. We tested for gene–gene interactions affecting the levels of these four lipids based on prior knowledge of established genome-wide association study (GWAS) hits, protein–protein interactions, and pathway information. Using genotype data from 9,713 European Americans in the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and a locus near LIPC in their effect on HDL-C levels (Bonferroni-corrected Pc = 0.002). Using an adaptive locus-based validation procedure, we successfully validated this gene–gene interaction in the European American cohorts from the Framingham Heart Study (Pc = 0.002) and the Multi-Ethnic Study of Atherosclerosis (MESA; Pc = 0.006). The interaction between these two loci is also significant in the African American sample from ARIC (Pc = 0.004) and in the Hispanic American sample from MESA (Pc = 0.04). Both HMGCR and LIPC are involved in lipid metabolism, and GWAS have previously identified LIPC as associated with HDL-C levels. However, the effect of the novel gene–gene interaction reported here on HDL-C is twice as pronounced as that predicted by the sum of the marginal effects of the two loci. In conclusion, using a knowledge-driven analysis of epistasis together with a new locus-based validation method, we identified and validated an interaction affecting a complex trait in multi-ethnic populations.

Genome-wide association studies (GWAS) have identified many loci associated with complex human traits or diseases. However, the fraction of heritable variation explained by these loci is often relatively low. Gene–gene interactions might play a significant role in complex traits or diseases and are one of many possible factors contributing to the missing heritability. However, to date only a few interactions have been found and validated in GWAS, because correcting for the very large number of tests conducted limits statistical power. Here, we used three types of prior knowledge (known GWAS hits, protein–protein interactions, and pathway information) to guide our search for gene–gene interactions affecting four lipid levels. We identified an interaction between HMGCR and a locus near LIPC in their effect on high-density lipoprotein cholesterol (HDL-C), and another pair of loci that interact in their effect on low-density lipoprotein cholesterol (LDL-C). We validated the HDL-C interaction in several independent multi-ethnic populations, whereas the LDL-C interaction did not validate. The prior knowledge-driven search approach and the locus-based validation procedure show potential for dissecting and validating gene–gene interactions in current and future GWAS.
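The multiple-testing burden behind this approach can be made concrete with a minimal Python sketch of the Bonferroni correction, Pc = min(1, p × m) for m tests. The SNP count, number of candidate pairs, and raw p-value below are hypothetical illustration values, not figures from the study; the sketch only shows that restricting the search to knowledge-driven candidate pairs shrinks m, so the same raw p-value can survive correction.

```python
from math import comb


def n_pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP-SNP pairs tested in an exhaustive pairwise scan."""
    return comb(n_snps, 2)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value Pc = min(1, p * n_tests)."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    raw_p = 1e-6           # hypothetical raw p-value for one interaction test
    n_snps = 500_000       # hypothetical genome-wide SNP panel
    n_candidates = 2_000   # hypothetical number of knowledge-driven candidate pairs

    exhaustive = n_pairwise_tests(n_snps)   # 124,999,750,000 pairs
    print(bonferroni(raw_p, exhaustive))    # 1.0: not significant after correction
    print(bonferroni(raw_p, n_candidates))  # about 0.002: significant at alpha = 0.05
```

The correction is deliberately the simplest one; less conservative procedures exist, but Bonferroni is the one the abstract reports.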
15
Chikkagoudar S, Wang K, Li M. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores. BMC Res Notes 2011; 4:158. [PMID: 21615923 PMCID: PMC3115877 DOI: 10.1186/1756-0500-4-158] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/04/2011] [Accepted: 05/26/2011] [Indexed: 11/23/2022]
Abstract
Background: Gene–gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most modern Central Processing Units (CPUs) have multiple cores, and Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to build faster scientific software. However, no current genetic analysis software package allows users to fully exploit the computing power of these multi-core devices for interaction analysis of binary traits.

Findings: Here we present GENIE, a novel software package that parallelizes interaction analysis across multiple GPU or CPU processor cores. GENIE reads an entire genetic association study dataset into memory and partitions it into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes, in parallel, 1) the interactions among the SNPs within that fragment and 2) the interactions between the SNPs of that fragment and those of other fragments. We tested GENIE on a large-scale candidate gene study of high-density lipoprotein cholesterol. On an NVIDIA Tesla C1060 graphics card, GENIE's GPU mode achieved a 27-fold speedup over its single-core CPU mode.

Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Because the computing power and memory capacity of graphics cards are increasing rapidly while their cost is falling, we anticipate that GENIE will achieve greater speedups on faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
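The fragment-based work decomposition described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not GENIE's implementation: fragments are contiguous blocks of a fixed size, a thread pool stands in for GPU or CPU cores, each fragment is paired only with later fragments so that no SNP pair is tested twice, and `score` is a hypothetical callback standing in for the actual per-pair interaction test.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product
from typing import Callable


def partition(n_snps: int, fragment_size: int) -> list[list[int]]:
    """Split SNP indices 0..n_snps-1 into contiguous, non-overlapping fragments."""
    return [list(range(s, min(s + fragment_size, n_snps)))
            for s in range(0, n_snps, fragment_size)]


def pairs_for_fragment(fragments: list[list[int]], i: int) -> list[tuple[int, int]]:
    """Pairs assigned to fragment i: all pairs within it, plus pairs between it
    and every LATER fragment (assumption: avoids visiting any pair twice)."""
    own = fragments[i]
    within = list(combinations(own, 2))
    between = [pair for later in fragments[i + 1:] for pair in product(own, later)]
    return within + between


def interaction_scan(n_snps: int, fragment_size: int,
                     score: Callable[[int, int], float],
                     workers: int = 4) -> list[tuple[int, int, float]]:
    """Score every SNP pair, processing fragments concurrently.
    `score` is a hypothetical placeholder for a real interaction test."""
    fragments = partition(n_snps, fragment_size)

    def run(i: int) -> list[tuple[int, int, float]]:
        # Score every pair assigned to fragment i.
        return [(a, b, score(a, b)) for a, b in pairs_for_fragment(fragments, i)]

    # Fragments are independent units of work, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(run, range(len(fragments))))
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    # Toy run: 10 SNPs in fragments of 3 -> [0-2], [3-5], [6-8], [9].
    rows = interaction_scan(10, 3, score=lambda a, b: float(a * b))
    print(len(rows))  # 45, i.e. 10 * 9 / 2: every pair scored exactly once
```

Pairing each fragment only with later ones keeps the work units independent while covering all n(n - 1)/2 pairs. Python threads will not speed up pure-Python scoring because of the GIL; the pool here only illustrates how the work splits across cores.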
Affiliation(s)
- Satish Chikkagoudar
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA, USA.