101
Basu S, Pan W, Shen X, Oetting WS. Multilocus association testing with penalized regression. Genet Epidemiol 2011; 35:755-65. [PMID: 21922539 DOI: 10.1002/gepi.20625] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/20/2011] [Revised: 06/09/2011] [Accepted: 07/04/2011] [Indexed: 12/26/2022]
Abstract
In multilocus association analysis, since some markers may not be associated with a trait, it seems attractive to use penalized regression with the capability of automatic variable selection. However, although the literature on penalized regression is growing rapidly, most of it focuses on variable selection and outcome prediction, for which penalized methods are generally more effective than their nonpenalized counterparts. For statistical inference, i.e. hypothesis testing and interval estimation, it is less clear how penalized methods perform, or even how best to apply them, largely because few studies have addressed this topic. In our motivating data for a cohort of kidney transplant recipients, it is of primary interest to assess whether a group of genetic variants is associated with a binary clinical outcome, acute rejection at 6 months. In this article, we study some technical issues and alternative implementations of hypothesis testing in Lasso penalized logistic regression, and compare their performance with each other and with several existing global tests, some of which are specifically designed as variance component tests for high-dimensional data. The most interesting, and perhaps surprising, conclusion of this study is that, for low to moderately high-dimensional data, statistical tests based on Lasso penalized regression are not necessarily more powerful than some existing global tests. In addition, in penalized regression, rather than building a test based on a single selected "best" model, combining multiple tests, each built on a candidate model, may be more promising.
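The abstract does not specify a fitting algorithm for Lasso penalized logistic regression. As a point of reference, here is a minimal sketch of one way to fit it: proximal gradient descent (ISTA) in pure Python. It is not the authors' implementation. The function names, the fixed step size, the iteration count and the toy data are all illustrative assumptions, and the hypothesis tests built on top of the fitted model are not shown.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimize (1/n) * negative log-likelihood + lam * sum_j |beta_j| (ISTA).

    X: list of rows of genotype codes (0/1/2); y: list of 0/1 outcomes.
    The intercept b0 is not penalized. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the average negative log-likelihood.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        # Plain gradient step for the intercept, soft-threshold step for the SNPs.
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * g[j], step * lam) for j in range(p)]
    return b0, beta


# Hypothetical toy data: SNP 1 tracks the outcome, SNP 2 does not.
X = [[0, 0], [0, 1], [0, 0], [0, 1], [2, 0], [2, 1], [2, 0], [2, 1]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
print(lasso_logistic(X, y, lam=0.01))  # small penalty: SNP 1 kept, SNP 2 near zero
print(lasso_logistic(X, y, lam=5.0))   # large penalty: both SNP coefficients exactly 0
```

With a large enough penalty, every SNP coefficient is shrunk exactly to zero. This is the automatic variable selection the abstract refers to. The fixed step of 0.1 assumes a handful of SNPs coded 0/1/2, so that it stays below the inverse Lipschitz constant of the gradient.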
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
102
Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 2011; 98:1-8. [PMID: 21565265 PMCID: PMC3852939 DOI: 10.1016/j.ygeno.2011.04.006] [Citation(s) in RCA: 164] [Impact Index Per Article: 12.6] [Received: 12/06/2010] [Revised: 03/02/2011] [Accepted: 04/15/2011] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. Gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still at an early stage of development: their results are often prone to bias from sources including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the methodological challenges particular to each stage. In addition to surveying recently developed tools, we classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
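Fisher's method is one simple way to combine gene-level association signals within a gene set. It is offered here only as an illustrative sketch, not as any specific method reviewed in the article. Its chi-square null distribution assumes independent p-values, and linkage disequilibrium and overlapping genes violate that assumption. This is the kind of bias described above, so permutation-based calibration is often used instead of the analytic p-value. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Combine k p-values with Fisher's method (assumes independence).

    Under H0, X = -2 * sum(ln p_i) follows a chi-square distribution with 2k df.
    For even df the survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical gene-level p-values for a five-gene set.
print(fisher_combined_p([0.04, 0.20, 0.01, 0.50, 0.08]))
```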
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
103
Eleftherohorinou H, Hoggart CJ, Wright VJ, Levin M, Coin LJ. Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways. Hum Mol Genet 2011; 20:3494-506. [DOI: 10.1093/hmg/ddr248] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 01/03/2023]
104
Slattery ML, Lundgreen A, Herrick JS, Kadlubar S, Caan BJ, Potter JD, Wolff RK. Genetic variation in bone morphogenetic protein and colon and rectal cancer. Int J Cancer 2011; 130:653-64. [PMID: 21387313 DOI: 10.1002/ijc.26047] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 09/07/2010] [Accepted: 03/14/2011] [Indexed: 12/28/2022]
Abstract
Bone morphogenetic proteins (BMP) are part of the TGF-β-signaling pathway; genetic variation in these genes may be involved in colorectal cancer. In this study, we evaluated the association between genetic variation in BMP1 (11 tagSNPs), BMP2 (5 tagSNPs), BMP4 (3 tagSNPs), BMPR1A (9 tagSNPs), BMPR1B (21 tagSNPs), BMPR2 (11 tagSNPs) and GDF10 (7 tagSNPs) with risk of colon and rectal cancer and tumor molecular phenotype. We used data from population-based case-control studies (colon cancer n = 1,574 cases, 1,970 controls; rectal cancer n = 791 cases, 999 controls). We observed that genetic variation in BMPR1A, BMPR1B, BMPR2, BMP2 and BMP4 was associated with risk of developing colon cancer, with 20 to 30% increased risk for most high-risk genotypes. A summary of high-risk genotypes showed over a twofold increase in colon cancer risk at the upper risk category (OR = 2.49, 95% CI = 1.95-3.18). BMPR2, BMPR1B, BMP2 and GDF10 were associated with rectal cancer. BMPR2 rs2228545 was associated with an almost twofold increased risk of rectal cancer. The OR associated with the highest category of the summary score for rectal cancer was 2.97 (95% CI = 1.87-4.72). Genes in the BMP-signaling pathway were consistently associated with CIMP+ status in combination with both KRAS-mutated and MSI tumors. BMP genes showed statistically significant interactions with other genes in the TGF-β-signaling pathway, including TGFβ1, TGFβR1, Smad 3, Smad 4 and Smad 7. Our data support a role for genetic variation in BMP-related genes in the etiology of colon and rectal cancer. One possible mechanism is via the TGF-β-signaling pathway.
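The summary score and odds ratios above can be illustrated with a minimal sketch. It assumes a hypothetical per-SNP set of high-risk genotypes and counts how many a subject carries. It then computes an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table. The published estimates may involve covariate adjustment, which this sketch does not include. The SNP identifiers, function names and counts are all hypothetical.

```python
import math


def summary_score(genotypes, high_risk):
    """Number of SNPs at which a subject carries a designated high-risk genotype.

    genotypes: {snp_id: genotype}; high_risk: {snp_id: set of high-risk genotypes}.
    """
    return sum(1 for snp, g in genotypes.items() if g in high_risk.get(snp, set()))


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Unadjusted OR with a Woolf 95% CI from a 2x2 table (all cells nonzero).

    a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


# Hypothetical high-risk definitions and one subject's genotypes.
high_risk = {"snp_A": {"TT"}, "snp_B": {"AG", "GG"}, "snp_C": {"CC"}}
subject = {"snp_A": "TT", "snp_B": "AG", "snp_C": "CT"}
print(summary_score(subject, high_risk))  # 2
print(odds_ratio_woolf(120, 80, 300, 500))
```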
Affiliation(s)
- Martha L Slattery
- Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, UT, USA.
105
Fridley BL, Biernacka JM. Gene set analysis of SNP data: benefits, challenges, and future directions. Eur J Hum Genet 2011; 19:837-43. [PMID: 21487444 DOI: 10.1038/ejhg.2011.57] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Indexed: 01/30/2023]
Abstract
The last decade of human genetic research witnessed the completion of hundreds of genome-wide association studies (GWASs). However, the genetic variants discovered through these efforts account for only a small proportion of the heritability of complex traits. One explanation for the missing heritability is that the common analysis approach, assessing the effect of each single-nucleotide polymorphism (SNP) individually, is not well suited to the detection of small effects of multiple SNPs. Gene set analysis (GSA) is one of several approaches that may contribute to the discovery of additional genetic risk factors for complex traits. Complex phenotypes are thought to be controlled by networks of interacting biochemical and physiological pathways influenced by the products of sets of genes. By assessing the overall evidence of association of a phenotype with all measured variation in a set of genes, GSA may identify functionally relevant sets of genes corresponding to relevant biomolecular pathways, which will enable more focused studies of genetic risk factors. This approach may thus contribute to the discovery of genetic variants responsible for some of the missing heritability. With the increased use of these approaches for the secondary analysis of data from GWAS, it is important to understand the different GSA methods and their strengths and weaknesses, and consider challenges inherent in these types of analyses. This paper provides an overview of GSA, highlighting the key challenges, potential solutions, and directions for ongoing research.
Affiliation(s)
- Brooke L Fridley
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
106
Slattery ML, Lundgreen A, Herrick JS, Wolff RK, Caan BJ. Genetic variation in the transforming growth factor-β signaling pathway and survival after diagnosis with colon and rectal cancer. Cancer 2011; 117:4175-83. [PMID: 21365634 DOI: 10.1002/cncr.26018] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 11/17/2010] [Revised: 01/12/2011] [Accepted: 02/01/2011] [Indexed: 01/10/2023]
Abstract
BACKGROUND The transforming growth factor-β (TGF-β) signaling pathway is involved in many aspects of tumorigenesis, including angiogenesis and metastasis. The authors evaluated this pathway in association with survival after a diagnosis of colon or rectal cancer.
METHODS The study included 1553 patients with colon cancer and 754 patients with rectal cancer who had incident first primary disease and were followed for a minimum of 7 years after diagnosis. Genetic variations were evaluated in the genes TGF-β1 (2 single nucleotide polymorphisms [SNPs]), TGF-β receptor 1 (TGF-βR1) (3 SNPs), mothers against decapentaplegic homolog 1 (Smad1) (5 SNPs), Smad2 (4 SNPs), Smad3 (37 SNPs), Smad4 (2 SNPs), Smad7 (11 SNPs), bone morphogenetic protein 1 (BMP1) (11 SNPs), BMP2 (5 SNPs), BMP4 (3 SNPs), bone morphogenetic protein receptor 1A (BMPR1A) (9 SNPs), BMPR1B (21 SNPs), BMPR2 (11 SNPs), growth differentiation factor 10 (GDF10) (7 SNPs), Runt-related transcription factor 1 (RUNX1) (40 SNPs), RUNX2 (19 SNPs), RUNX3 (9 SNPs), eukaryotic translation initiation factor 4E (eIF4E) (3 SNPs), eukaryotic translation initiation factor 4E-binding protein 2 (eIF4EBP2) (2 SNPs), eIF4EBP3 (2 SNPs), and mitogen-activated protein kinase 1 (MAPK1) (6 SNPs).
RESULTS After adjusting for American Joint Committee on Cancer stage and tumor molecular phenotype, 12 genes and 18 SNPs were associated with survival in patients with colon cancer, and 7 genes and 15 tagSNPs were associated with survival after a diagnosis of rectal cancer. For colon cancer, a summary score based on "at-risk" genotypes yielded a hazard rate ratio of 5.10 (95% confidence interval, 2.56-10.15) for the group with the greatest number of "at-risk" genotypes; for rectal cancer, the corresponding hazard rate ratio was 6.03 (95% confidence interval, 2.83-12.75).
CONCLUSIONS The current findings suggest that the presence of several higher-risk alleles in the TGF-β signaling pathway increases the likelihood of dying after a diagnosis of colon or rectal cancer.
Affiliation(s)
- Martha L Slattery
- Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah 84108, USA.
107
Wang L, Jia P, Wolfinger RD, Chen X, Grayson BL, Aune TM, Zhao Z. An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies. Bioinformatics 2011; 27:686-92. [PMID: 21266443 DOI: 10.1093/bioinformatics/btq728] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022]
Abstract
MOTIVATION In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it thus belongs to the class of mixed effects models.
RESULTS The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Using simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS.
AVAILABILITY The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.
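The decomposition described above has two parts: a fixed mean association for the gene group, plus random gene-level deviations around that mean. This resembles a random-effects meta-analysis of gene-level summary statistics. The sketch below uses the DerSimonian-Laird moment estimator as a stand-in to illustrate the idea. It is not the authors' hierarchical generalized linear mixed model. It also assumes independent genes, so it does not correct for overlapping genes or LD. The function name and inputs are hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """DerSimonian-Laird random-effects estimate of a gene-group mean effect.

    effects: gene-level association estimates (e.g. log odds ratios);
    variances: their squared standard errors. Assumes independent genes.
    Returns (mu, tau2, se_mu): group mean, between-gene variance, SE of mu.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted spread of gene effects around the fixed-effect mean.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    # Re-weight with the between-gene variance added to each gene's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mu, tau2, math.sqrt(1.0 / sum(w_re))


# Hypothetical gene-level estimates and variances for a four-gene pathway.
print(random_effects_mean([0.10, 0.25, -0.05, 0.30], [0.01, 0.02, 0.015, 0.03]))
```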
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
108
109
Wang K, Li M, Hakonarson H. Analysing biological pathways in genome-wide association studies. Nat Rev Genet 2010; 11:843-54. [PMID: 21085203 DOI: 10.1038/nrg2884] [Citation(s) in RCA: 581] [Impact Index Per Article: 41.5] [Indexed: 12/14/2022]
Abstract
Genome-wide association (GWA) studies have typically focused on the analysis of single markers, which often lacks the power to uncover the relatively small effect sizes conferred by most genetic variants. Recently, pathway-based approaches have been developed, which use prior biological knowledge on gene function to facilitate more powerful analysis of GWA study data sets. These approaches typically examine whether a group of related genes in the same functional pathway are jointly associated with a trait of interest. Here we review the development of pathway-based approaches for GWA studies, discuss their practical use and caveats, and suggest that pathway-based approaches may also be useful for future GWA studies with sequencing data.
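A common, simple form of pathway-based analysis is the over-representation test. Given a list of genes flagged as associated, it asks whether a pathway contains more of them than expected by chance. The sketch below is an illustrative assumption rather than a method taken from the review. It computes the one-sided hypergeometric p-value using the standard library's `math.comb` (Python 3.8+). It ignores gene length and LD, both of which can bias such tests. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_pathway, n_flagged, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X ~ Hypergeometric: n_flagged genes drawn without replacement from n_genes
    genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_flagged)
    hits = sum(
        comb(n_pathway, i) * comb(n_genes - n_pathway, n_flagged - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / comb(n_genes, n_flagged)  # exact integer ratio, rounded to float


# Hypothetical example: 20,000 genes, a 150-gene pathway,
# 400 flagged genes, 12 of which fall in the pathway.
print(hypergeom_enrichment_p(20000, 150, 400, 12))
```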
Affiliation(s)
- Kai Wang
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Pennsylvania 19104, USA
110
Wilson MA, Baurley JW, Thomas DC, Conti DV. Complex system approaches to genetic analysis: Bayesian approaches. Adv Genet 2010; 72:47-71. [PMID: 21029848 PMCID: PMC4190044 DOI: 10.1016/b978-0-12-380862-2.00003-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
Genetic epidemiology is increasingly focused on complex diseases involving multiple genes and environmental factors, often interacting in complex ways. Although standard frequentist methods still have a role in hypothesis generation and testing for discovery of novel main effects and interactions, Bayesian methods are particularly well suited to modeling the relationships in an integrated "systems biology" manner. In this chapter, we provide an overview of the principles of Bayesian analysis and their advantages in this context and describe various approaches to applying them for both model building and discovery in a genome-wide setting. In particular, we highlight the ability of Bayesian methods to construct complex probability models via a hierarchical structure and to account for uncertainty in model specification by averaging over large spaces of alternative models.
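The idea of averaging over alternative models can be shown with a minimal sketch. It assumes the BIC approximation to marginal likelihoods and equal prior probabilities over a small, fully enumerable set of hypothetical candidate models. From these it derives approximate posterior model probabilities and a posterior inclusion probability for each variable. The chapter's hierarchical models, and the stochastic search usually needed for genome-wide model spaces (e.g. MCMC), are not shown. The function names and BIC values are hypothetical.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    bics: {model: BIC}, where a model is a frozenset of included variables.
    Assumes equal prior model probabilities, so P(model | data) ∝ exp(-BIC / 2).
    """
    best = min(bics.values())  # subtract the minimum for numerical stability
    raw = {m: math.exp(-(b - best) / 2.0) for m, b in bics.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def inclusion_probability(weights, variable):
    """Posterior inclusion probability: summed weight of models containing the variable."""
    return sum(w for model, w in weights.items() if variable in model)


# Hypothetical BIC values for the four candidate models over two genes.
bics = {
    frozenset(): 210.4,
    frozenset({"gene1"}): 203.1,
    frozenset({"gene2"}): 209.8,
    frozenset({"gene1", "gene2"}): 204.0,
}
w = bma_weights(bics)
print(inclusion_probability(w, "gene1"), inclusion_probability(w, "gene2"))
```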
Affiliation(s)
- Melanie A Wilson
- Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA