151
Jia P, Liu Y, Zhao Z. Integrative pathway analysis of genome-wide association studies and gene expression data in prostate cancer. BMC Syst Biol 2012; 6 Suppl 3:S13. [PMID: 23281744 PMCID: PMC3524313 DOI: 10.1186/1752-0509-6-s3-s13] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 01/23/2023]
Abstract
BACKGROUND Pathway analysis of large-scale omics data helps us examine the cumulative effects of multiple functionally related genes, which are difficult to detect using traditional single gene/marker analysis. So far, most genomic studies have been conducted in a single domain, e.g., by genome-wide association studies (GWAS) or microarray gene expression investigation. A combined analysis of disease susceptibility genes across multiple platforms at the pathway level is urgently needed because it can reveal more reliable and more biologically important information. RESULTS We performed an integrative pathway analysis of a GWAS dataset and a microarray gene expression dataset in prostate cancer. We obtained a comprehensive pathway annotation set from knowledge-based public resources, including KEGG pathways and the prostate cancer candidate gene set, and gene sets specifically defined based on cross-platform information. By leveraging this pathway collection, we first searched for significant pathways in the GWAS dataset using four methods, which represent two broad groups of pathway analysis approaches. The significant pathways identified by each method varied greatly, but the results were more consistent within each method group than between groups. Next, we conducted a gene set enrichment analysis of the microarray gene expression data and found 13 pathways with cross-platform evidence, including "Fc gamma R-mediated phagocytosis" (P_GWAS = 0.003, P_expr < 0.001, and P_combined = 6.18 × 10⁻⁸), "regulation of actin cytoskeleton" (P_GWAS = 0.003, P_expr = 0.009, and P_combined = 3.34 × 10⁻⁴), and "Jak-STAT signaling pathway" (P_GWAS = 0.001, P_expr = 0.084, and P_combined = 8.79 × 10⁻⁴). CONCLUSIONS Our results provide evidence at both the genetic variation and expression levels that several key pathways might be involved in the pathological development of prostate cancer. Our framework, which employs gene expression data to facilitate pathway analysis of GWAS data, is not only feasible but also much needed in studying complex disease.
Affiliation(s)
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
152
Chen LS, Hsu L, Gamazon ER, Cox NJ, Nicolae DL. An exponential combination procedure for set-based association tests in sequencing studies. Am J Hum Genet 2012; 91:977-86. [PMID: 23159251 DOI: 10.1016/j.ajhg.2012.09.017] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 05/01/2012] [Revised: 07/25/2012] [Accepted: 09/20/2012] [Indexed: 01/06/2023]
Abstract
State-of-the-art next-generation-sequencing technologies can facilitate in-depth explorations of the human genome by investigating both common and rare variants. For the identification of genetic factors that are associated with disease risk or other complex phenotypes, methods have been proposed for jointly analyzing variants in a set (e.g., all coding SNPs in a gene). Variants in a properly defined set could be associated with risk or phenotype in a concerted fashion, and by accumulating information from them, one can improve power to detect genetic risk factors. Many set-based methods in the literature are based on statistics that can be written as the summation of variant statistics. Here, we propose taking the summation of the exponential of variant statistics as the set summary for association testing. From both Bayesian and frequentist perspectives, we provide theoretical justification for taking the sum of the exponential of variant statistics because it is particularly powerful for sparse alternatives, that is, when only relatively few of the many variants tested in a set are associated with disease risk, a distinctive feature of genetic data. We applied the exponential combination gene-based test to a sequencing study in anticancer pharmacogenomics and uncovered mechanistic insights into genes and pathways related to chemotherapeutic susceptibility for an important class of oncologic drugs.
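To illustrate why summing exponentials can help under sparse alternatives, the following is a minimal sketch, not the authors' implementation. It assumes independent standard-normal z-scores under the null, takes exp(z²/2) as the exponentiated variant statistic (the abstract does not give the exact scaling), and uses a simple Monte Carlo null; all function names are hypothetical.

```python
import math
import random

def sum_stat(z):
    """Conventional set summary: sum of squared variant z-scores."""
    return sum(v * v for v in z)

def exp_stat(z):
    """Exponential combination (assumed form): sum of exp(z_i^2 / 2)."""
    return sum(math.exp(v * v / 2.0) for v in z)

def null_p_value(stat_fn, observed, n_variants, n_sim=2000, seed=0):
    """Monte Carlo p-value, assuming independent N(0, 1) z-scores under the null."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        z0 = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
        if stat_fn(z0) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)

# Sparse alternative: one strong signal among 50 otherwise weak variants.
z = [0.5 if i % 2 == 0 else -0.5 for i in range(49)] + [4.5]

p_sum = null_p_value(sum_stat, sum_stat(z), len(z))
p_exp = null_p_value(exp_stat, exp_stat(z), len(z))
print(f"sum-of-squares p = {p_sum:.4f}, exponential-combination p = {p_exp:.4f}")
```

In this toy set the single large z-score dominates the exponential sum, whereas the plain sum of squares is diluted by the 49 near-null variants. Real data would need a permutation null that respects linkage disequilibrium rather than the independence assumed here.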
Affiliation(s)
- Lin S Chen
- Department of Health Studies, The University of Chicago, Chicago, IL 60637, USA.
153
Kitahara CM, Neta G, Pfeiffer RM, Kwon D, Xu L, Freedman ND, Hutchinson AA, Chanock SJ, Sturgis EM, Sigurdson AJ, Brenner AV. Common obesity-related genetic variants and papillary thyroid cancer risk. Cancer Epidemiol Biomarkers Prev 2012; 21:2268-71. [PMID: 23064004 PMCID: PMC3518668 DOI: 10.1158/1055-9965.epi-12-0790] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
BACKGROUND Epidemiologic studies have shown consistent associations between obesity and increased thyroid cancer risk, but, to date, no studies have investigated the relationship between thyroid cancer risk and obesity-related single-nucleotide polymorphisms (SNPs). METHODS We evaluated 575 tag SNPs in 23 obesity-related gene regions in a case-control study of 341 incident papillary thyroid cancer (PTC) cases and 444 controls of European ancestry. Logistic regression models, adjusted for attained age, year of birth, and sex, were used to calculate ORs and 95% confidence intervals (CI), with SNP genotypes coded as 0, 1, and 2 and modeled continuously to calculate P(trend). RESULTS Nine of 10 top-ranking SNPs (P(trend) < 0.01) were located in the FTO (fat mass and obesity associated) gene region, whereas the other was located in INSR (insulin receptor). None of the associations were significant after correcting for multiple testing. CONCLUSIONS Our data do not support an important role of obesity-related genetic polymorphisms in determining the risk of PTC. IMPACT Factors other than selected genetic polymorphisms may be responsible for the observed associations between obesity and increased PTC risk.
Affiliation(s)
- Cari M Kitahara
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, EPS 7056, 6120 Executive Blvd, Rockville, MD 20852, USA.
154
Ferguson J, Wheeler W, Fu Y, Prokunina-Olsson L, Zhao H, Sampson J. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation. Eur J Hum Genet 2012; 21:680-6. [PMID: 23092956 DOI: 10.1038/ejhg.2012.220] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
With recent advances in sequencing, genotyping arrays, and imputation, GWAS now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that can test for an association between a group of genetic variants and a phenotype. GSS are a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as MAF, the proportion associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, but details are provided as to when other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave).
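The description of GSS as a weighted sum of single-variant statistics and their cross-products can be sketched as follows. This is an illustrative toy, not the CRaVe implementation: the function name and the two example weightings (a sum-of-squares form and a burden-type form) are assumptions, and the abstract does not say which two classical tests the authors found robust.

```python
def generalized_score_statistic(z, w_diag, w_cross):
    """Weighted sum of squared single-variant statistics and their cross-products.

    z       : list of single-variant score statistics (z-scores in this sketch)
    w_diag  : weight for each z_i**2 term
    w_cross : dict {(i, j): weight} for z_i * z_j terms with i < j
    """
    total = sum(w * v * v for w, v in zip(w_diag, z))
    total += sum(w * z[i] * z[j] for (i, j), w in w_cross.items())
    return total

z = [1.0, -2.0, 0.5]
m = len(z)
pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]

# Sum-of-squares weighting: unit diagonal weights, no cross-products.
ssq = generalized_score_statistic(z, [1.0] * m, {})

# Burden-type weighting: (sum z)^2 = sum z_i^2 + 2 * sum_{i<j} z_i * z_j.
burden = generalized_score_statistic(z, [1.0] * m, {p: 2.0 for p in pairs})

print(ssq, burden)
```

With effects in opposite directions, as in this toy vector, the cross-product terms cancel much of the signal in the burden-type statistic, which is the kind of dependence on direction of effect the abstract refers to.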
Affiliation(s)
- John Ferguson
- Division of Biostatistics, Yale School of Public Health, New Haven, CT, USA
155
Chasman DI, Fuchsberger C, Pattaro C, Teumer A, Böger CA, Endlich K, Olden M, Chen MH, Tin A, Taliun D, Li M, Gao X, Gorski M, Yang Q, Hundertmark C, Foster MC, O'Seaghdha CM, Glazer N, Isaacs A, Liu CT, Smith AV, O'Connell JR, Struchalin M, Tanaka T, Li G, Johnson AD, Gierman HJ, Feitosa MF, Hwang SJ, Atkinson EJ, Lohman K, Cornelis MC, Johansson A, Tönjes A, Dehghan A, Lambert JC, Holliday EG, Sorice R, Kutalik Z, Lehtimäki T, Esko T, Deshmukh H, Ulivi S, Chu AY, Murgia F, Trompet S, Imboden M, Coassin S, Pistis G, Harris TB, Launer LJ, Aspelund T, Eiriksdottir G, Mitchell BD, Boerwinkle E, Schmidt H, Cavalieri M, Rao M, Hu F, Demirkan A, Oostra BA, de Andrade M, Turner ST, Ding J, Andrews JS, Freedman BI, Giulianini F, Koenig W, Illig T, Meisinger C, Gieger C, Zgaga L, Zemunik T, Boban M, Minelli C, Wheeler HE, Igl W, Zaboli G, Wild SH, Wright AF, Campbell H, Ellinghaus D, Nöthlings U, Jacobs G, Biffar R, Ernst F, Homuth G, Kroemer HK, Nauck M, Stracke S, Völker U, Völzke H, Kovacs P, Stumvoll M, Mägi R, Hofman A, Uitterlinden AG, Rivadeneira F, Aulchenko YS, Polasek O, Hastie N, Vitart V, Helmer C, Wang JJ, Stengel B, Ruggiero D, Bergmann S, Kähönen M, Viikari J, Nikopensius T, Province M, Ketkar S, Colhoun H, Doney A, Robino A, Krämer BK, Portas L, Ford I, Buckley BM, Adam M, Thun GA, Paulweber B, Haun M, Sala C, Mitchell P, Ciullo M, Kim SK, Vollenweider P, Raitakari O, Metspalu A, Palmer C, Gasparini P, Pirastu M, Jukema JW, Probst-Hensch NM, Kronenberg F, Toniolo D, Gudnason V, Shuldiner AR, Coresh J, Schmidt R, Ferrucci L, Siscovick DS, van Duijn CM, Borecki IB, Kardia SLR, Liu Y, Curhan GC, Rudan I, Gyllensten U, Wilson JF, Franke A, Pramstaller PP, Rettig R, Prokopenko I, Witteman J, Hayward C, Ridker PM, Parsa A, Bochud M, Heid IM, Kao WHL, Fox CS, Köttgen A. Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function. Hum Mol Genet 2012; 21:5329-43. 
[PMID: 22962313 DOI: 10.1093/hmg/dds369] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 01/13/2023]
Abstract
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10⁻⁹) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10⁻⁴ to 2.2 × 10⁻⁷. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
Affiliation(s)
- Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA
156
Nazarian A, Sichtig H, Riva A. A knowledge-based method for association studies on complex diseases. PLoS One 2012; 7:e44162. [PMID: 22970175 PMCID: PMC3435396 DOI: 10.1371/journal.pone.0044162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/04/2012] [Accepted: 07/30/2012] [Indexed: 12/29/2022]
Abstract
Complex disorders are a class of diseases whose phenotypic variance is caused by the interplay of multiple genetic and environmental factors. Analyzing the complexity underlying the genetic architecture of such traits may help develop more efficient diagnostic tests and therapeutic protocols. Despite the continuous advances in revealing the genetic basis of many complex diseases using genome-wide association studies (GWAS), a major proportion of their genetic variance has remained unexplained, in part because GWAS are unable to reliably detect small individual risk contributions and to capture the underlying genetic heterogeneity. In this paper we describe a hypothesis-based method to analyze the association between multiple genetic factors and a complex phenotype. Starting from sets of markers selected based on preexisting biomedical knowledge, our method generates multi-marker models relevant to the biological process underlying a complex trait for which genotype data are available. We tested the applicability of our method using the WTCCC case-control dataset. Analyzing a number of biological pathways, the method was able to identify several immune system related multi-SNP models significantly associated with Rheumatoid Arthritis (RA) and Crohn's disease (CD). RA-associated multi-SNP models were also replicated in an independent case-control dataset. The method we present provides a framework for capturing joint contributions of genetic factors to complex traits. In contrast to hypothesis-free approaches, its results can be given a direct biological interpretation. The replicated multi-SNP models generated by our analysis may serve as a predictor to estimate the risk of RA development in individuals of Caucasian ancestry.
Affiliation(s)
- Alireza Nazarian
- Department of Molecular Genetics and Microbiology and UF Genetics Institute, University of Florida, Gainesville, Florida, United States of America
- Heike Sichtig
- Department of Molecular Genetics and Microbiology and UF Genetics Institute, University of Florida, Gainesville, Florida, United States of America
- Alberto Riva
- Department of Molecular Genetics and Microbiology and UF Genetics Institute, University of Florida, Gainesville, Florida, United States of America
157
Evangelou M, Rendon A, Ouwehand WH, Wernisch L, Dudbridge F. Comparison of methods for competitive tests of pathway analysis. PLoS One 2012; 7:e41018. [PMID: 22859961 PMCID: PMC3409204 DOI: 10.1371/journal.pone.0041018] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 02/24/2012] [Accepted: 06/15/2012] [Indexed: 01/16/2023]
Abstract
It has been suggested that pathway analysis can complement single-SNP analysis in exploring genomewide association data. Pathway analysis incorporates the available biological knowledge of genes and SNPs and is expected to improve the chances of revealing the underlying genetic architecture of complex traits. Methods for pathway analysis can be classified as competitive (enrichment) or self-contained (association) according to the hypothesis tested. Although association tests are statistically more powerful than enrichment tests they can be difficult to calibrate because biases in analysis accumulate across multiple SNPs or genes. Furthermore, enrichment tests can be more scientifically relevant than association tests, as they detect pathways with relatively more evidence for association than the remaining genes. Here we show how some well known association tests can be simply adapted to test for enrichment, and compare their performance to some established enrichment tests. We propose versions of the Adaptive Rank Truncated Product (ARTP), Tail Strength Measure and Fisher's combination of p-values for testing the enrichment null hypothesis. We compare the behaviour of these proposed methods with the established Hypergeometric Test and Gene-Set Enrichment Analysis (GSEA). The results of the simulation study show that the modified version of the ARTP method has generally the best performance across the situations considered. The methods were also applied for finding enriched pathways for body mass index (BMI) and platelet function phenotypes. The pathway analysis of BMI identified the Vasoactive Intestinal Peptide pathway as significantly associated with BMI. This pathway has been previously reported as associated with BMI and the risk of obesity. The ARTP method was the method that identified the largest number of enriched pathways across all tested pathway databases and phenotypes. 
The simulation and data application results are in agreement with previous work on association tests and suggest that the ARTP should be preferred for both enrichment and association testing.
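Two of the ingredients named in this abstract, Fisher's combination of p-values and the hypergeometric test, can be sketched in a few lines. This is a minimal illustration of the textbook forms only: Fisher's method is shown in its standard self-contained version with independent p-values, not the enrichment adaptation the authors propose, and the function names and toy numbers are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null
    (assumes the k p-values are independent)."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def hypergeom_enrichment_p(n_genes, n_significant, pathway_size, overlap):
    """P(X >= overlap) when pathway_size genes are drawn without replacement
    from n_genes, of which n_significant are flagged as significant."""
    total = math.comb(n_genes, pathway_size)
    upper = min(pathway_size, n_significant)
    tail = sum(
        math.comb(n_significant, x)
        * math.comb(n_genes - n_significant, pathway_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total

# Toy pathway: 4 gene-level p-values; 3 of 4 pathway genes fall among the
# 50 significant genes out of 1000 tested.
print(fisher_combined_p([0.01, 0.20, 0.03, 0.50]))
print(hypergeom_enrichment_p(1000, 50, 4, 3))
```

The hypergeometric test depends on a significance cutoff for calling a gene significant, whereas Fisher's combination uses the p-values themselves; in practice, correlation among SNPs or genes would call for a permutation-based null rather than the closed forms used here.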
Affiliation(s)
- Marina Evangelou
- Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
- Augusto Rendon
- Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
- Department of Haematology, University of Cambridge, Cambridge, United Kingdom
- National Health Service Blood and Transplant, Cambridge, United Kingdom
- Willem H. Ouwehand
- Department of Haematology, University of Cambridge, Cambridge, United Kingdom
- National Health Service Blood and Transplant, Cambridge, United Kingdom
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Lorenz Wernisch
- Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
- Frank Dudbridge
- Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
158
Lee J, Ahn S, Oh S, Weir B, Park T. SNP-PRAGE: SNP-based parametric robust analysis of gene set enrichment. BMC Syst Biol 2012; 5 Suppl 2:S11. [PMID: 22784568 PMCID: PMC3287477 DOI: 10.1186/1752-0509-5-s2-s11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Current genome-wide association (GWA) analysis mainly focuses on single genetic variants, which may not reveal genetic variants that have small individual effects but large joint effects. Considering multiple SNPs jointly in GWA analysis can increase power. When multiple SNPs are jointly considered, the corresponding SNP-level association measures are likely to be correlated due to the linkage disequilibrium (LD) among SNPs. METHODS We propose the SNP-based parametric robust analysis of gene-set enrichment (SNP-PRAGE) method, which handles correlation among association measures of SNPs adequately and minimizes computing effort through a parametric assumption. SNP-PRAGE first obtains gene-level association measures from SNP-level association measures by incorporating the size of corresponding (or nearby) genes and the LD structure among SNPs. Afterward, SNP-PRAGE acquires the gene-set level summary of genes grouped by the same biological knowledge. This two-step summarization makes the within-set association measures independent of each other, and therefore the central limit theorem can be adequately applied for the parametric model. RESULTS & CONCLUSIONS We applied SNP-PRAGE to two GWA data sets: hypertension data of 8,842 samples from the Korean population and bipolar disorder data of 4,806 samples from the Wellcome Trust Case Control Consortium (WTCCC). We found two enriched gene sets for hypertension and three enriched gene sets for bipolar disorder. In a simulation study, we compared our method to other gene set methods and found that SNP-PRAGE notably reduced false positives while requiring much less computational effort than permutation-based gene set approaches.
Affiliation(s)
- Jaehoon Lee
- Department of Statistics, Seoul National University, San 56-1, Shilim-dong, Seoul, Korea
159
Curjuric I, Imboden M, Nadif R, Kumar A, Schindler C, Haun M, Kronenberg F, Künzli N, Phuleria H, Postma DS, Russi EW, Rochat T, Demenais F, Probst-Hensch NM. Different genes interact with particulate matter and tobacco smoke exposure in affecting lung function decline in the general population. PLoS One 2012; 7:e40175. [PMID: 22792237 PMCID: PMC3391223 DOI: 10.1371/journal.pone.0040175] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 10/28/2011] [Accepted: 06/06/2012] [Indexed: 01/22/2023]
Abstract
BACKGROUND Oxidative stress related genes modify the effects of ambient air pollution or tobacco smoking on lung function decline. The impact of interactions might be substantial, but previous studies mostly focused on main effects of single genes. OBJECTIVES We studied the interaction of both exposures with a broad set of oxidative-stress related candidate genes and pathways on lung function decline and contrasted interactions between exposures. METHODS For 12679 single nucleotide polymorphisms (SNPs), change in forced expiratory volume in one second (FEV1), FEV1 over forced vital capacity (FEV1/FVC), and mean forced expiratory flow between 25 and 75% of the FVC (FEF25-75) were regressed on interval exposure to particulate matter <10 µm in diameter (PM10) or packyears smoked (a), additive SNP effects (b), and interaction terms between (a) and (b) in 669 adults with GWAS data. Interaction p-values for 152 genes and 14 pathways were calculated by the adaptive rank truncated product (ARTP) method, and compared between exposures. Interaction effect sizes were contrasted for the strongest SNPs of nominally significant genes (p(interaction) < 0.05). Replication was attempted for SNPs with MAF > 10% in 3320 SAPALDIA participants without GWAS. RESULTS On the SNP level, rs2035268 in gene SNCA accelerated FEV1/FVC decline by 3.8% (p(interaction) = 2.5×10⁻⁶), and rs12190800 in PARK2 attenuated FEV1 decline by 95.1 ml (p(interaction) = 9.7×10⁻⁸) over 11 years, while interacting with PM10. Genes and pathways nominally interacting with PM10 and packyears exposure differed substantially. Gene CRISP2 presented a significant interaction with PM10 (p(interaction) = 3.0×10⁻⁴) on FEV1/FVC decline. Pathway interactions were weak. Replications for the strongest SNPs in PARK2 and CRISP2 were not successful.
CONCLUSIONS Consistent with a stratified response to increasing oxidative stress, different genes and pathways potentially mediate PM10 and tobacco smoke effects on lung function decline. Ignoring environmental exposures would miss these patterns, but achieving sufficient sample size and comparability across study samples is challenging.
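The regression described in the methods, with an exposure term (a), an additive SNP term (b), and their interaction, can be sketched as below. This is a toy under stated assumptions: the data are noise-free and invented, the coefficients are arbitrary, no covariates are included, and the small `ols` solver is a hypothetical helper rather than anything used in the study.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Toy data: exposure (a) on an arbitrary scale, genotype (b) coded 0/1/2.
exposure = [0.0, 1.0, 2.0, 3.0] * 3
genotype = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
true_beta = [-30.0, -5.0, 2.0, -4.0]  # intercept, exposure, SNP, exposure x SNP

# Design matrix with the interaction column e * g.
X = [[1.0, e, float(g), e * g] for e, g in zip(exposure, genotype)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

The last coefficient is the interaction effect: it measures how much the slope of lung function change on exposure shifts per copy of the effect allele, which is the quantity whose p-value would feed into a gene- or pathway-level combination such as ARTP.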
Affiliation(s)
- Ivan Curjuric
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Medea Imboden
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Rachel Nadif
- INSERM, U1018, CESP Centre for research in Epidemiology and Population Health, Respiratory and Environmental Epidemiology Team, Villejuif, France
- Université Paris-Sud 11, UMRS 1018, Villejuif, France
- Ashish Kumar
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Christian Schindler
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Margot Haun
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Florian Kronenberg
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Nino Künzli
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Harish Phuleria
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Dirkje S. Postma
- Department of Pulmonary Medicine and Tuberculosis, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Erich W. Russi
- Division of Pulmonary Medicine, University Hospital Zürich, Zürich, Switzerland
- Thierry Rochat
- Division of Pulmonary Medicine, Geneva University Hospitals, Geneva, Switzerland
- Florence Demenais
- INSERM, U946, Genetic Variation and Human Diseases Unit, Paris, France
- Fondation Jean Dausset - Centre d’Etude du Polymorphisme Humain (CEPH), Paris, France
- Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d’Hématologie, Paris, France
- Nicole M. Probst-Hensch
- Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute SwissTPH, Basel, Switzerland
- University of Basel, Basel, Switzerland
160
Jia P, Wang L, Fanous AH, Pato CN, Edwards TL, Zhao Z. Network-assisted investigation of combined causal signals from genome-wide association studies in schizophrenia. PLoS Comput Biol 2012; 8:e1002587. [PMID: 22792057 PMCID: PMC3390381 DOI: 10.1371/journal.pcbi.1002587] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Received: 10/31/2011] [Accepted: 05/15/2012] [Indexed: 12/21/2022]
Abstract
With the recent success of genome-wide association studies (GWAS), a wealth of association data has accumulated for more than 200 complex diseases/traits, creating a strong demand for data integration and interpretation. A combinatory analysis of multiple GWAS datasets, or an integrative analysis of GWAS data and other high-throughput data, has been particularly promising. In this study, we proposed an integrative analysis framework of multiple GWAS datasets by overlaying association signals onto the protein-protein interaction network, and demonstrated it using schizophrenia datasets. Building on a dense module search algorithm, we first searched for significantly enriched subnetworks for schizophrenia in each single GWAS dataset and then implemented a discovery-evaluation strategy to identify module genes with consistent association signals. We validated the module genes in an independent dataset, and also examined them through meta-analysis of the related SNPs using multiple GWAS datasets. As a result, we identified 205 module genes with a joint effect significantly associated with schizophrenia; these module genes included a number of well-studied candidate genes such as DISC1, GNA12, GNA13, GNAI1, GPR17, and GRIN2B. Further functional analysis suggested these genes are involved in neuron-related processes. Additionally, meta-analysis found that 18 SNPs in 9 module genes had P_meta < 1×10⁻⁴, including the gene HLA-DQA1 located in the MHC region on chromosome 6, which was reported in previous studies using the largest cohort of schizophrenia patients to date. These results demonstrated that our bi-directional network-based strategy is efficient for identifying disease-associated genes with modest signals in GWAS datasets. This approach can be applied to any other complex diseases/traits where multiple GWAS datasets are available.
The recent success of genome-wide association studies (GWAS) has generated a wealth of genotyping data critical to studies of genetic architectures of many complex diseases. In contrast to traditional single marker analysis, an integrative analysis of multiple genes and the assessment of their joint effects have been particularly promising, especially upon the availability of many GWAS datasets and other high-throughput datasets for numerous complex diseases. In this study, we developed an integrative analysis framework for multiple GWAS datasets and demonstrated it in schizophrenia. We first constructed a GWAS-weighted protein-protein interaction (PPI) network and then applied a dense module search algorithm to identify subnetworks with combinatory disease effects. We applied combinatorial criteria for module selection based on permutation tests to determine whether the modules are significantly different from random gene sets and whether the modules are associated with the disease in investigation. Importantly, considering there are many complex diseases with multiple GWAS datasets available, we proposed a discovery-evaluation strategy to search for modules with consistent combined effects from two or more GWAS datasets. This approach can be applied to any diseases or traits that have two or more GWAS datasets available.
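The idea of growing a dense module on a GWAS-weighted PPI network can be sketched with a simple greedy search. This is a toy illustration, not the authors' algorithm: the module score (sum of gene z-scores divided by the square root of module size), the stopping rule with tolerance `r`, the restriction to direct neighbours, and the five-gene network are all assumptions made for the example.

```python
import math

def module_score(genes, z):
    """Assumed module score: sum of gene z-scores divided by sqrt(module size)."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))

def grow_module(seed, network, z, r=0.1):
    """Greedily add the neighbouring gene that raises the module score most;
    stop when the best gain is no more than a fraction r of the current score."""
    module = {seed}
    while True:
        neighbours = set().union(*(network[g] for g in module)) - module
        if not neighbours:
            break
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(neighbours), key=lambda g: module_score(module | {g}, z))
        if module_score(module | {best}, z) <= module_score(module, z) * (1 + r):
            break
        module.add(best)
    return module

# Hypothetical PPI network (adjacency sets) and gene-level z-scores.
network = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"A"},
}
z = {"A": 2.5, "B": 2.0, "C": 1.8, "D": 0.1, "E": -0.5}

print(sorted(grow_module("A", network, z)))
```

Starting from gene A, the search adds B and then C, and stops because adding the weakly associated gene D would lower the score. The permutation-based significance checks and the discovery-evaluation step described in the abstract would sit on top of a search like this.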
Affiliation(s)
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Lily Wang
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Ayman H. Fanous
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Washington VA Medical Center, Washington, D.C., United States of America
- Department of Psychiatry, Georgetown University School of Medicine, Washington, D.C., United States of America
- Department of Psychiatry, Keck School of Medicine of the University of Southern California, Los Angeles, California, United States of America
| | - Carlos N. Pato
- Department of Psychiatry, Keck School of Medicine of the University of Southern California, Los Angeles, California, United States of America
| | - Todd L. Edwards
- Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Division of Epidemiology, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | | | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|
161
|
van Veen T, Goeman JJ, Monajemi R, Wardenaar KJ, Hartman CA, Snieder H, Nolte IM, Penninx BWJH, Zitman FG. Different gene sets contribute to different symptom dimensions of depression and anxiety. Am J Med Genet B Neuropsychiatr Genet 2012; 159B:519-28. [PMID: 22573416 DOI: 10.1002/ajmg.b.32058] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/12/2012] [Accepted: 04/19/2012] [Indexed: 01/09/2023]
Abstract
Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom dimensions provide a more homogeneous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway level provides more power to detect associations and yields valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, the symptom dimensions of the tripartite model of anxiety and depression, General Distress, Anhedonic Depression, and Anxious Arousal, were measured with the Mood and Anxiety Symptoms Questionnaire (30-item Dutch adaptation; MASQ-D30). Association of these symptom dimensions with candidate gene sets and gene sets from two public pathway databases was tested using the Global test. One pathway was associated with General Distress, and concerned molecules expressed in the endoplasmic reticulum lumen. Seven pathways were associated with Anhedonic Depression. Important themes were neurodevelopment, neurodegeneration, and cytoskeleton. Furthermore, three gene sets associated with Anxious Arousal concerned development, morphology, and genetic recombination. The individual pathways explained up to 1.7% of the variance. These data demonstrate mechanisms that influence the specific dimensions. Moreover, they show the value of using dimensional phenotypes on the one hand and gene sets on the other.
Affiliation(s)
- Tineke van Veen
- Department of Psychiatry, Leiden University Medical Centre, Leiden, The Netherlands.
|
162
|
Li D, Duell EJ, Yu K, Risch HA, Olson SH, Kooperberg C, Wolpin BM, Jiao L, Dong X, Wheeler B, Arslan AA, Bueno-de-Mesquita HB, Fuchs CS, Gallinger S, Gross M, Hartge P, Hoover RN, Holly EA, Jacobs EJ, Klein AP, LaCroix A, Mandelson MT, Petersen G, Zheng W, Agalliu I, Albanes D, Boutron-Ruault MC, Bracci PM, Buring JE, Canzian F, Chang K, Chanock SJ, Cotterchio M, Gaziano J, Giovannucci EL, Goggins M, Hallmans G, Hankinson SE, Hoffman Bolton JA, Hunter DJ, Hutchinson A, Jacobs KB, Jenab M, Khaw KT, Kraft P, Krogh V, Kurtz RC, McWilliams RR, Mendelsohn JB, Patel AV, Rabe KG, Riboli E, Shu XO, Tjønneland A, Tobias GS, Trichopoulos D, Virtamo J, Visvanathan K, Watters J, Yu H, Zeleniuch-Jacquotte A, Amundadottir L, Stolzenberg-Solomon RZ. Pathway analysis of genome-wide association study data highlights pancreatic development genes as susceptibility factors for pancreatic cancer. Carcinogenesis 2012; 33:1384-90. [PMID: 22523087 PMCID: PMC3405651 DOI: 10.1093/carcin/bgs151] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Received: 01/07/2012] [Revised: 04/02/2012] [Accepted: 03/09/2012] [Indexed: 12/20/2022]
Abstract
Four loci have been associated with pancreatic cancer through genome-wide association studies (GWAS). Pathway-based analysis of GWAS data is a complementary approach to identify groups of genes or biological pathways enriched with disease-associated single-nucleotide polymorphisms (SNPs) whose individual effect sizes may be too small to be detected by standard single-locus methods. We used the adaptive rank truncated product method in a pathway-based analysis of GWAS data from 3851 pancreatic cancer cases and 3934 control participants pooled from 12 cohort studies and 8 case-control studies (PanScan). We compiled 23 biological pathways hypothesized to be relevant to pancreatic cancer and observed a nominal association between pancreatic cancer and five pathways (P < 0.05), i.e. pancreatic development, Helicobacter pylori lacto/neolacto, hedgehog, Th1/Th2 immune response and apoptosis (P = 2.0 × 10⁻⁶, 1.6 × 10⁻⁵, 0.0019, 0.019 and 0.023, respectively). After excluding previously identified genes from the original GWAS in three pathways (NR5A2, ABO and SHH), the pancreatic development pathway remained significant (P = 8.3 × 10⁻⁵), whereas the others did not. The most significant genes (P < 0.01) in the five pathways were NR5A2, HNF1A, HNF4G and PDX1 for pancreatic development; ABO for H. pylori lacto/neolacto; SHH for hedgehog; TGFBR2 and CCL18 for Th1/Th2 immune response and MAPK8 and BCL2L11 for apoptosis. Our results provide a link between inherited variation in genes important for pancreatic development and cancer and show that pathway-based approaches to analysis of GWAS data can yield important insights into the collective role of genetic risk variants in cancer.
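Illustrative note: the abstract above names the adaptive rank truncated product method without describing it. The following is a minimal sketch of the general idea as it is commonly described: sum the logs of the K smallest p-values for several truncation points K, take the best empirical p-value over K, and correct for that choice against the same null sets. The function names, the truncation points, and the use of precomputed null p-value sets (which would normally come from phenotype permutations) are assumptions for illustration, not the authors' implementation.

```python
import math

def rtp_stat(pvals, k):
    """Negative log of the product of the k smallest p-values (larger = stronger signal)."""
    return -sum(math.log(p) for p in sorted(pvals)[:k])

def artp(observed, null_sets, truncation_points):
    """Adaptive rank truncated product p-value.

    observed          -- SNP- or gene-level p-values for one pathway
    null_sets         -- p-value lists of the same length from permuted data (assumed given)
    truncation_points -- candidate numbers of top p-values to combine (hypothetical choice)
    """
    all_sets = [observed] + list(null_sets)   # index 0 holds the observed data
    n = len(all_sets)
    min_p = [1.0] * n
    for k in truncation_points:
        stats = [rtp_stat(s, k) for s in all_sets]
        for i, w in enumerate(stats):
            # Empirical p-value of set i at this truncation point
            p_k = sum(1 for v in stats if v >= w) / n
            min_p[i] = min(min_p[i], p_k)
    # Adjust for having selected the best truncation point
    return sum(1 for m in min_p if m <= min_p[0]) / n
```

With B null sets the smallest attainable p-value is 1/(B + 1), so resolving pathway p-values as small as those reported above would require a very large number of permutations.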
Affiliation(s)
| | - Eric J. Duell
- Catalan Institute of Oncology (ICO-IDIBELL), Barcelona, Spain
| | - Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | | | - Sara H. Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Brian M. Wolpin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Li Jiao
- Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | | | - Bill Wheeler
- Information Management Services, Silver Spring, MD, USA
| | - Alan A. Arslan
- Department of Obstetrics and Gynecology, New York University School of Medicine, New York, NY, USA
- Department of Environmental Medicine, New York University School of Medicine, New York, NY, USA
- New York University Cancer Institute, New York, NY, USA
| | - H. Bas Bueno-de-Mesquita
- National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Department of Gastroenterology and Hepatology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Charles S. Fuchs
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Steven Gallinger
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
| | - Myron Gross
- Department of Laboratory Medicine/Pathology, School of Medicine, University of Minnesota, Minneapolis, MN, USA
| | - Patricia Hartge
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Robert N. Hoover
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Elizabeth A. Holly
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Eric J. Jacobs
- Epidemiology Research Program, American Cancer Society, Atlanta, GA, USA
| | - Alison P. Klein
- Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Epidemiology, The Bloomberg School of Public Health, The Sol Goldman Pancreatic Research Center, The Johns Hopkins Medical Institutions, Baltimore, MD, USA
| | - Andrea LaCroix
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Margaret T. Mandelson
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Group Health Center for Health Studies, Seattle, WA, USA
| | - Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
| | - Ilir Agalliu
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | | | - Paige M. Bracci
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Julie E. Buring
- Department of Ambulatory Care and Prevention, Harvard Medical School, Boston, MA, USA
- Divisions of Preventive Medicine and Aging, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Federico Canzian
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Kenneth Chang
- Comprehensive Digestive Disease Center, University of California, Irvine Medical Center, Orange, CA, USA
| | - Stephen J. Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
- Core Genotyping Facility, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, MD, USA
| | - Michelle Cotterchio
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Prevention and Cancer Control, Cancer Care Ontario, Toronto, Ontario, Canada
| | - J. Michael Gaziano
- Physicians’ Health Study, Divisions of Aging, Cardiovascular Medicine, and Preventive Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, and Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, USA
| | - Edward L. Giovannucci
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Department of Nutrition, Harvard School of Public Health, Boston, MA, USA
| | - Michael Goggins
- Departments of Oncology, Pathology and Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Göran Hallmans
- Department of Public Health and Clinical Medicine, Nutritional Research, Umeå University, Umeå, Sweden
| | - Susan E. Hankinson
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
| | - Judith A. Hoffman Bolton
- Department of Epidemiology, The Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
| | - David J. Hunter
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
| | - Amy Hutchinson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
- Core Genotyping Facility, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, MD, USA
| | - Kevin B. Jacobs
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
- Core Genotyping Facility, Advanced Technology Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, MD, USA
- Bioinformed Consulting Services, Gaithersburg, MD, USA
| | - Mazda Jenab
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Kay-Tee Khaw
- Department of Public Health and Primary Care, Clinical Gerontology, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
| | - Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
| | - Vittorio Krogh
- Nutritional Epidemiology Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
| | - Robert C. Kurtz
- Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | | | - Julie B. Mendelsohn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Alpa V. Patel
- Epidemiology Research Program, American Cancer Society, Atlanta, GA, USA
| | - Kari G. Rabe
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Elio Riboli
- Division of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
| | - Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN, USA
| | - Anne Tjønneland
- Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark
| | - Geoffrey S. Tobias
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Dimitrios Trichopoulos
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Bureau of Epidemiologic Research, Academy of Athens, Athens, Greece
| | - Jarmo Virtamo
- Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland
| | - Kala Visvanathan
- Departments of Oncology, Pathology and Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Joanne Watters
- Division of Cancer Prevention and Population Control, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Herbert Yu
- Yale University School of Public Health, New Haven, CT, USA
| | - Anne Zeleniuch-Jacquotte
- Department of Environmental Medicine, New York University School of Medicine, New York, NY, USA
- New York University Cancer Institute, New York, NY, USA
| | - Laufey Amundadottir
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
| | - Rachael Z. Stolzenberg-Solomon
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
|
163
|
|
164
|
Aschebrook-Kilfoy B, Neta G, Brenner AV, Hutchinson A, Pfeiffer RM, Sturgis EM, Xu L, Wheeler W, Doody MM, Chanock SJ, Sigurdson AJ. Common genetic variants in metabolism and detoxification pathways and the risk of papillary thyroid cancer. Endocr Relat Cancer 2012; 19:333-44. [PMID: 22389382 PMCID: PMC3394851 DOI: 10.1530/erc-11-0372] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
Relationships are unclear between polymorphisms in genes involved in metabolism and detoxification of various chemicals and papillary thyroid cancer (PTC) risk as well as their potential modification by alcohol or tobacco intake. We evaluated associations between 1647 tagging single nucleotide polymorphisms (SNPs) in 132 candidate genes/regions involved in metabolism of exogenous and endogenous compounds (Phase I/II, oxidative stress, and metal binding pathways) and PTC risk in 344 PTC cases and 452 controls. For 15 selected regions and their respective SNPs, we also assessed interaction with alcohol and tobacco use. Logistic regression models were used to evaluate the main effect of SNPs (P(trend)) and interaction with alcohol/tobacco intake. Gene- and pathway-level associations and interactions (P(gene interaction)) were evaluated by combining P(trend) values using the adaptive rank-truncated product method. While we found associations between PTC risk and nine SNPs (P(trend) ≤ 0.01) and seven genes/regions (P(region)<0.05), none remained significant after correction for the false discovery rate. We found a significant interaction between UGT2B7 and NAT1 genes and alcohol intake (P(gene interaction)=0.01 and 0.02 respectively) and between the CYP26B1 gene and tobacco intake (P(gene interaction)=0.02). Our results are suggestive of interaction between the genetic polymorphisms in several detoxification genes and alcohol or tobacco intake on risk of PTC. Larger studies with improved exposure assessment should address potential modification of PTC risk by alcohol and tobacco intake to confirm or refute our findings.
Affiliation(s)
- Briseis Aschebrook-Kilfoy
- Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
|
165
|
Bhattacharjee S, Rajaraman P, Jacobs KB, Wheeler WA, Melin BS, Hartge P, Yeager M, Chung CC, Chanock SJ, Chatterjee N. A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. Am J Hum Genet 2012; 90:821-35. [PMID: 22560090 DOI: 10.1016/j.ajhg.2012.03.015] [Citation(s) in RCA: 182] [Impact Index Per Article: 14.0] [Received: 10/05/2011] [Revised: 02/04/2012] [Accepted: 03/15/2012] [Indexed: 02/06/2023]
Abstract
Pooling genome-wide association studies (GWASs) increases power but also poses methodological challenges because studies are often heterogeneous. For example, combining GWASs of related but distinct traits can provide promising directions for the discovery of loci with small but common pleiotropic effects. Classical approaches for meta-analysis or pooled analysis, however, might not be suitable for such analysis because individual variants are likely to be associated with only a subset of the traits or might demonstrate effects in different directions. We propose a method that exhaustively explores subsets of studies for the presence of true association signals that are in either the same direction or possibly opposite directions. An efficient approximation is used for rapid evaluation of p values. We present two illustrative applications, one for a meta-analysis of separate case-control studies of six distinct cancers and another for pooled analysis of a case-control study of glioma, a class of brain tumors that contains heterogeneous subtypes. Both the applications and additional simulation studies demonstrate that the proposed methods offer improved power and more interpretable results when compared to traditional methods for the analysis of heterogeneous traits. The proposed framework has applications beyond genetic association studies.
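Illustrative note: the exhaustive subset search described above can be pictured with a small sketch. It assumes each study contributes a z-score and a sample size, combines every non-empty subset with square-root-of-sample-size weights, and keeps the subset with the largest absolute statistic. The weighting scheme and the function name are assumptions; the sketch covers only effects in a common direction and omits the approximation the authors use to obtain a p-value for the maximum, so it should not be read as their method.

```python
from itertools import combinations
from math import sqrt

def best_subset(z_scores, sample_sizes):
    """Return (max |Z|, subset of study indices) over all non-empty subsets of studies."""
    studies = range(len(z_scores))
    best = (0.0, ())
    for r in range(1, len(z_scores) + 1):
        for subset in combinations(studies, r):
            # Sample-size-weighted combination of the z-scores in this subset
            num = sum(sqrt(sample_sizes[i]) * z_scores[i] for i in subset)
            den = sqrt(sum(sample_sizes[i] for i in subset))
            z = num / den
            if abs(z) > best[0]:
                best = (abs(z), subset)
    return best

# Hypothetical input: two studies carry a signal, the third does not
stat, subset = best_subset([3.0, 3.0, 0.0], [100, 100, 100])
```

For six studies, as in the cancer application mentioned above, the search covers 2⁶ − 1 = 63 subsets. Because the maximum is taken over many correlated statistics, it cannot be compared directly with a standard normal distribution.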
Affiliation(s)
- Samsiddhi Bhattacharjee
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, 6120 Executive Boulevard, Rockville, MD 20852, USA
|
166
|
Liang XS, Pfeiffer RM, Wheeler W, Maeder D, Burdette L, Yeager M, Chanock S, Tucker MA, Goldstein AM, Yang XR. Genetic variants in DNA repair genes and the risk of cutaneous malignant melanoma in melanoma-prone families with/without CDKN2A mutations. Int J Cancer 2012; 130:2062-6. [PMID: 21671477 PMCID: PMC3274649 DOI: 10.1002/ijc.26231] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/03/2011] [Accepted: 05/16/2011] [Indexed: 11/10/2022]
Abstract
Cutaneous malignant melanoma (CMM) is an etiologically heterogeneous disease with genetic, environmental (sun exposure) and host (pigmentation/nevi) factors and their interactions contributing to risk. Genetic variants in DNA repair genes may be particularly important since their altered function in response to sun exposure-related DNA damage may be related to risk for CMM. However, systematic evaluations of genetic variants in DNA repair genes are limited, particularly in high-risk families. We comprehensively analyzed DNA repair gene polymorphisms and CMM risk in melanoma-prone families with/without CDKN2A mutations. A total of 586 individuals (183 CMM) from 53 families (23 CDKN2A (+), 30 CDKN2A (-)) were genotyped for 2964 tagSNPs in 131 DNA repair genes. Conditional logistic regression, conditioning on families, was used to estimate trend p-values, odds ratios and 95% confidence intervals for the association between CMM and each SNP separately, adjusted for age and sex. p-Values for SNPs in the same gene were combined to yield gene-specific p-values. Two genes, POLN and PRKDC, were significantly associated with melanoma after Bonferroni correction for multiple testing (p = 0.0003 and 0.00035, respectively). DCLRE1B showed suggestive association (p = 0.0006). Between 28% and 56% of genotyped SNPs in these genes had single-SNP p < 0.05. The most significant SNPs in POLN and PRKDC had similar effects in CDKN2A (+) and CDKN2A (-) families. Our findings suggest that polymorphisms in the DNA repair genes POLN and PRKDC are associated with increased melanoma risk in melanoma families with and without CDKN2A mutations.
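Illustrative note: the Bonferroni result reported above can be checked with simple arithmetic. The abstract does not state the significance level or the number of tests, so the sketch assumes a family-wise level of 0.05 divided across the 131 genes, giving a per-gene threshold of about 3.8 × 10⁻⁴. The gene-level p-values are the ones quoted in the abstract.

```python
n_genes = 131   # DNA repair genes tested (from the abstract)
alpha = 0.05    # assumed family-wise error rate; not stated in the abstract

# Bonferroni: each gene is tested at alpha divided by the number of tests
threshold = alpha / n_genes

gene_p = {"POLN": 0.0003, "PRKDC": 0.00035, "DCLRE1B": 0.0006}
significant = [gene for gene, p in gene_p.items() if p < threshold]
```

Under these assumptions POLN and PRKDC fall below the threshold and DCLRE1B does not, which matches the abstract's description of DCLRE1B as only suggestive.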
|
167
|
Sun L, Rommens JM, Corvol H, Li W, Li X, Chiang TA, Lin F, Dorfman R, Busson PF, Parekh RV, Zelenika D, Blackman SM, Corey M, Doshi VK, Henderson L, Naughton KM, O'Neal WK, Pace RG, Stonebraker JR, Wood SD, Wright FA, Zielenski J, Clement A, Drumm ML, Boëlle PY, Cutting GR, Knowles MR, Durie PR, Strug LJ. Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nat Genet 2012; 44:562-9. [PMID: 22466613 PMCID: PMC3371103 DOI: 10.1038/ng.2221] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Received: 07/25/2011] [Accepted: 02/24/2012] [Indexed: 01/18/2023]
Abstract
Variants associated with meconium ileus in cystic fibrosis were identified in 3,763 affected individuals by genome-wide association study (GWAS). Five SNPs at two loci near SLC6A14 at Xq23-24 (minimum P = 1.28 × 10⁻¹² at rs3788766) and SLC26A9 at 1q32.1 (minimum P = 9.88 × 10⁻⁹ at rs4077468) accounted for ~5% of phenotypic variability and were replicated in an independent sample of affected individuals (n = 2,372; P = 0.001 and 0.0001, respectively). By incorporating the knowledge that disease-causing mutations in CFTR alter electrolyte and fluid flux across surface epithelium into a hypothesis-driven GWAS (GWAS-HD), we identified associations with the same SNPs in SLC6A14 and SLC26A9 and established evidence for the involvement of SNPs in a third solute carrier gene, SLC9A3. In addition, GWAS-HD provided evidence of association between meconium ileus and multiple genes encoding constituents of the apical plasma membrane where CFTR resides (P = 0.0002; testing of 155 apical membrane genes jointly and in replication, P = 0.022). These findings suggest that modulating activities of apical membrane constituents could complement current therapeutic paradigms for cystic fibrosis.
Affiliation(s)
- Lei Sun
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
|
168
|
Safaeian M, Hildesheim A, Gonzalez P, Yu K, Porras C, Li Q, Rodriguez AC, Sherman ME, Schiffman M, Wacholder S, Burk R, Herrero R, Burdette L, Chanock SJ, Wang SS. Single nucleotide polymorphisms in the PRDX3 and RPS19 and risk of HPV persistence and cervical precancer/cancer. PLoS One 2012; 7:e33619. [PMID: 22496757 PMCID: PMC3322120 DOI: 10.1371/journal.pone.0033619] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 10/06/2011] [Accepted: 02/14/2012] [Indexed: 11/22/2022]
Abstract
Background Host genetic factors might affect the risk of progression from infection with carcinogenic human papillomavirus (HPV), the etiologic agent for cervical cancer, to persistent HPV infection, and hence to cervical precancer and cancer. Methodology/Principal Findings We assessed 18,310 tag single nucleotide polymorphisms (SNPs) from 1113 genes in 416 cervical intraepithelial neoplasia 3 (CIN3)/cancer cases, 356 women with persistent carcinogenic HPV infection (median persistence of 25 months) and 425 randomly selected women (non-cases and non-HPV persistent) from the 10,049 women from the Guanacaste, Costa Rica HPV natural history cohort. For gene and SNP associations, we computed age-adjusted odds ratio and p-trend. Three comparisons were made: 1) association with CIN3/cancer (compared CIN3/cancer cases to random controls), 2) association with persistence (compared HPV persistence to random controls), and 3) progression (compared CIN3/cancers with HPV-persistent group). Regions statistically significantly associated with CIN3/cancer included the genes for peroxiredoxin 3 (PRDX3) and ribosomal protein S19 (RPS19). The single most significant SNPs from each gene associated with CIN3/cancer were PRDX3 rs7082598 (Ptrend<0.0001) and RPS19 rs2305809 (Ptrend=0.0007), respectively. Both SNPs were also associated with progression. Conclusions/Significance These data suggest involvement of two genes, RPS19 and PRDX3, or other SNPs in linkage disequilibrium, with cervical cancer risk. Further investigation showed that they may be involved in both the persistence and progression transition stages. Our results require replication but, if true, suggest a role for ribosomal dysfunction, mitochondrial processes, and/or oxidative stress, or other unknown function of these genes in cervical carcinogenesis.
Affiliation(s)
- Mahboobeh Safaeian
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America.
|
169
|
Mirabello L, Sun C, Ghosh A, Rodriguez AC, Schiffman M, Wentzensen N, Hildesheim A, Herrero R, Wacholder S, Lorincz A, Burk RD. Methylation of human papillomavirus type 16 genome and risk of cervical precancer in a Costa Rican population. J Natl Cancer Inst 2012; 104:556-65. [PMID: 22448030 DOI: 10.1093/jnci/djs135] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022]
Abstract
BACKGROUND Previous studies have suggested an association between human papillomavirus type 16 (HPV16) genome methylation and cervical intraepithelial neoplasia grade 3 (CIN3) (ie, cervical precancer) and cancer, but the results have been inconsistent. METHODS We designed a case-control study within a large prospective cohort of women who underwent multiple screenings for cervical cancer in Guanacaste, Costa Rica. Diagnostic specimens were collected at the time of CIN3 diagnosis (n = 30 case subjects) and persistent HPV16 infection (persistence; n = 35 case subjects), prediagnostic specimens at the first HPV16-positive screening visit (n = 20 CIN3 case subjects; n = 35 persistence case subjects), and control specimens from women with infection clearance within 2 years (n = 34 control subjects). DNA extracted from specimens (cervical cells) was analyzed for methylation levels at 67 CpG sites throughout the HPV16 genome using pyrosequencing. The Benjamini-Hochberg method was used to account for multiple testing. Associations between methylation levels and risk of CIN3 or persistence were assessed using logistic regression models to estimate odds ratios (ORs) and 95% confidence intervals (CIs). RESULTS Increased methylation in diagnostic vs control specimens at nine CpG sites, three in each of the L1, L2, and E2/E4 genomic regions, was associated with an increased risk of CIN3 (third tertile [high] vs first and second tertiles combined [low], OR = 3.29 [95% CI = 1.16 to 9.34] to 11.12 [95% CI = 2.29 to 76.80]) and persistence. High methylation at three of these CpG sites was associated with a much higher risk when combined compared with low methylation at these sites (OR = 52, 95% CI = 4.0 to 670). In prediagnostic vs control specimens, increased methylation at a CpG site (nucleotide position 4261) in L2 was associated with an increased risk of CIN3. CONCLUSION In this HPV16-infected cohort, increased methylation of CpG sites within the HPV16 genome before diagnosis and at the time of diagnosis was associated with cervical precancer.
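Illustrative note: the Benjamini-Hochberg method mentioned in the METHODS is a step-up procedure. Sort the m p-values, find the largest rank i with p(i) ≤ i·q/m, and reject every hypothesis up to that rank. A minimal sketch follows; the function name, the FDR level q = 0.05, and the example p-values are assumptions for illustration and are not taken from the study, which tested 67 CpG sites.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p-value
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold
        if pvals[i] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values for four CpG sites
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.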
Affiliation(s)
- Lisa Mirabello
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, 6120 Executive Blvd, EPS/7101, Rockville, MD 20892, USA.
|
170
|
Shahbaba B, Shachaf CM, Yu Z. A pathway analysis method for genome-wide association studies. Stat Med 2012; 31:988-1000. [PMID: 22302470 DOI: 10.1002/sim.4477] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/22/2011] [Revised: 10/20/2011] [Accepted: 11/02/2011] [Indexed: 12/20/2022]
Abstract
For genome-wide association studies, we propose a new method for identifying significant biological pathways. In this approach, we aggregate data across single-nucleotide polymorphisms to obtain summary measures at the gene level. We then use a hierarchical Bayesian model, which takes the gene-level summary measures as data, in order to evaluate the relevance of each pathway to an outcome of interest (e.g., disease status). Although shifting the focus of analysis from individual genes to pathways has proven to improve the statistical power and provide more robust results, such methods tend to eliminate a large number of genes whose pathways are unknown. For these genes, we propose to use a Bayesian multinomial logit model to predict the associated pathways by using the genes with known pathways as the training data. Our hierarchical Bayesian model takes the uncertainty regarding the pathway predictions into account while assessing the significance of pathways. We apply our method to two independent studies on type 2 diabetes and show that the overlap between the results from the two studies is statistically significant. We also evaluate our approach on the basis of simulated data.
Affiliation(s)
- Babak Shahbaba
- Department of Statistics, University of California, Irvine, CA, USA
|
171
|
Schonfeld SJ, Neta G, Sturgis EM, Pfeiffer RM, Hutchinson AA, Xu L, Wheeler W, Guénel P, Rajaraman P, de Vathaire F, Ron E, Tucker MA, Chanock SJ, Sigurdson AJ, Brenner AV. Common genetic variants in sex hormone pathway genes and papillary thyroid cancer risk. Thyroid 2012; 22:151-6. [PMID: 22224819 PMCID: PMC3271376 DOI: 10.1089/thy.2011.0309] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
BACKGROUND Hormonal differences are hypothesized to contribute to the approximately ≥2-fold higher thyroid cancer incidence rates among women compared with men worldwide. Although thyroid cancer cells express estrogen receptors and estrogen has a proliferative effect on papillary thyroid cancer (PTC) cells in vitro, epidemiologic studies have not found clear associations between thyroid cancer and female hormonal factors. We hypothesized that polymorphic variation in hormone pathway genes is associated with the risk of developing papillary thyroid cancer. METHODS We evaluated the association between PTC and 1151 tag single nucleotide polymorphisms (SNPs) in 58 candidate gene regions involved in sex hormone synthesis and metabolism, gonadotropins, and prolactin in a case-control study of 344 PTC cases and 452 controls, frequency matched on age and sex. Odds ratios and p-values for the linear trend for the association between each SNP genotype and PTC risk were estimated using unconditional logistic regression. SNPs in the same gene region or pathway were aggregated using adaptive rank-truncated product methods to obtain gene region-specific or pathway-specific p-values. To account for multiple comparisons, we applied the false discovery rate method. RESULTS Seven SNPs had p-values for linear trend <0.01, including four in the CYP19A1 gene, but none of the SNPs remained significant after correction for multiple comparisons. Results were similar when restricting the dataset to women. p-values for examined gene regions and for all genes combined were ≥0.09. CONCLUSIONS Based on these results, SNPs in selected hormone pathway genes do not appear to be strongly related to PTC risk. This observation is in accord with the lack of consistent associations between hormonal factors and PTC risk in epidemiologic studies.
Affiliation(s)
- Sara J Schonfeld
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland 20892-7238, USA.
|
172
|
Yu K, Wacholder S, Wheeler W, Wang Z, Caporaso N, Landi MT, Liang F. A flexible Bayesian model for studying gene-environment interaction. PLoS Genet 2012; 8:e1002482. [PMID: 22291610 PMCID: PMC3266891 DOI: 10.1371/journal.pgen.1002482] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/01/2011] [Accepted: 11/30/2011] [Indexed: 01/24/2023]
Abstract
An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environment risk factor interact to influence the disease risk. The standard approach to this study of gene–environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environment risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environment effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene–environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene–environment interaction based on the single-marker approach is far from significant. Many common diseases result from a complex interplay of genetic and environmental risk factors. It is important to study the potential genetic and environmental risk factors jointly in order to achieve a better understanding of the mechanisms underlying disease development. 
The standard single-marker approach that studies the environmental risk factor and one genetic marker at a time could misrepresent the gene–environment interaction, as the single genetic marker might not be an appropriate surrogate for the underlying genetic functioning polymorphisms. We propose a method to look at gene–environment interaction at the gene/region level by integrating information observed on multiple genetic markers within the selected gene/region with measures of environmental exposure. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the proposed model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region and find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect varying according to a subject's genetic profile.
Affiliation(s)
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA.
|
173
|
Han SS, Sue LY, Berndt SI, Selhub J, Burdette LA, Rosenberg PS, Ziegler RG. Associations between genes in the one-carbon metabolism pathway and advanced colorectal adenoma risk in individuals with low folate intake. Cancer Epidemiol Biomarkers Prev 2012; 21:417-27. [PMID: 22253295 DOI: 10.1158/1055-9965.epi-11-0782] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
BACKGROUND Folate is essential for one-carbon metabolism, a pathway required by DNA synthesis, methylation, and repair. Low dietary and circulating folate and polymorphic variation in this pathway are associated with increased risk of colorectal adenoma and cancer. METHODS We genotyped 882 single nucleotide polymorphisms (SNP) in 82 one-carbon metabolism genes for 1,331 cases of advanced colorectal adenoma, identified by sigmoidoscopy at baseline, and 1,501 controls from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO). We evaluated associations between one-carbon genes and adenoma risk in all subjects and stratified by folate intake. We applied the Adaptive Rank Truncated Product (ARTP) method to assess statistical significance at the gene and pathway levels. RESULTS Folate intake was inversely associated with advanced colorectal adenoma risk [odds ratio (OR) by quartile = 0.85, P = 1.9 × 10^-5]. We found no statistically significant associations between one-carbon genes and adenoma risk in all subjects. As hypothesized, we observed a statistically significant pathway-level association (P = 0.038) in the lowest quartile of folate; no significant associations were found in higher quartiles. Several genes including adenosine deaminase (ADA) and cysteine dioxygenase (CDO1) contributed to this signal (gene-level P = 0.001 and 0.0073, respectively). The most statistically significant SNP was rs244072 in ADA (P = 2.37 × 10^-5). CONCLUSIONS AND IMPACT Stratification by dietary folate and application of the ARTP method revealed statistically significant pathway- and gene-level associations between one-carbon metabolism genes and risk of advanced colorectal adenoma, which were not apparent in analysis of the entire population. Folate intake may interact with associations between common variants in one-carbon metabolism genes and colorectal adenoma risk.
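The ARTP idea used in this and several neighbouring entries can be sketched roughly as below. This is a simplified reading of the general approach (combine the K smallest p-values, try several K, and correct for choosing the best K by resampling), not the published implementation. All names and numbers are hypothetical. For brevity the null p-value sets are drawn as independent uniforms; a real analysis would obtain them by permuting phenotype labels so that correlation between SNPs is preserved.

```python
import math
import random


def rtp_statistic(p_values, k):
    """Negative log of the product of the k smallest p-values."""
    return -sum(math.log(p) for p in sorted(p_values)[:k])


def artp_p_value(observed, null_sets, truncation_points):
    """Adaptive rank truncated product p-value from resampled null sets."""
    all_sets = [observed] + list(null_sets)  # index 0 is the observed data
    n = len(all_sets)
    best = [1.0] * n  # smallest per-truncation p-value seen for each set
    for k in truncation_points:
        stats = [rtp_statistic(s, k) for s in all_sets]
        for b in range(n):
            # Share of sets whose statistic is at least as extreme as set b.
            p_b = sum(1 for s in stats if s >= stats[b]) / n
            best[b] = min(best[b], p_b)
    # Compare the observed minimum with the minima of all sets.
    return sum(1 for v in best if v <= best[0]) / n


rng = random.Random(0)
observed_p = [1e-9, 1e-8, 1e-7, 0.5, 0.9]  # hypothetical SNP-level p-values
# Assumption: independent uniform nulls stand in for phenotype permutations.
null_p_sets = [[1.0 - rng.random() for _ in observed_p] for _ in range(199)]
pathway_p = artp_p_value(observed_p, null_p_sets, truncation_points=(1, 2, 3))
```

The smallest attainable p-value is 1 divided by the number of sets, so the number of resamples bounds the resolution of the result.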
Affiliation(s)
- Summer S Han
- Epidemiology and Biostatistics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA.
|
174
|
Menashe I, Figueroa JD, Garcia-Closas M, Chatterjee N, Malats N, Picornell A, Maeder D, Yang Q, Prokunina-Olsson L, Wang Z, Real FX, Jacobs KB, Baris D, Thun M, Albanes D, Purdue MP, Kogevinas M, Hutchinson A, Fu YP, Tang W, Burdette L, Tardón A, Serra C, Carrato A, García-Closas R, Lloreta J, Johnson A, Schwenn M, Schned A, Andriole G, Black A, Jacobs EJ, Diver RW, Gapstur SM, Weinstein SJ, Virtamo J, Caporaso NE, Landi MT, Fraumeni JF, Chanock SJ, Silverman DT, Rothman N. Large-scale pathway-based analysis of bladder cancer genome-wide association data from five studies of European background. PLoS One 2012; 7:e29396. [PMID: 22238607 PMCID: PMC3251580 DOI: 10.1371/journal.pone.0029396] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 09/14/2011] [Accepted: 11/28/2011] [Indexed: 12/14/2022]
Abstract
Pathway analysis of genome-wide association studies (GWAS) offers a unique opportunity to collectively evaluate genetic variants with effects that are too small to be detected individually. We applied a pathway analysis to a bladder cancer GWAS containing data from 3,532 cases and 5,120 controls of European background (n = 5 studies). Thirteen hundred and ninety-nine pathways were drawn from five publicly available resources (Biocarta, KEGG, NCI-PID, HumanCyc, and Reactome), and we constructed 22 additional candidate pathways previously hypothesized to be related to bladder cancer. In total, 1421 pathways, 5647 genes and ∼90,000 SNPs were included in our study. A logistic regression model adjusting for age, sex, study, DNA source, and smoking status was used to assess the marginal trend effect of SNPs on bladder cancer risk. Two complementary pathway-based methods (gene-set enrichment analysis [GSEA], and adapted rank-truncated product [ARTP]) were used to assess the enrichment of association signals within each pathway. Eighteen pathways were detected by either GSEA or ARTP at P≤0.01. To minimize false positives, we used the I² statistic to identify SNPs displaying heterogeneous effects across the five studies. After removing these SNPs, seven pathways ('Aromatic amine metabolism' [P(GSEA) = 0.0100, P(ARTP) = 0.0020], 'NAD biosynthesis' [P(GSEA) = 0.0018, P(ARTP) = 0.0086], 'NAD salvage' [P(ARTP) = 0.0068], 'Clathrin derived vesicle budding' [P(ARTP) = 0.0018], 'Lysosome vesicle biogenesis' [P(GSEA) = 0.0023, P(ARTP)<0.00012], 'Retrograde neurotrophin signaling' [P(GSEA) = 0.00840], and 'Mitotic metaphase/anaphase transition' [P(GSEA) = 0.0040]) remained. These pathways seem to belong to three fundamental cellular processes (metabolic detoxification, mitosis, and clathrin-mediated vesicles).
Identification of the aromatic amine metabolism pathway provides support for the ability of this approach to identify pathways with established relevance to bladder carcinogenesis.
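The heterogeneity screen mentioned above relies on the I² statistic, which can be computed from per-study effect estimates and standard errors. The sketch below uses the usual definition based on Cochran's Q with inverse-variance weights. The function name and the five effect estimates are hypothetical, and the abstract does not state which I² cutoff was applied.

```python
def i_squared(effects, std_errors):
    """I-squared heterogeneity (0 to 1) from per-study effects and SEs."""
    weights = [1.0 / se ** 2 for se in std_errors]
    # Fixed-effect (inverse-variance weighted) pooled estimate.
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributable to between-study heterogeneity.
    return max(0.0, (q - df) / q)


# Hypothetical log odds ratios for one SNP in five studies (illustrative only).
heterogeneous = i_squared([-0.4, 0.5, 0.1, 0.6, -0.3], [0.1] * 5)
homogeneous = i_squared([0.2] * 5, [0.1] * 5)
```

A SNP with a value near 1 shows effects that disagree across studies far more than sampling error would explain, which is the kind of marker the screen is meant to flag.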
Affiliation(s)
- Idan Menashe
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America.
|
175
|
Meyer TE, Chu LW, Li Q, Yu K, Rosenberg PS, Menashe I, Chokkalingam AP, Quraishi SM, Huang WY, Weiss JM, Kaaks R, Hayes RB, Chanock SJ, Hsing AW. The association between inflammation-related genes and serum androgen levels in men: the prostate, lung, colorectal, and ovarian study. Prostate 2012; 72:65-71. [PMID: 21520164 PMCID: PMC3156884 DOI: 10.1002/pros.21407] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/25/2010] [Accepted: 03/28/2011] [Indexed: 01/13/2023]
Abstract
BACKGROUND Androgens and inflammation have been implicated in the etiology of several cancers, including prostate cancer. Serum androgens have been shown to correlate with markers of inflammation and expression of inflammation-related genes. METHODS In this report, we evaluated associations between 9,932 single nucleotide polymorphisms (SNPs) marking common genetic variants in 774 inflammation-related genes and four serum androgen levels (total testosterone [T], bioavailable T [BioT]; 5α-androstane-3α, 17β-diol glucuronide [3αdiol G], and 4-androstene-3,17-dione [androstenedione]), in 560 healthy men (median age 64 years) drawn from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Baseline serum androgens were measured by radioimmunoassay. Genotypes were determined as part of the Cancer Genetic Markers of Susceptibility Study genome-wide scan. SNP-hormone associations were evaluated using linear regression of hormones adjusted for age. Gene-based P values were generated using an adaptive rank truncated product (ARTP) method. RESULTS Suggestive associations were observed for two inflammation-related genes and circulating androgen levels (false discovery rate [FDR] q-value <0.1) in both SNP and gene-based tests. Specifically, T was associated with common variants in MMP2 and CD14, with the most significant SNPs being rs893226G > T in MMP2 and rs3822356T > C in CD14 (FDR q-value = 0.09 for both SNPs). Other genes implicated in either SNP or gene-based tests were IK with T and BioT, PRG2 with T, and TNFSF9 with androstenedione. CONCLUSION These results suggest possible cross-talk between androgen levels and inflammation pathways, but larger studies are needed to confirm these findings and to further clarify the interrelationship between inflammation and androgens and their effects on cancer risk.
Affiliation(s)
- Tamra E Meyer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20852, USA.
|
176
|
Abstract
Gene-set analysis (GSA) evaluates the overall evidence of association between a phenotype and all genotyped single nucleotide polymorphisms (SNPs) in a set of genes, as opposed to testing for association between a phenotype and each SNP individually. We propose using the Gamma Method (GM) to combine gene-level P-values for assessing the significance of GS association. We performed simulations to compare the GM with several other self-contained GSA strategies, including both one-step and two-step GSA approaches, in a variety of scenarios. We denote a 'one-step' GSA approach to be one in which all SNPs in a GS are used to derive a test of GS association without consideration of gene-level effects, and a 'two-step' approach to be one in which all genotyped SNPs in a gene are first used to evaluate association of the phenotype with all measured variation in the gene and then the gene-level tests of association are aggregated to assess the GS association with the phenotype. The simulations suggest that, overall, two-step methods provide higher power than one-step approaches and that combining gene-level P-values using the GM with a soft truncation threshold between 0.05 and 0.20 is a powerful approach for conducting GSA, relative to the competing approaches assessed. We also applied all of the considered GSA methods to data from a pharmacogenomic study of cisplatin, and obtained evidence suggesting that the glutathione metabolism GS is associated with cisplatin drug response.
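As a point of reference for the Gamma Method, its best-known special case is Fisher's combination of p-values. To our understanding, the Gamma Method transforms each p-value through an inverse Gamma distribution function whose shape parameter controls the soft truncation threshold, and a shape of 1 gives Fisher's statistic. The sketch below covers only that special case, because the Python standard library has no inverse Gamma CDF. It is not the authors' implementation, and the gene-level p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method."""
    # T = -sum(ln p) follows a Gamma(L, 1) distribution under the null.
    t = -sum(math.log(p) for p in p_values)
    count = len(p_values)
    # Gamma(L, 1) survival function for integer L (Erlang closed form):
    # exp(-t) * sum_{i=0}^{L-1} t^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, count):
        term *= t / i
        total += term
    return math.exp(-t) * total


# Hypothetical gene-level p-values for one gene set (illustrative only).
gene_p = [0.05, 0.05]
gene_set_p = fisher_combined_p(gene_p)
```

The closed form assumes the gene-level p-values are independent. With correlated genes, significance would normally be assessed by permutation instead.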
|
177
|
Schaid DJ, Sinnwell JP, Jenkins GD, McDonnell SK, Ingle JN, Kubo M, Goss PE, Costantino JP, Wickerham DL, Weinshilboum RM. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies. Genet Epidemiol 2011; 36:3-16. [PMID: 22161999 DOI: 10.1002/gepi.20632] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 05/17/2011] [Revised: 07/22/2011] [Accepted: 08/02/2011] [Indexed: 11/07/2022]
Abstract
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses.
Affiliation(s)
- Daniel J Schaid
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA.
|
178
|
Melatonin pathway genes and breast cancer risk among Chinese women. Breast Cancer Res Treat 2011; 132:693-9. [PMID: 22138747 DOI: 10.1007/s10549-011-1884-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 09/29/2011] [Accepted: 11/12/2011] [Indexed: 12/29/2022]
Abstract
Previous studies suggest that melatonin may act on cancer growth through a variety of mechanisms, most notably by direct anti-proliferative effects on breast cancer cells and via interactions with the estrogen pathway. Three genes are largely responsible for mediating the downstream effects of melatonin: melatonin receptors 1a and 1b (MTNR1a and MTNR1b), and arylalkylamine N-acetyltransferase (AANAT). It is hypothesized that genetic variation in these genes may lead to altered protein production or function. To address this question, we conducted a comprehensive evaluation of the association between common single nucleotide polymorphisms (SNPs) in the MTNR1a, MTNR1b, and AANAT genes and breast cancer risk among 2,073 cases and 2,083 controls, using a two-stage analysis of genome-wide association data among women of the Shanghai Breast Cancer Study. Results demonstrate that two SNPs were consistently associated with breast cancer risk across both study stages. Compared with MTNR1b rs10765576 major allele carriers (GG or GA), a decreased risk of breast cancer was associated with the AA genotype (OR = 0.78, 95% CI = 0.62-0.97, P = 0.0281). Although no overall association was seen in the combined analysis, the effect of MTNR1a rs7665392 was found to vary by menopausal status (P-value for interaction = 0.001). Premenopausal women with the GG genotype were at increased risk for breast cancer compared with major allele carriers (TT or TG) (OR = 1.57, 95% CI = 1.07-2.31, P = 0.020), while postmenopausal women were at decreased risk (OR = 0.58, 95% CI = 0.36-0.95, P = 0.030). No significant breast cancer associations were found for variants in the AANAT gene. These results suggest that common genetic variation in the MTNR1a and 1b genes may contribute to breast cancer susceptibility, and that associations may vary by menopausal status.
Given that multiple variants in high linkage disequilibrium with MTNR1b rs76653292 have been associated with altered function or expression of insulin and glucose family members, further research may focus on clarifying this relationship.
|
179
|
Abstract
Quantitative trait locus (QTL) mapping using deep DNA sequencing data is a challenging task. In this study we performed region-based and pathway-based QTL mappings using a p-value combination method to analyze the simulated quantitative traits Q1 and Q4 and the exome sequencing data. The aims were to evaluate the performance of the QTL mapping approaches that were used and to suggest plausible strategies for QTL mapping of DNA sequencing data. We conducted single-locus QTL mappings using a linear regression model with adjustments for age and smoking status, and we also conducted region-based and pathway-based QTL mappings using a truncated product method for combining p-values from the single-locus QTL mapping. To account for the features of rare variants and common single-nucleotide polymorphisms (SNPs), we considered independently rare-variant-only, common-SNP-only, and combined analyses. An analysis of 200 simulated replications showed that the three region-based methods reasonably controlled type I error, whereas the combined analysis yielded the greatest statistical power. Rare-variant-only, common-SNP-only, and combined analyses were also applied to pathway-based QTL mappings. We found that pathway-based QTL mappings had a power of approximately 100% when the significance of the vascular endothelial growth factor pathway was evaluated, but type I errors were slightly inflated. Our approach complements single-locus QTL mapping. An integrated approach using single-locus, combined region-based, and combined pathway-based analyses should yield promising results for QTL mapping of DNA sequencing data.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Nankang 115, Taipei, Taiwan.
|
180
|
Gui H, Li M, Sham PC, Cherny SS. Comparisons of seven algorithms for pathway analysis using the WTCCC Crohn's Disease dataset. BMC Res Notes 2011; 4:386. [PMID: 21981765 PMCID: PMC3199264 DOI: 10.1186/1756-0500-4-386] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 06/10/2011] [Accepted: 10/07/2011] [Indexed: 12/20/2022]
Abstract
BACKGROUND Though rooted in genomic expression studies, pathway analysis for genome-wide association studies (GWAS) has gained increasing popularity, since it has the potential to discover hidden disease pathogenic mechanisms by combining statistical methods with biological knowledge. Generally, algorithms or programs proposed recently can be categorized by different types of input data, null hypothesis or counts of analysis stages. Due to complexity caused by SNP, gene and pathway relationships, re-sampling strategies like permutation are always utilized to derive an empirical distribution for test statistics for evaluating the significance of candidate pathways. However, evaluation of these algorithms on real GWAS datasets and real biological pathway databases needs to be addressed before we apply them widely with confidence. FINDINGS Two algorithms which use summary statistics from GWAS as input were implemented in KGG, a novel and user-friendly software tool for GWAS pathway analysis. Comparisons of these two algorithms as well as the other five selected algorithms were conducted by analyzing the WTCCC Crohn's Disease dataset utilizing the MsigDB canonical pathways. As a result of using permutation to obtain empirical p-value, most of these methods could control Type I error rate well, although some are conservative. However, the methods varied greatly in terms of power and running time, with the PLINK truncated set-based test being the most powerful and KGG being the fastest. CONCLUSIONS Raw data-based algorithms, such as those implemented in PLINK, are preferable for GWAS pathway analysis as long as computational capacity is available. It may be worthwhile to apply two or more pathway analysis algorithms on the same GWAS dataset, since the methods differ greatly in their outputs and might provide complementary findings for the studied complex disease.
Affiliation(s)
- Hongsheng Gui
- Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China.
|
181
|
A targeted association study of immunity genes and networks suggests novel associations with placental malaria infection. PLoS One 2011; 6:e24996. [PMID: 21949827 PMCID: PMC3176307 DOI: 10.1371/journal.pone.0024996] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/29/2011] [Accepted: 08/22/2011] [Indexed: 01/17/2023]
Abstract
A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, causing up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection, and here we extend this analysis considerably, investigating genetic variation in over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach incorporating results from both single gene analysis as well as gene-gene interactions on a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single gene analysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study on placental malaria infection to date, opening the door for follow-up studies trying to elucidate the genetic basis of this neglected form of malaria.
|
182
|
Pan W, Basu S, Shen X. Adaptive tests for detecting gene-gene and gene-environment interactions. Hum Hered 2011; 72:98-109. [PMID: 21934325 DOI: 10.1159/000330632] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/10/2011] [Accepted: 07/02/2011] [Indexed: 12/14/2022]
Abstract
There has been an increasing interest in detecting gene-gene and gene-environment interactions in genetic association studies. A major statistical challenge is how to deal with a large number of parameters measuring possible interaction effects, which leads to reduced power of any statistical test due to a large number of degrees of freedom or high cost of adjustment for multiple testing. Hence, a popular idea is to first apply some dimension reduction techniques before testing, while another is to apply only statistical tests that are developed for and robust to high-dimensional data. To combine both ideas, we propose applying an adaptive sum of squared score (SSU) test and several other adaptive tests. These adaptive tests are extensions of the adaptive Neyman test [Fan, 1996], which was originally proposed for high-dimensional data, providing a simple and effective way for dimension reduction. On the other hand, the original SSU test coincides with a version of a test specifically developed for high-dimensional data. We apply these adaptive tests and their original nonadaptive versions to simulated data to detect interactions between two groups of SNPs (e.g. multiple SNPs in two candidate regions). We found that for sparse models (i.e. with only few non-zero interaction parameters), the adaptive SSU test and its close variant, an adaptive version of the weighted sum of squared score (SSUw) test, improved the power over their non-adaptive versions, and performed consistently well across various scenarios. The proposed adaptive tests are built in the general framework of regression analysis, and can thus be applied to various types of traits in the presence of covariates.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, USA. weip@biostat.umn.edu
|
183
|
Liu X, Wang G, Hong X, Tsai HJ, Liu R, Zhang S, Wang H, Pearson C, Ortiz K, Wang D, Hirsch E, Zuckerman B, Wang X. Associations between gene polymorphisms in fatty acid metabolism pathway and preterm delivery in a US urban black population. Hum Genet 2011; 131:341-51. [PMID: 21847588 DOI: 10.1007/s00439-011-1079-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/26/2011] [Accepted: 08/03/2011] [Indexed: 12/20/2022]
Abstract
There is increasing evidence suggesting that higher intakes of fish or n-3 polyunsaturated fatty acid supplements may decrease the risk of preterm delivery (PTD). We hypothesized that genetic variants of the enzymes critical to fatty acid biosynthesis and metabolism may be associated with PTD. We genotyped 231 potentially functional single nucleotide polymorphisms (SNPs) and tagSNPs in 9 genes (FADS1, FADS2, PTGS1, PTGS2, ALOX5, ALOX5AP, PTGES, PTGES2, and PTGES3) among 1,110 black mothers, including 542 mothers who delivered preterm (<37 weeks gestation) and 568 mothers who delivered full-term babies (≥37 weeks gestation) at Boston Medical Center. After excluding SNPs that are in complete linkage disequilibrium or have low minor allele frequency (<1%) or call rate (<90%), we examined the association of 206 SNPs with PTD using multiple logistic regression models. We also imputed 190 HapMap SNPs using the program MACH and examined their associations with PTD. Finally, we explored gene-level and pathway-level associations with PTD using the adaptive rank truncated product (ARTP) methods. A total of 21 SNPs were associated with PTD (p value ranging from 0.003 to 0.05), including 3 imputed SNPs. Gene-level ARTP statistics indicated that the gene PTGES2 was significantly associated with PTD with a gene-based p value equal to 0.01. No pathway-based association was found. In this large and comprehensive candidate gene study, we found a modest association of genes in the fatty acid metabolism pathway with PTD. Further investigation of these gene polymorphisms jointly with fatty acid measures and other genetic factors would help to better understand the pathogenesis of PTD.
Affiliation(s)
- Xin Liu
- Mary Ann and J. Milburn Smith Child Health Research Program, Children's Memorial Hospital, Children's Memorial Research Center, Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
|
184
|
Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 2011; 98:1-8. [PMID: 21565265 PMCID: PMC3852939 DOI: 10.1016/j.ygeno.2011.04.006] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Received: 12/06/2010] [Revised: 03/02/2011] [Accepted: 04/15/2011] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
185
|
Kratz CP, Han SS, Rosenberg PS, Berndt SI, Burdett L, Yeager M, Korde LA, Mai PL, Pfeiffer R, Greene MH. Variants in or near KITLG, BAK1, DMRT1, and TERT-CLPTM1L predispose to familial testicular germ cell tumour. J Med Genet 2011; 48:473-6. [PMID: 21617256 PMCID: PMC3131696 DOI: 10.1136/jmedgenet-2011-100001] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Indexed: 01/05/2023]
Abstract
BACKGROUND Familial testicular germ cell tumours (TGCTs) and bilateral TGCTs comprise 1-2% and 5% of all TGCTs, respectively, but their genetic basis remains largely unknown. AIM To investigate the contribution of known testicular cancer risk variants in familial and bilateral TGCTs. METHODS AND RESULTS The study genotyped 106 single nucleotide polymorphisms (SNPs) in four regions (BAK1, DMRT1, KITLG, TERT-CLPTM1L) previously identified from genome-wide association studies of TGCT, including the risk SNPs rs210138 (BAK1), rs755383 (DMRT1), and rs4635969 (TERT-CLPTM1L), in 97 cases with familial TGCT and 22 affected individuals with sporadic bilateral TGCT as well as 871 controls. Using a generalised estimating equations method that takes into account blood relationships among cases, the associations with familial and bilateral TGCT were analysed. Three previously identified risk SNPs were found to be associated with familial and bilateral TGCT (rs210138: OR 1.80, CI 1.35 to 2.41, p=7.03×10⁻⁵; rs755383: OR 1.67, CI 1.23 to 2.22, p=6.70×10⁻⁴; rs4635969: OR 1.59, CI 1.16 to 2.19, p=4.07×10⁻³). Evidence for a second independent association was found for an SNP in TERT (rs4975605: OR 1.68, CI 1.23 to 2.29, p=1.24×10⁻³). Another association with an SNP was identified in KITLG (rs2046971: OR 2.33, p=1.28×10⁻³); this SNP is in high linkage disequilibrium (LD) with reported risk variant rs995030. CONCLUSION This study provides evidence for replication of recent genome-wide association studies results and shows that variants in or near BAK1, DMRT1, TERT-CLPTM1L, and KITLG predispose to familial and bilateral TGCT. These findings imply that familial TGCT and sporadic TGCT share a common genetic basis.
Affiliation(s)
- Christian P Kratz
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland 20852, USA.
|
186
|
Lehne B, Lewis CM, Schlitt T. From SNPs to genes: disease association at the gene level. PLoS One 2011; 6:e20133. [PMID: 21738570 PMCID: PMC3128073 DOI: 10.1371/journal.pone.0020133] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 11/22/2010] [Accepted: 04/26/2011] [Indexed: 01/16/2023] Open
Abstract
Interpreting Genome-Wide Association Studies (GWAS) at a gene level is an important step towards understanding the molecular processes that lead to disease. In order to incorporate prior biological knowledge such as pathways and protein interactions in the analysis of GWAS data it is necessary to derive one measure of association for each gene. We compare three different methods to obtain gene-wide test statistics from Single Nucleotide Polymorphism (SNP) based association data: choosing the test statistic from the most significant SNP; the mean test statistics of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's Disease and Type 1 Diabetes we identified new potential disease genes.
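The three gene-wide summaries compared in this abstract (most significant SNP, mean of all SNPs, mean of the top quartile) can be sketched as follows. This is a minimal illustration assuming each gene comes with a list of per-SNP test statistics where larger means stronger association; the function name `gene_scores` and the input values are hypothetical, and the correction for the number of SNPs per gene that the abstract mentions is not shown.

```python
from statistics import mean

def gene_scores(snp_stats):
    """Return three gene-wide summaries of per-SNP association statistics.

    snp_stats: non-negative test statistics (for example chi-square values),
    one per SNP mapped to the gene.
    """
    if not snp_stats:
        raise ValueError("gene has no SNPs")
    ordered = sorted(snp_stats, reverse=True)
    # Size of the top quartile, rounded up so that it holds at least one SNP.
    k = max(1, -(-len(ordered) // 4))
    return {
        "max": ordered[0],
        "mean": mean(ordered),
        "top_quartile_mean": mean(ordered[:k]),
    }

# Eight made-up SNP statistics for one gene.
example = gene_scores([0.5, 1.0, 2.0, 3.5, 4.0, 6.0, 7.5, 9.0])
```

All three summaries tend to grow or shift with the number of SNPs in a gene, which is why some form of size adjustment is needed before genes are compared.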
Affiliation(s)
- Benjamin Lehne
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Cathryn M. Lewis
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, United Kingdom
- Thomas Schlitt
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
|
187
|
Neta G, Brenner AV, Sturgis EM, Pfeiffer RM, Hutchinson AA, Aschebrook-Kilfoy B, Yeager M, Xu L, Wheeler W, Abend M, Ron E, Tucker MA, Chanock SJ, Sigurdson AJ. Common genetic variants related to genomic integrity and risk of papillary thyroid cancer. Carcinogenesis 2011; 32:1231-7. [PMID: 21642358 DOI: 10.1093/carcin/bgr100] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Abstract
DNA damage is an important mechanism in carcinogenesis, so genes related to maintaining genomic integrity may influence papillary thyroid cancer (PTC) risk. Candidate gene studies targeting some of these genes have identified only a few polymorphisms associated with risk of PTC. Here, we expanded the scope of previous candidate studies by increasing the number and coverage of genes related to maintenance of genomic integrity. We evaluated 5077 tag single-nucleotide polymorphisms (SNPs) from 340 candidate gene regions hypothesized to be involved in DNA repair, epigenetics, tumor suppression, apoptosis, telomere function and cell cycle control and signaling pathways in a case-control study of 344 PTC cases and 452 matched controls. We estimated odds ratios for associations of single SNPs with PTC risk and combined P values for SNPs in the same gene region or pathway to obtain gene region-specific or pathway-specific P values using adaptive rank-truncated product methods. Nine SNPs had P values <0.0005, three of which were in HDAC4 and were inversely related to PTC risk. After multiple comparisons adjustment, no SNPs remained associated with PTC risk. Seven gene regions were associated with PTC risk at P < 0.01, including HUS1, ALKBH3, HDAC4, BAK1, FAF1_CDKN2C, DACT3 and FZD6. Our results suggest a possible role of genes involved in maintenance of genomic integrity in relation to risk of PTC.
Affiliation(s)
- Gila Neta
- Radiation Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health/DHHS, 6120 Executive Boulevard, Rockville, MD 20852-7244, USA.
|
188
|
Mirabello L, Yu K, Berndt SI, Burdett L, Wang Z, Chowdhury S, Teshome K, Uzoka A, Hutchinson A, Grotmol T, Douglass C, Hayes RB, Hoover RN, Savage SA. A comprehensive candidate gene approach identifies genetic variation associated with osteosarcoma. BMC Cancer 2011; 11:209. [PMID: 21619704 PMCID: PMC3138419 DOI: 10.1186/1471-2407-11-209] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Received: 12/16/2010] [Accepted: 05/29/2011] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Osteosarcoma (OS) is a bone malignancy which occurs primarily in adolescents. Since it occurs during a period of rapid growth, genes important in bone formation and growth are plausible modifiers of risk. Genes involved in DNA repair and ribosomal function may contribute to OS pathogenesis, because they maintain the integrity of critical cellular processes. We evaluated these hypotheses in an OS association study of genes from growth/hormone, bone formation, DNA repair, and ribosomal pathways. METHODS We evaluated 4836 tag-SNPs across 255 candidate genes in 96 OS cases and 1426 controls. Logistic regression models were used to estimate the odds ratios (OR) and 95% confidence intervals (CI). RESULTS Twelve SNPs in growth or DNA repair genes were significantly associated with OS after Bonferroni correction. Four SNPs in the DNA repair gene FANCM (ORs 1.9-2.0, P = 0.003-0.004) and 2 SNPs downstream of the growth hormone gene GH1 (OR 1.6, P = 0.002; OR 0.5, P = 0.0009) were significantly associated with OS. One SNP in the region of each of the following genes was significant: MDM2, MPG, FGF2, FGFR3, GNRH2, and IGF1. CONCLUSIONS Our results suggest that several SNPs in biologically plausible pathways are associated with OS. Larger studies are required to confirm our findings.
Affiliation(s)
- Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Laurie Burdett
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Zhaoming Wang
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Salma Chowdhury
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Kedest Teshome
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Arinze Uzoka
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Amy Hutchinson
- Core Genotyping Facility, National Cancer Institute, SAIC-Frederick, Inc., Gaithersburg, MD, USA
- Tom Grotmol
- Cancer Registry of Norway, PO Box 5313 Majorstuen, NO-0304 Oslo, Norway
- Richard B Hayes
- Division of Epidemiology, Department of Environmental Medicine, New York University, New York, NY, USA
- Robert N Hoover
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Sharon A Savage
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
|
189
|
Zhang H, Wacholder S, Qin J, Hildesheim A, Yu K. Improved genetic association tests for an ordinal outcome representing the disease progression process. Genet Epidemiol 2011; 35:499-505. [PMID: 21618605 DOI: 10.1002/gepi.20599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/19/2011] [Revised: 03/17/2011] [Accepted: 04/26/2011] [Indexed: 11/09/2022]
Abstract
We are interested in detecting genetic variants that influence transition between discrete stages of a disease progression process, such as the natural history of progression to cervical cancer with the following four stages: (1) normal-human papillomavirus (HPV) exposed, (2) persistent infection with oncogenic HPV, (3) cervical intraepithelial neoplasia grades 2 or 3 (CIN2/3), and (4) cervical cancer. Standard statistical tests derived from the proportional odds model or polytomous regression model can be used to study this type of ordinal outcome. However, these methods are either too sensitive to the proportional odds assumption or fail to take advantage of the restriction on the parameter space for the genetic variants. Two alternative tests, the maximum score test (MAX) and the adaptive P-value combination test (Adapt-P), are proposed with the aim of striking a balance between efficiency and robustness. A simulation study demonstrates that MAX and Adapt-P have the most robust performance among all considered tests under various realistic scenarios. As a demonstration, we applied the considered tests to a genetic association study of cervical cancer.
Affiliation(s)
- Hong Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Blvd., Bethesda, MD 20892, USA
|
190
|
Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics 2011; 27:1741-8. [PMID: 21596790 PMCID: PMC3117361 DOI: 10.1093/bioinformatics/btr295] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Indexed: 02/02/2023]
Abstract
MOTIVATION Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics. RESULTS This review outlines recent developments in sequencing technologies and genome analysis methods for application in personalized medicine. New methods are needed in four areas to realize the potential of personalized medicine: (i) processing large-scale robust genomic data; (ii) interpreting the functional effect and the impact of genomic variation; (iii) integrating systems data to relate complex genetic interactions with phenotypes; and (iv) translating these discoveries into medical practice. CONTACT russ.altman@stanford.edu
Affiliation(s)
- Guy Haskin Fernald
- Biomedical Informatics Training Program, Stanford University School of Medicine, Department of Bioengineering, Stanford University, Stanford, CA, USA
|
191
|
Weng L, Macciardi F, Subramanian A, Guffanti G, Potkin SG, Yu Z, Xie X. SNP-based pathway enrichment analysis for genome-wide association studies. BMC Bioinformatics 2011; 12:99. [PMID: 21496265 PMCID: PMC3102637 DOI: 10.1186/1471-2105-12-99] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 06/17/2010] [Accepted: 04/15/2011] [Indexed: 11/10/2022] Open
Abstract
Background Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can only explain a small fraction of the genetic risks associated with the complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than separately as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs. Results We describe a SNP-based pathway enrichment method for GWAS. The method consists of the following two main steps: 1) for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one) SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and 2) ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing if the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test.
We applied our method to two large genetically distinct GWAS data sets of schizophrenia, one from a European-American (EA) sample and the other from an African-American (AA) sample. In the EA data set, we found 22 pathways with nominal P-value less than or equal to 0.001 and corresponding false discovery rate (FDR) less than 5%. In the AA data set, we found 11 pathways at the same nominal P-value and FDR thresholds. Interestingly, 8 of these pathways overlap with those found in the EA sample. We have implemented our method in a JAVA software package, called SNP Set Enrichment Analysis (SSEA), which contains a user-friendly interface and is freely available at http://cbcl.ics.uci.edu/SSEA. Conclusions The SNP-based pathway enrichment method described here offers a new alternative approach for analysing GWAS data. By applying it to schizophrenia GWAS, we show that our method is able to identify statistically significant pathways, and importantly, pathways that can be replicated in large genetically distinct samples.
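The second step described in this abstract, testing whether a pathway's SNPs are enriched near the top of a ranked list, can be illustrated with a GSEA-style weighted Kolmogorov-Smirnov running sum. The sketch below is a generic version of that statistic and is not taken from the SSEA package; the function name `weighted_ks_score`, the `weight` parameter, and the SNP identifiers are hypothetical, and significance would still have to be assessed by permutation.

```python
def weighted_ks_score(stats, in_set, weight=1.0):
    """GSEA-style weighted Kolmogorov-Smirnov enrichment score.

    stats: dict mapping each SNP to its association statistic
           (larger means more significant; set members must not all be zero).
    in_set: set of SNPs that represent the pathway.
    """
    ranked = sorted(stats, key=lambda s: stats[s], reverse=True)
    hits = [s for s in ranked if s in in_set]
    if not hits or len(hits) == len(ranked):
        raise ValueError("need at least one SNP inside and one outside the set")
    hit_total = sum(abs(stats[s]) ** weight for s in hits)
    miss_step = 1.0 / (len(ranked) - len(hits))
    running = 0.0
    best = 0.0
    for s in ranked:
        if s in in_set:
            running += abs(stats[s]) ** weight / hit_total  # step up at a hit
        else:
            running -= miss_step                            # step down at a miss
        if abs(running) > abs(best):
            best = running                                  # largest deviation from zero
    return best

# Made-up statistics for five SNPs.
toy = {"rs1": 5.0, "rs2": 4.0, "rs3": 3.0, "rs4": 2.0, "rs5": 1.0}
top_score = weighted_ks_score(toy, {"rs1", "rs2"})
bottom_score = weighted_ks_score(toy, {"rs4", "rs5"})
```

A set concentrated at the top of the ranking gives a score near +1 and a set concentrated at the bottom gives a score near -1, which is the behaviour the toy calls are meant to show.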
Affiliation(s)
- Lingjie Weng
- Department of Computer Science, University of California, Irvine, USA
|
192
|
Fridley BL, Biernacka JM. Gene set analysis of SNP data: benefits, challenges, and future directions. Eur J Hum Genet 2011; 19:837-43. [PMID: 21487444 DOI: 10.1038/ejhg.2011.57] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Indexed: 01/30/2023] Open
Abstract
The last decade of human genetic research witnessed the completion of hundreds of genome-wide association studies (GWASs). However, the genetic variants discovered through these efforts account for only a small proportion of the heritability of complex traits. One explanation for the missing heritability is that the common analysis approach, assessing the effect of each single-nucleotide polymorphism (SNP) individually, is not well suited to the detection of small effects of multiple SNPs. Gene set analysis (GSA) is one of several approaches that may contribute to the discovery of additional genetic risk factors for complex traits. Complex phenotypes are thought to be controlled by networks of interacting biochemical and physiological pathways influenced by the products of sets of genes. By assessing the overall evidence of association of a phenotype with all measured variation in a set of genes, GSA may identify functionally relevant sets of genes corresponding to relevant biomolecular pathways, which will enable more focused studies of genetic risk factors. This approach may thus contribute to the discovery of genetic variants responsible for some of the missing heritability. With the increased use of these approaches for the secondary analysis of data from GWAS, it is important to understand the different GSA methods and their strengths and weaknesses, and consider challenges inherent in these types of analyses. This paper provides an overview of GSA, highlighting the key challenges, potential solutions, and directions for ongoing research.
Affiliation(s)
- Brooke L Fridley
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
|
193
|
Abstract
Complex diseases such as hypertension are inherently multifactorial and involve many factors of mild-to-minute effect sizes. A genome-wide association study (GWAS) typically tests hundreds of thousands of single-nucleotide polymorphisms (SNPs), and offers an opportunity to evaluate the aggregated effects of many genetic variants whose effects are too small to detect individually. Gene-set-enrichment analysis (GSEA) is a pathway-based approach that tests for such aggregated effects of genes that are linked by biological function. A key step in GSEA is the summary statistic (gene score) used to measure the overall relevance of a gene based on all SNPs tested in the gene. Existing GSEA methods use maximum statistics that are sensitive to gene size and linkage disequilibrium. We propose the approach of variable set enrichment analysis (VSEA) and study new gene score methods that are less dependent on gene size. The new method treats groups of variables (SNPs or other variants) as base units for summarizing gene scores and relies less on the gene definition itself. The power of VSEA is analyzed by simulation studies modeling various scenarios of complex multilocus interactions. Results show that the new gene scores generally performed better, some substantially so, than the existing GSEA extension to GWAS. The new methods are implemented in an R package and, when applied to a real GWAS data set, demonstrated their practical utility.
|
194
|
Jiang B, Zhang X, Zuo Y, Kang G. A powerful truncated tail strength method for testing multiple null hypotheses in one dataset. J Theor Biol 2011; 277:67-73. [PMID: 21295595 DOI: 10.1016/j.jtbi.2011.01.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/13/2010] [Revised: 01/14/2011] [Accepted: 01/19/2011] [Indexed: 10/18/2022]
Abstract
In microarray analysis, medical imaging analysis and functional magnetic resonance imaging, we often need to test an overall null hypothesis involving a large number of single hypotheses (usually more than 1000) in one dataset. A tail strength statistic (Taylor and Tibshirani, 2006) and Fisher's probability method are useful and can be applied to measure an overall significance for a large set of independent single hypothesis tests, with the overall null hypothesis assuming that all single null hypotheses are true. In this paper we propose a new method that improves the tail strength statistic by considering only the values whose corresponding p-values are less than some pre-specified cutoff. We call it the truncated tail strength statistic. We illustrate our method using a simulation study and two genome-wide datasets by chromosome. Our method not only controls the type I error rate quite well, but also has significantly higher power than the tail strength method and Fisher's method in most cases.
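To make the quantities in this abstract concrete, the sketch below computes the tail strength statistic in its commonly cited form, TS = (1/m) Σ_k [1 − p_(k)(m+1)/k] over the ordered p-values, alongside Fisher's combined p-value. The truncated variant shown simply drops terms whose p-value is at or above a cutoff; that is an assumption about how the truncation works, and the paper's exact definition and normalisation may differ. All function names and inputs are hypothetical.

```python
import math

def tail_strength(pvals):
    """Tail strength: average of 1 - p_(k) * (m + 1) / k over ordered p-values."""
    m = len(pvals)
    ordered = sorted(pvals)
    return sum(1.0 - p * (m + 1) / k for k, p in enumerate(ordered, start=1)) / m

def truncated_tail_strength(pvals, cutoff=0.05):
    """Assumed truncation: keep only the terms whose p-value is below the cutoff."""
    m = len(pvals)
    ordered = sorted(pvals)
    return sum(1.0 - p * (m + 1) / k
               for k, p in enumerate(ordered, start=1) if p < cutoff) / m

def fisher_combined_pvalue(pvals):
    """Fisher's method for independent tests: -2 * sum(ln p) ~ chi-square with 2m df."""
    m = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    # Closed-form chi-square survival function for even degrees of freedom.
    term = total = 1.0
    for i in range(1, m):
        term *= half / i
        total += term
    return math.exp(-half) * total

ts_null = tail_strength([0.25, 0.5, 0.75])              # p-values at their expected positions
ts_trunc = truncated_tail_strength([0.001, 0.5, 0.9])   # only the first term survives
fisher_single = fisher_combined_pvalue([0.03])          # one test: returns the p-value itself
```

When the ordered p-values sit exactly at their null expectations k/(m+1), every term vanishes and the tail strength is zero; positive values indicate an excess of small p-values.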
Affiliation(s)
- Bo Jiang
- Department of Biostatistics, University of Alabama at Birmingham, AL 35294, USA
|
195
|
Lacey JV, Yang H, Gaudet MM, Dunning A, Lissowska J, Sherman ME, Peplonska B, Brinton LA, Healey CS, Ahmed S, Pharoah P, Easton D, Chanock S, Garcia-Closas M. Endometrial cancer and genetic variation in PTEN, PIK3CA, AKT1, MLH1, and MSH2 within a population-based case-control study. Gynecol Oncol 2011; 120:167-73. [PMID: 21093899 PMCID: PMC3073848 DOI: 10.1016/j.ygyno.2010.10.016] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 08/18/2010] [Revised: 10/13/2010] [Accepted: 10/18/2010] [Indexed: 10/18/2022]
Abstract
OBJECTIVE We assessed whether common genetic variation in PTEN, PIK3CA, AKT1, MLH1, and MSH2 (genes that are reportedly frequently altered in endometrial cancer) was associated with risk of endometrial cancer. METHODS Using data from a population-based case-control study in Poland (PECS) of 417 cases and 407 matched controls, we genotyped 76 tagging single nucleotide polymorphisms (tagSNPs; located in or within 10 kb upstream or 5 kb downstream of the gene of interest, minor allele frequency ≥5% among various ethnic groups, and not already represented by another tagSNP at an LD of r² ≥0.80) on an Illumina Custom Infinium iSelect assay that included over 29,000 SNPs in 1316 genes. For individual SNPs, we used unconditional logistic regression models, adjusted for age and site, to generate odds ratios (ORs) and 95% confidence intervals (CIs). To replicate the one statistically significant association in PECS, we independently genotyped that tagSNP among 1141 endometrial cancer cases and 2275 controls from the SEARCH study in the UK. We assessed haplotypes via extended haplotype blocks and the sequential haplotype scan method. RESULTS The rs2677764 tagSNP in PIK3CA was statistically significantly associated with endometrial cancer in PECS (OR=1.42, 95% CI, 1.03-1.95; P=0.03) but not SEARCH (OR=0.98, 95% CI=0.82-1.17). Of the 25 haplotypes observed in at least 5% of cases and controls in PECS, only 1, in PIK3CA, was statistically significantly associated with endometrial cancer (OR=1.39, 95% CI, 1.00-1.93). All haplotype global p-values were null. CONCLUSION Common genetic variation in PTEN, PIK3CA, AKT1, MLH1, or MSH2 was not statistically significantly associated with endometrial cancer.
Affiliation(s)
- James V Lacey
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA.
|
196
|
Wang L, Jia P, Wolfinger RD, Chen X, Grayson BL, Aune TM, Zhao Z. An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies. Bioinformatics 2011; 27:686-92. [PMID: 21266443 DOI: 10.1093/bioinformatics/btq728] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean, and thus belongs to the class of mixed effects models. RESULTS The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Via simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS. AVAILABILITY The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
|
197
|
Zhao J, Gupta S, Seielstad M, Liu J, Thalamuthu A. Pathway-based analysis using reduced gene subsets in genome-wide association studies. BMC Bioinformatics 2011; 12:17. [PMID: 21226955 PMCID: PMC3033801 DOI: 10.1186/1471-2105-12-17] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 07/03/2010] [Accepted: 01/12/2011] [Indexed: 12/02/2022] Open
Abstract
Background Single Nucleotide Polymorphism (SNP) analysis only captures a small proportion of associated genetic variants in Genome-Wide Association Studies (GWAS) partly due to small marginal effects. Pathway level analysis incorporating prior biological information offers another way to analyze GWAS's of complex diseases, and promises to reveal the mechanisms leading to complex diseases. Biologically defined pathways are typically comprised of numerous genes. If only a subset of genes in the pathways is associated with disease then a joint analysis including all individual genes would result in a loss of power. To address this issue, we propose a pathway-based method that allows us to test for joint effects by using a pre-selected gene subset. In the proposed approach, each gene is considered as the basic unit, which reduces the number of genetic variants considered and hence reduces the degrees of freedom in the joint analysis. The proposed approach also can be used to investigate the joint effect of several genes in a candidate gene study. Results We applied this new method to a published GWAS of psoriasis and identified 6 biologically plausible pathways, after adjustment for multiple testing. The pathways identified in our analysis overlap with those reported in previous studies. Further, using simulations across a range of gene numbers and effect sizes, we demonstrate that the proposed approach enjoys higher power than several other approaches to detect associated pathways. Conclusions The proposed method could increase the power to discover susceptibility pathways and to identify associated genes using GWAS. In our analysis of genome-wide psoriasis data, we have identified a number of relevant pathways for psoriasis.
Affiliation(s)
- Jingyuan Zhao
- Human Genetics, 60 Biopolis Street 02-01, Genome Institute of Singapore, 138672 Singapore
|
198
|
Yu K, Liang F, Ciampa J, Chatterjee N. Efficient p-value evaluation for resampling-based tests. Biostatistics 2011; 12:582-93. [PMID: 21209154 DOI: 10.1093/biostatistics/kxq078] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculation of the test statistic on a large number of simulated data sets to assess its significance level, and thus can become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numerical simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^-6). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the new method by applying it to a large-scale genetic association study of prostate cancer.
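To make the computational burden concrete, the following is a minimal sketch of the standard permutation procedure that the abstract uses as its baseline. It is not the stochastic approximation Markov chain Monte Carlo method proposed in the paper. The function name, the difference-in-means statistic, and the add-one correction are assumptions chosen for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration only; the test statistic is an
    assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        # Small tolerance guards against floating-point ties.
        if mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_resamples + 1), so resolving a p-value below 10^-6 requires more than a million recomputations of the test statistic. This is the cost that a more efficient evaluation procedure aims to avoid.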
Affiliation(s)
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20892, USA.
|
199
|
Mirabello L, Richards EG, Duong LM, Yu K, Wang Z, Cawthon R, Berndt SI, Burdett L, Chowdhury S, Teshome K, Douglass C, Savage SA. Telomere length and variation in telomere biology genes in individuals with osteosarcoma. INTERNATIONAL JOURNAL OF MOLECULAR EPIDEMIOLOGY AND GENETICS 2011; 2:19-29. [PMID: 21537398 PMCID: PMC3077235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2010] [Accepted: 11/18/2010] [Indexed: 05/30/2023]
Abstract
Osteosarcoma, the most common primary bone tumor, occurs most frequently in adolescents. Chromosomal aneuploidy is common in osteosarcoma cells, suggesting underlying chromosomal instability. Telomeres, located at chromosome ends, are essential for genomic stability; several studies have suggested that germline telomere length (TL) is associated with cancer risk. We hypothesized that TL and/or common genetic variation in telomere biology genes may be associated with risk of osteosarcoma. We investigated TL in peripheral blood DNA and 713 single nucleotide polymorphisms (SNPs) from 39 telomere biology genes in 98 osteosarcoma cases and 69 orthopedic controls. For the genotyping component, we added 1363 controls from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Short TL was not associated with osteosarcoma risk overall (OR 1.39, P=0.67), although there was a statistically significant association in females (OR 4.35, 95% CI 1.20-15.74, P=0.03). Genotype analyses identified seven SNPs in TERF1 significantly associated with osteosarcoma risk after Bonferroni correction by gene. These SNPs were highly linked and associated with a reduced risk of osteosarcoma (OR 0.48-0.53, P=0.0001-0.0006). We also investigated associations between TL and telomere gene SNPs in osteosarcoma cases and orthopedic controls. Several SNPs were associated with TL prior to Bonferroni correction; one SNP in NOLA2 and one in MEN1 were marginally non-significant after correction (P(adj)=0.057 and 0.066, respectively). This pilot study suggests that females with short telomeres may be at increased risk of osteosarcoma, and that SNPs in TERF1 are inversely associated with osteosarcoma risk.
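The abstract does not spell out how the "Bonferroni correction by gene" was computed. One common reading, assumed here, is that each SNP's p-value is multiplied by the number of SNPs tested in the same gene and capped at 1. The sketch below illustrates that reading only; the function name and input layout are hypothetical.

```python
from collections import Counter


def bonferroni_by_gene(snp_results):
    """Adjust SNP p-values for the number of SNPs tested within each gene.

    snp_results: list of (snp_id, gene, p_value) tuples (hypothetical layout).
    Returns a dict mapping snp_id to its gene-wise adjusted p-value.
    """
    n_per_gene = Counter(gene for _, gene, _ in snp_results)
    return {
        snp: min(1.0, p * n_per_gene[gene])  # cap adjusted p-values at 1
        for snp, gene, p in snp_results
    }
```

Under this reading a gene with many genotyped SNPs imposes a stricter per-SNP threshold than a gene with only a few, while the correction does not account for tests in other genes.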
|
200
|
Liu Y, Patel S, Nibbe R, Maxwell S, Chowdhury SA, Koyuturk M, Zhu X, Larkin EK, Buxbaum SG, Punjabi NM, Gharib SA, Redline S, Chance MR. Systems biology analyses of gene expression and genome wide association study data in obstructive sleep apnea. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2011:14-25. [PMID: 21121029 PMCID: PMC4465214 DOI: 10.1142/9789814335058_0003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
The precise molecular etiology of obstructive sleep apnea (OSA) is unknown; however, recent research indicates that several interconnected aberrant pathways and molecular abnormalities contribute to OSA. Identifying the genes and pathways associated with OSA can help to expand our understanding of the risk factors for the disease as well as provide new avenues for potential treatment. Towards these goals, we have integrated relevant high-dimensional data from various sources, such as genome-wide expression data (microarray), protein-protein interaction (PPI) data and results from genome-wide association studies (GWAS), in order to define sub-network elements that connect some of the known pathways related to the disease as well as to define novel regulatory modules related to OSA. Two distinct approaches are applied to identify sub-networks significantly associated with OSA. In the first case, we used a biased approach based on sixty genes/proteins with known associations with sleep disorders and/or metabolic disease to seed a search using commercial software to discover networks associated with disease, followed by information-theoretic (mutual information) scoring of the sub-networks. In the second case, we used an unbiased approach and generated an interactome constructed from publicly available gene expression profiles and PPI databases, followed by scoring of the network with p-values from GWAS data derived from OSA patients to uncover sub-networks significant for the disease phenotype. A comparison of the approaches reveals a number of proteins that have been previously known to be associated with OSA or sleep. In addition, our results indicate a novel association of phosphoinositide 3-kinase, the STAT family of proteins and its related pathways with OSA.
Affiliation(s)
- Yu Liu
- Center for Proteomics & Bioinformatics, Case Western Reserve University (CWRU), Cleveland, Ohio, 44106, USA
- Sanjay Patel
- Division of Pulmonary, Critical Care and Sleep Medicine, CWRU, Cleveland, Ohio, 44106, USA
- Rod Nibbe
- Center for Proteomics & Bioinformatics, CWRU, Cleveland, Ohio, 44106, USA
- Sean Maxwell
- Center for Proteomics & Bioinformatics, CWRU, Cleveland, Ohio, 44106, USA
- Salim A. Chowdhury
- Department of Electrical Engineering & Computer Science, CWRU, Cleveland, Ohio, 44106, USA
- Mehmet Koyuturk
- Department of Electrical Engineering & Computer Science, CWRU, Cleveland, Ohio, 44106, USA
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, CWRU, Cleveland, Ohio, 44106, USA
- Emma K. Larkin
- Division of Allergy, Pulmonary and Critical Care, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, Tennessee, 37232, USA
- Sarah G. Buxbaum
- Jackson Heart Study, Jackson State University, Jackson, MS 39213, USA
- Naresh M. Punjabi
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Sina A. Gharib
- Center for Lung Biology, Division of Pulmonary and Critical Care Medicine, University of Washington, Seattle, WA 98109, USA
- Susan Redline
- Department of Medicine, CWRU, Cleveland, Ohio, 44106, and Department of Medicine, Brigham & Women’s Hospital and Beth Israel Deaconess Medical School, Harvard Medical School, Boston, MA, 02115
- Mark R. Chance
- Center for Proteomics & Bioinformatics, Department of Genetics, Case Western Reserve University, Cleveland, Ohio, 44106, USA
|