1
Kim EE, Jang CS, Kim H, Han B. PASTRY: achieving balanced power for detecting risk and protective minor alleles in meta-analysis of association studies with overlapping subjects. BMC Bioinformatics 2024; 25:24. [PMID: 38216869 PMCID: PMC10790263 DOI: 10.1186/s12859-023-05627-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 12/20/2023] [Indexed: 01/14/2024] Open
Abstract
BACKGROUND Meta-analysis is a statistical method that combines the results of multiple studies to increase statistical power. When multiple studies participating in a meta-analysis utilize the same public dataset as controls, the summary statistics from these studies become correlated. To solve this challenge, Lin and Sullivan proposed a method to provide an optimal test statistic adjusted for the correlation. This method quickly became the standard practice. However, we identified an unexpected power asymmetry phenomenon in this standard framework. This can lead to unbalanced power for detecting protective minor alleles and risk minor alleles. RESULTS We found that the power asymmetry of the current framework is mainly due to the errors in approximating the correlation term. We then developed a meta-analysis method based on an accurate correlation estimator, called PASTRY (A method to avoid Power ASymmeTRY). PASTRY outperformed the standard method on both simulated and real datasets in terms of the power symmetry. CONCLUSIONS Our findings suggest that PASTRY can help to alleviate the power asymmetry problem. PASTRY is available at https://github.com/hanlab-SNU/PASTRY.
Affiliation(s)
- Emma E Kim
- Department of Chemistry, Seoul National University, Seoul, 03080, Korea
| | - Chloe Soohyun Jang
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Hakin Kim
- Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, 03080, Korea
| | - Buhm Han
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080, Korea.
- Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, 03080, Korea.
2
Katki HA, Berndt SI, Machiela MJ, Stewart DR, Garcia-Closas M, Kim J, Shi J, Yu K, Rothman N. Increase in power by obtaining 10 or more controls per case when type-1 error is small in large-scale association studies. BMC Med Res Methodol 2023; 23:153. [PMID: 37386403 PMCID: PMC10308790 DOI: 10.1186/s12874-023-01973-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 06/10/2023] [Indexed: 07/01/2023] Open
Abstract
BACKGROUND The rule of thumb that there is little gain in statistical power by obtaining more than 4 controls per case is based on type-1 error α = 0.05. However, association studies that evaluate thousands or millions of associations use smaller α and may have access to plentiful controls. We investigate power gains, and reductions in p-values, when increasing well beyond 4 controls per case, for small α. METHODS We calculate the power, the median expected p-value, and the minimum detectable odds-ratio (OR), as a function of the number of controls/case, as α decreases. RESULTS As α decreases, at each ratio of controls per case, the increase in power is larger than for α = 0.05. For α between 10⁻⁶ and 10⁻⁹ (typical for thousands or millions of associations), increasing from 4 controls per case to 10-50 controls per case increases power. For example, a study with power = 0.2 (α = 5 × 10⁻⁸) with 1 control/case has power = 0.65 with 4 controls/case, but with 10 controls/case has power = 0.78, and with 50 controls/case has power = 0.84. For situations where obtaining more than 4 controls per case provides small increases in power beyond 0.9 (at small α), the expected p-value can decrease by orders-of-magnitude below α. Increasing from 1 to 4 controls/case reduces the minimum detectable OR toward the null by 20.9%, and from 4 to 50 controls/case reduces by an additional 9.7%, a result which applies regardless of α and hence also applies to "regular" α = 0.05 epidemiology. CONCLUSIONS At small α, versus 4 controls/case, recruiting 10 or more controls/case can increase power, reduce the expected p-value by 1-2 orders of magnitude, and meaningfully reduce the minimum detectable OR. These benefits of increasing the controls/case ratio increase as the number of cases increases, although the amount of benefit depends on exposure frequencies and true OR.
Provided that controls are comparable to cases, our findings suggest greater sharing of comparable controls in large-scale association studies.
Affiliation(s)
- Hormuzd A Katki
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mitchell J Machiela
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Douglas R Stewart
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jung Kim
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jianxin Shi
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kai Yu
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
3
Mathur R, Fang F, Gaddis N, Hancock DB, Cho MH, Hokanson JE, Bierut LJ, Lutz SM, Young K, Smith AV, Silverman EK, Page GP, Johnson EO. GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing. Commun Biol 2022; 5:806. [PMID: 35953715 PMCID: PMC9372058 DOI: 10.1038/s42003-022-03738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Accepted: 07/18/2022] [Indexed: 11/09/2022] Open
Abstract
Genome-wide association studies (GWAS) have made impactful discoveries for complex diseases, often by amassing very large sample sizes. Yet, GWAS of many diseases remain underpowered, especially for non-European ancestries. One cost-effective approach to increase sample size is to combine existing cohorts, which may have limited sample size or be case-only, with public controls, but this approach is limited by the need for a large overlap in variants across genotyping arrays and the scarcity of non-European controls. We developed and validated a protocol, Genotyping Array-WGS Merge (GAWMerge), for combining genotypes from arrays and whole-genome sequencing, ensuring complete variant overlap, and allowing for diverse samples like Trans-Omics for Precision Medicine to be used. Our protocol involves phasing, imputation, and filtering. We illustrated its ability to control technology driven artifacts and type-I error, as well as recover known disease-associated signals across technologies, independent datasets, and ancestries in smoking-related cohorts. GAWMerge enables genetic studies to leverage existing cohorts to validly increase sample size and enhance discovery for understudied traits and ancestries.
Affiliation(s)
- Ravi Mathur
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Fang Fang
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Nathan Gaddis
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Dana B Hancock
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - John E Hokanson
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Aurora, CO, USA
| | - Laura J Bierut
- Department of Psychiatry, Washington University, St. Louis, MO, USA
| | - Sharon M Lutz
- PRecisiOn Medicine Translational Research (PROMoTeR) Center, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care, Boston, MA, USA
| | - Kendra Young
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Denver, Aurora, CO, USA
| | - Albert V Smith
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Grier P Page
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
- Fellow Program, RTI International, Research Triangle Park, NC, USA
| | - Eric O Johnson
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA.
- Fellow Program, RTI International, Research Triangle Park, NC, USA.
4
Claus EB, Cornish AJ, Broderick P, Schildkraut JM, Dobbins SE, Holroyd A, Calvocoressi L, Lu L, Hansen HM, Smirnov I, Walsh KM, Schramm J, Hoffmann P, Nöthen MM, Jöckel KH, Swerdlow A, Larsen SB, Johansen C, Simon M, Bondy M, Wrensch M, Houlston RS, Wiemels JL. Genome-wide association analysis identifies a meningioma risk locus at 11p15.5. Neuro Oncol 2019; 20:1485-1493. [PMID: 29762745 PMCID: PMC6176799 DOI: 10.1093/neuonc/noy077] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/27/2022] Open
Abstract
Background Meningiomas are adult brain tumors originating in the meningeal coverings of the brain and spinal cord, with significant heritable basis. Genome-wide association studies (GWAS) have previously identified only a single risk locus for meningioma, at 10p12.31. Methods To identify a susceptibility locus for meningioma, we conducted a meta-analysis of 2 GWAS, imputed using a merged reference panel from the 1000 Genomes Project and UK10K data, with validation in 2 independent sample series totaling 2138 cases and 12081 controls. Results We identified a new susceptibility locus for meningioma at 11p15.5 (rs2686876, odds ratio = 1.44, P = 9.86 × 10⁻⁹). A number of genes localize to the region of linkage disequilibrium encompassing rs2686876, including RIC8A, which plays a central role in the development of neural crest-derived structures, such as the meninges. Conclusions This finding advances our understanding of the genetic basis of meningioma development and provides additional support for a polygenic model of meningioma.
Affiliation(s)
- Elizabeth B Claus
- School of Public Health, Yale University, New Haven, Connecticut, USA; Department of Neurosurgery, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Alex J Cornish
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Peter Broderick
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Joellen M Schildkraut
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
| | - Sara E Dobbins
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Amy Holroyd
- School of Public Health, Yale University, New Haven, Connecticut, USA; Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Lisa Calvocoressi
- School of Public Health, Yale University, New Haven, Connecticut, USA
| | - Lingeng Lu
- School of Public Health, Yale University, New Haven, Connecticut, USA
| | - Helen M Hansen
- School of Public Health, Yale University, New Haven, Connecticut, USA; Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA
| | - Ivan Smirnov
- School of Public Health, Yale University, New Haven, Connecticut, USA; Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA
| | - Kyle M Walsh
- School of Public Health, Yale University, New Haven, Connecticut, USA; Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA
| | - Johannes Schramm
- School of Public Health, Yale University, New Haven, Connecticut, USA; University of Bonn Medical School, Bonn, Germany
| | - Per Hoffmann
- School of Public Health, Yale University, New Haven, Connecticut, USA; Human Genomics Research Group, Department of Biomedicine, University of Basel, Basel, Switzerland; Department of Genomics, Life & Brain Center, University of Bonn, Bonn, Germany
| | - Markus M Nöthen
- School of Public Health, Yale University, New Haven, Connecticut, USA; Department of Genomics, Life & Brain Center, University of Bonn, Bonn, Germany; Institute of Human Genetics, University of Bonn School of Medicine and University Hospital Bonn, Bonn, Germany
| | - Karl-Heinz Jöckel
- School of Public Health, Yale University, New Haven, Connecticut, USA; Institute for Medical Informatics, Biometry and Epidemiology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Anthony Swerdlow
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK; Division of Breast Cancer Research, The Institute of Cancer Research, London, UK
| | - Signe Benzon Larsen
- Unit of Survivorship, The Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Christoffer Johansen
- Unit of Survivorship, The Danish Cancer Society Research Center, Copenhagen, Denmark
| | - Matthias Simon
- University of Bonn Medical School, Bonn, Germany; Department of Neurosurgery, Bethel Clinic, Bielefeld, Germany
| | - Melissa Bondy
- Section of Epidemiology and Population Sciences, Department of Medicine and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, USA
| | - Margaret Wrensch
- Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
| | - Richard S Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
| | - Joseph L Wiemels
- Division of Neuroepidemiology, Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA; Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
5
Kim EE, Lee S, Lee CH, Oh H, Song K, Han B. FOLD: a method to optimize power in meta-analysis of genetic association studies with overlapping subjects. Bioinformatics 2017; 33:3947-3954. [PMID: 29036405 PMCID: PMC5860085 DOI: 10.1093/bioinformatics/btx463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/06/2016] [Accepted: 07/19/2017] [Indexed: 11/26/2022] Open
Abstract
Motivation In genetic association studies, meta-analyses are widely used to increase statistical power by aggregating information from multiple studies. In meta-analyses, participating studies often share the same individuals due to the shared use of publicly available control data or accidental recruiting of the same subjects. Because such overlap can inflate the false-positive rate, overlapping subjects are traditionally split among the studies prior to meta-analysis, which requires access to genotype data and is not always possible. Fortunately, recently developed meta-analysis methods can systematically account for overlapping subjects at the summary statistics level. Results We identify and report a phenomenon in which these methods for overlapping subjects can yield low power. For instance, in our simulation involving a meta-analysis of five studies that share 20% of individuals, whereas the traditional splitting method achieved 80% power, none of the new methods exceeded 32% power. We found that this low power resulted from the unaccounted differences between shared and unshared individuals in terms of their contributions towards the final statistic. Here, we propose an optimal summary-statistic-based method termed FOLD that increases the power of meta-analysis involving studies with overlapping subjects. Availability and implementation Our method is available at http://software.buhmhan.com/FOLD. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emma E Kim
- Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Korea; Department of Chemistry, Seoul National University, Seoul 151-747, Korea
| | - Seunghoon Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
| | - Hyunjung Oh
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul 138-736, Korea
| | - Kyuyoung Song
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul 138-736, Korea
| | - Buhm Han
- Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Korea; Department of Convergence Medicine
6
Johnson EO, Hancock DB, Levy JL, Gaddis NC, Page GP, Glasheen C, Saccone NL, Bierut LJ, Kral AH. KAT2B polymorphism identified for drug abuse in African Americans with regulatory links to drug abuse pathways in human prefrontal cortex. Addict Biol 2016; 21:1217-1232. [PMID: 26202629 PMCID: PMC4724343 DOI: 10.1111/adb.12286] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/08/2015] [Revised: 05/12/2015] [Accepted: 06/20/2015] [Indexed: 12/21/2022]
Abstract
Drug abuse is a common and heritable set of disorders, but the underlying genetic factors are largely unknown. We conducted genome-wide association studies of drug abuse using 7 million imputed single nucleotide polymorphisms (SNPs) and insertions/deletions in African Americans (AAs; n = 3742) and European Americans (EAs; n = 6845). Cases were drawn from the Urban Health Study of street-recruited people, who injected drugs and reported abusing opioids, cocaine, marijuana, stimulants and/or other drugs 10 or more times in the past 30 days, and were compared with population controls. Independent replication testing was conducted in 755 AAs and 1131 EAs from the Genetic Association Information Network. An intronic SNP (rs9829896) in the K(lysine) acetyltransferase 2B (KAT2B) gene was significantly associated with drug abuse in AAs (P = 4.63 × 10⁻⁸) and independently replicated in AAs (P = 0.0019). The rs9829896-C allele (frequency = 12%) had odds ratios of 0.68 and 0.53 across the AA cohorts: meta-analysis P = 3.93 × 10⁻¹⁰. Rs9829896-C was not associated with drug abuse across the EA cohorts: frequency = 36% and meta-analysis P = 0.12. Using dorsolateral prefrontal cortex data from the BrainCloud cohort, we found that rs9829896-C was associated with reduced KAT2B expression in AAs (n = 113, P = 0.050) but not EAs (n = 110, P = 0.39). KAT2B encodes a transcriptional regulator in the cyclic adenosine monophosphate and dopamine signaling pathways, and rs9829896-C was associated with expression of genes in these pathways: reduced CREBBP expression (P = 0.011) and increased OPRM1 expression (P = 0.016), both in AAs only. Our study identified the KAT2B SNP rs9829896 as having novel and biologically plausible associations with drug abuse and gene expression in AAs but not EAs, suggesting ancestry-specific effects.
Affiliation(s)
- Eric O Johnson
- Fellow Program and Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA.
| | - Dana B Hancock
- Behavioral and Urban Health Program, Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA
| | - Joshua L Levy
- Research Computing Division, RTI International, Research Triangle Park, NC, USA
| | - Nathan C Gaddis
- Research Computing Division, RTI International, Research Triangle Park, NC, USA
| | - Grier P Page
- Fellow Program, Center for Genomics in Public Health and Medicine, and Genomics, Statistical Genetics, and Environmental Research Program, RTI International, Atlanta, GA, USA
| | - Cristie Glasheen
- Behavioral and Urban Health Program, Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA
| | - Nancy L Saccone
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Laura J Bierut
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
| | - Alex H Kral
- Behavioral and Urban Health Program, Behavioral Health and Criminal Justice Division, RTI International, San Francisco, CA, USA
7
Zhou YH, Wright FA. Hypothesis testing at the extremes: fast and robust association for high-throughput data. Biostatistics 2015; 16:611-25. [PMID: 25792622 DOI: 10.1093/biostatistics/kxv007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/17/2014] [Accepted: 02/16/2015] [Indexed: 01/16/2023] Open
Abstract
A number of biomedical problems require performing many hypothesis tests, with an attendant need to apply stringent thresholds. Often the data take the form of a series of predictor vectors, each of which must be compared with a single response vector, perhaps with nuisance covariates. Parametric tests of association are often used, but can result in inaccurate type I error at the extreme thresholds, even for large sample sizes. Furthermore, standard two-sided testing can reduce power compared with the doubled p-value, due to asymmetry in the null distribution. Exact (permutation) testing is attractive, but can be computationally intensive and cumbersome. We present an approximation to exact association tests of trend that is accurate and fast enough for standard use in high-throughput settings, and can easily provide standard two-sided or doubled p-values. The approach is shown to be equivalent under permutation to likelihood ratio tests for the most commonly used generalized linear models (GLMs). For linear regression, covariates are handled by working with covariate-residualized responses and predictors. For GLMs, stratified covariates can be handled in a manner similar to exact conditional testing. Simulations and examples illustrate the wide applicability of the approach. The accompanying mcc package is available on CRAN http://cran.r-project.org/web/packages/mcc/index.html.
Affiliation(s)
- Yi-Hui Zhou
- Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
| | - Fred A Wright
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
8
Willis JA, Mukherjee S, Orlow I, Viale A, Offit K, Kurtz RC, Olson SH, Klein RJ. Genome-wide analysis of the role of copy-number variation in pancreatic cancer risk. Front Genet 2014; 5:29. [PMID: 24592275 PMCID: PMC3923159 DOI: 10.3389/fgene.2014.00029] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/25/2013] [Accepted: 01/26/2014] [Indexed: 12/20/2022] Open
Abstract
Although family history is a risk factor for pancreatic adenocarcinoma, much of the genetic etiology of this disease remains unknown. While genome-wide association studies have identified some common single nucleotide polymorphisms (SNPs) associated with pancreatic cancer risk, these SNPs do not explain all the heritability of this disease. We hypothesized that copy number variants (CNVs) in the genome may play a role in genetic predisposition to pancreatic adenocarcinoma. Here, we report a genome-wide analysis of CNVs in a small hospital-based, European ancestry cohort of pancreatic cancer cases and controls. Germline CNV discovery was performed using the Illumina Human CNV370 platform in 223 pancreatic cancer cases (both sporadic and familial) and 169 controls. Following stringent quality control, we asked if global CNV burden was a risk factor for pancreatic cancer. Finally, we performed in silico CNV genotyping and association testing to discover novel CNV risk loci. When we examined the global CNV burden, we found no strong evidence that CNV burden plays a role in pancreatic cancer risk either overall or specifically in individuals with a family history of the disease. Similarly, we saw no significant evidence that any particular CNV is associated with pancreatic cancer risk. Taken together, these data suggest that CNVs do not contribute substantially to the genetic etiology of pancreatic cancer, though the results are tempered by small sample size and large experimental variability inherent in array-based CNV studies.
Affiliation(s)
- Jason A Willis
- Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY, USA; Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Semanti Mukherjee
- Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Irene Orlow
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Agnes Viale
- Genomics Core Laboratory, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Kenneth Offit
- Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY, USA; Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Robert C Kurtz
- Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Sara H Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Robert J Klein
- Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY, USA; Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
9
Johnson EO, Hancock DB, Levy JL, Gaddis NC, Saccone NL, Bierut LJ, Page GP. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. Hum Genet 2013; 132:509-22. [PMID: 23334152 PMCID: PMC3628082 DOI: 10.1007/s00439-013-1266-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 07/18/2012] [Accepted: 01/07/2013] [Indexed: 12/20/2022]
Abstract
A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
Affiliation(s)
- Eric O Johnson
- Behavioral Health Epidemiology Program, RTI International, 3040 Cornwallis Road, PO Box 12194, Research Triangle Park, NC 27709-12194, USA.
10
Vijai J, Kirchhoff T, Schrader KA, Brown J, Dutra-Clarke AV, Manschreck C, Hansen N, Rau-Murthy R, Sarrel K, Przybylo J, Shah S, Cheguri S, Stadler Z, Zhang L, Paltiel O, Ben-Yehuda D, Viale A, Portlock C, Straus D, Lipkin SM, Lacher M, Robson M, Klein RJ, Zelenetz A, Offit K. Susceptibility loci associated with specific and shared subtypes of lymphoid malignancies. PLoS Genet 2013; 9:e1003220. [PMID: 23349640 PMCID: PMC3547842 DOI: 10.1371/journal.pgen.1003220] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 04/13/2012] [Accepted: 10/18/2012] [Indexed: 12/31/2022] Open
Abstract
The genetics of lymphoma susceptibility reflect the marked heterogeneity of diseases that comprise this broad phenotype. However, multiple subtypes of lymphoma are observed in some families, suggesting shared pathways of genetic predisposition to these pathologically distinct entities. Using a two-stage GWAS, we tested 530,583 SNPs in 944 cases of lymphoma, including 282 familial cases, and 4,044 public shared controls, followed by genotyping of 50 SNPs in 1,245 cases and 2,596 controls. A novel region on 11q12.1 showed association with combined lymphoma (LYM) subtypes. SNPs in this region included rs12289961 near LPXN (P(LYM) = 3.89 × 10⁻⁸, OR = 1.29) and rs948562 (P(LYM) = 5.85 × 10⁻⁷, OR = 1.29). A SNP in a novel non-HLA region on 6p23 (rs707824, P(NHL) = 5.72 × 10⁻⁷) was suggestive of an association conferring susceptibility to lymphoma. Four SNPs, all in a previously reported HLA region, 6p21.32, showed genome-wide significant associations with follicular lymphoma. The most significant association with follicular lymphoma was for rs4530903 (P(FL) = 2.69 × 10⁻¹², OR = 1.93). Three novel SNPs near the HLA locus, rs9268853, rs2647046, and rs2621416, demonstrated additional variation contributing toward genetic susceptibility to FL associated with this region. Genes implicated by GWAS were also found to be cis-eQTLs in lymphoblastoid cell lines; candidate genes in these regions have been implicated in hematopoiesis and immune function. These results, showing novel susceptibility regions and allelic heterogeneity, point to the existence of pathways of susceptibility to both shared as well as specific subtypes of lymphoid malignancy.
Affiliation(s)
- Joseph Vijai
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
  - Cancer Biology and Genetics Program, Sloan-Kettering Institute, New York, New York, United States of America
- Tomas Kirchhoff
  - New York University Cancer Institute, New York University School of Medicine, New York, New York, United States of America
- Kasmintan A. Schrader
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
  - Cancer Biology and Genetics Program, Sloan-Kettering Institute, New York, New York, United States of America
- Jennifer Brown
  - Dana Farber Cancer Institute, Boston, Massachusetts, United States of America
- Ana Virginia Dutra-Clarke
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Christopher Manschreck
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Nichole Hansen
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Rohini Rau-Murthy
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Kara Sarrel
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Jennifer Przybylo
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Sohela Shah
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
  - Cancer Biology and Genetics Program, Sloan-Kettering Institute, New York, New York, United States of America
- Srujana Cheguri
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Zsofia Stadler
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Liying Zhang
  - Diagnostic Molecular Genetics Laboratory, Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Ora Paltiel
  - Department of Hematology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
- Dina Ben-Yehuda
  - Department of Hematology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
- Agnes Viale
  - Genomics Core, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Carol Portlock
  - Lymphoma Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- David Straus
  - Lymphoma Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Steven M. Lipkin
  - Weill Cornell Medical Center, New York, New York, United States of America
- Mortimer Lacher
  - Lymphoma Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Mark Robson
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Robert J. Klein
  - Cancer Biology and Genetics Program, Sloan-Kettering Institute, New York, New York, United States of America
- Andrew Zelenetz
  - Lymphoma Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
- Kenneth Offit
  - Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
  - Cancer Biology and Genetics Program, Sloan-Kettering Institute, New York, New York, United States of America
  - Lymphoma Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
|
11
|
Abstract
This unit provides an overview of the design and analysis of population-based case-control studies of genetic risk factors for complex disease. Considerations specific to genetic studies are emphasized. The unit reviews basic study designs, differentiating case-control studies from others; presents different genetic association strategies (candidate gene, genome-wide association, and high-throughput sequencing); introduces basic methods of statistical analysis for case-control data and approaches to combining case-control studies; and discusses measures of association and impact. Admixed populations, controlling for confounding (including population stratification), consideration of multiple loci and environmental risk factors, and complementary analyses of haplotypes, genes, and pathways are briefly discussed. Readers are referred to basic texts on epidemiology for more details on the general conduct of case-control studies.
Affiliation(s)
- Dana B Hancock
- Research Triangle Institute International, Research Triangle Park, North Carolina, USA
|