1
YING N, WANG S, XU H, WANG Y. Association between FEN1 Polymorphisms -69G>A and 4150G>T with Susceptibility in Human Disease: A Meta-Analysis. IRANIAN JOURNAL OF PUBLIC HEALTH 2015; 44:1574-9. [PMID: 26811808 PMCID: PMC4724730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND As a DNA repair protein, flap endonuclease 1 (FEN1) is a key enzyme in maintaining genomic stability and preventing carcinogenesis. Two single nucleotide polymorphisms (SNPs), -69G>A and 4150G>T, are associated with DNA damage. This meta-analysis evaluates the association between these FEN1 SNPs (-69G/A and 4150G/T) and susceptibility to diseases, including glioma, breast cancer, lung cancer, keratoconus (KC) and Fuchs' endothelial corneal dystrophy (FECD). METHODS A literature search of PubMed and Embase was conducted to identify all eligible published studies. Five case-control studies with a total of 5612 cases and 6703 controls were included in this meta-analysis. Crude odds ratios (ORs) with their corresponding 95% confidence intervals (95%CI) were used to assess the strength of the association. RESULTS The FEN1 -69G/A and 4150G/T polymorphisms were significantly associated with disease risk. The FEN1 -69GG genotype was associated with increased risk of the included diseases compared with the -69AG genotype (OR=0.77, 95%CI=0.71∼0.83). Moreover, the FEN1 4150GG genotype was associated with increased disease risk compared with the 4150TG genotype (OR=0.81, 95%CI=0.75∼0.87). CONCLUSION The variant genotypes of the FEN1 -69G/A and FEN1 4150G/T polymorphisms may be associated with disease susceptibility. However, more studies are needed to assess the disease risk in different ethnic populations.
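The crude odds ratio and 95% confidence interval mentioned in the METHODS can be illustrated for a single 2x2 table. This is only a minimal sketch, assuming the usual Wald (Woolf) interval on the log scale; the function name `crude_odds_ratio` and the counts are hypothetical and are not taken from the meta-analysis, which pools several studies rather than analysing one table.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald (Woolf) confidence interval.

    a: cases with the comparison genotype
    b: cases with the reference genotype
    c: controls with the comparison genotype
    d: controls with the reference genotype
    z: normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only.
or_, lo, hi = crude_odds_ratio(a=120, b=200, c=150, d=180)
print(f"OR={or_:.2f}, 95%CI={lo:.2f}~{hi:.2f}")
```

An interval that excludes 1, as both intervals reported in the RESULTS do, is what the abstract means by a significant association.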
Affiliation(s)
- Nanjiao YING
- Inst. of Biomedical Engineering, Hangzhou Dianzi University, Hangzhou, China; Inst. of Nuclear-Agricultural Sciences, Zhejiang University, Hangzhou, China
- Shuo WANG
- Inst. of Biomedical Engineering, Hangzhou Dianzi University, Hangzhou, China
- Hong XU
- Inst. of Nuclear-Agricultural Sciences, Zhejiang University, Hangzhou, China
- Yanyi WANG
- Inst. of Biomedical Engineering, Hangzhou Dianzi University, Hangzhou, China
2
Jin L, Zhu W, Yu Y, Kou C, Meng X, Tao Y, Guo J. Nonparametric tests of associations with disease based on U-statistics. Ann Hum Genet 2013; 78:141-53. [PMID: 24328673 DOI: 10.1111/ahg.12049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/07/2013] [Accepted: 09/01/2013] [Indexed: 11/25/2022]
Abstract
In case-control studies, association analysis is designed to test whether genetic variants are associated with human diseases. Analysing one genetic marker at a time suffers from weak power, because of the correction for multiple testing and possibly small genetic effects. An alternative strategy is to test the simultaneous effects of multiple markers, which is believed to be more powerful. However, when the number of markers under investigation is large, such tests also suffer from weak power, because of the greater degrees of freedom. To overcome these limitations in case-control studies, we proposed a novel method that could test the joint association of several loci (i.e. a haplotype) with only a single degree of freedom. In this research, we developed a nonparametric approach based on U-statistics. We also introduced a new kernel for the U-statistic, which could incorporate the haplotype structure information and was expected to enhance the power. Simulations indicated that our proposed approach offered merits in identifying the associations between diseases and haplotypes. Application of our method to a study of candidate genes for internalising disorder illustrated its utility and interpretability, and provided an excellent result in detecting the associations.
Affiliation(s)
- Lina Jin
- Key Laboratory for Applied Statistics of MOE and School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, 130024, China; School of Public Health, Jilin University, Changchun, Jilin, 130021, China
3
Lee D, Bacanu SA. Association testing strategy for data from dense marker panels. PLoS One 2013; 8:e80540. [PMID: 24265830 PMCID: PMC3827222 DOI: 10.1371/journal.pone.0080540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/27/2013] [Accepted: 10/14/2013] [Indexed: 01/31/2023]
Abstract
Genome-wide association studies have usually been analyzed in a univariate manner. The commonly used univariate tests have one degree of freedom and assume an additive mode of inheritance. The experiment-wise significance of these univariate statistics is obtained by adjusting for multiple testing. Next generation sequencing studies, which assay 10-20 million variants, are beginning to come online. For these studies, the strategy of additive univariate testing and multiple testing adjustment is likely to result in a loss of power due to (1) the substantial multiple testing burden and (2) the possibility of a non-additive causal mode of inheritance. To reduce the power loss we propose a new method (1) to summarize in a single statistic the strength of the association signals coming from all not-very-rare variants in a linkage disequilibrium block and (2) to incorporate, in any linkage disequilibrium block statistic, the strength of the association signals under multiple modes of inheritance. The proposed linkage disequilibrium block test consists of the sum of squares of nominally significant univariate statistics. We compare the performance of this method to the performance of existing linkage disequilibrium block/gene-based methods. Simulations show that (1) extending methods to combine testing for multiple modes of inheritance leads to substantial power gains, especially for a recessive mode of inheritance, and (2) the proposed method has a good overall performance. Based on simulation results, we provide practical advice on choosing suitable methods for applied analyses.
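The block statistic described above, a sum of squares of nominally significant univariate statistics, can be sketched as follows. This is an illustration only: it assumes the univariate statistics are z-scores and that "nominally significant" means |z| >= 1.96, neither of which is stated in the abstract, and the function name `ld_block_statistic` is hypothetical. How the statistic's significance is assessed is not described in the abstract and is left out.

```python
def ld_block_statistic(z_scores, z_threshold=1.96):
    """Sum of squared z-scores over the nominally significant variants.

    z_scores: univariate association z-scores for the variants in one
              linkage disequilibrium block (assumed input format).
    z_threshold: cut-off for nominal significance (assumed: two-sided 0.05).
    """
    return sum(z * z for z in z_scores if abs(z) >= z_threshold)

# Hypothetical z-scores for four variants in one block.
block_z = [0.5, -2.1, 2.5, 1.0]
statistic = ld_block_statistic(block_z)
print(f"block statistic = {statistic:.2f}")
```

Only the second and third variants pass the threshold here, so the statistic is driven by the strongest signals in the block rather than diluted by the weak ones.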
Affiliation(s)
- Donghyung Lee
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Silviu-Alin Bacanu
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
4
Kussmann M, Morine MJ, Hager J, Sonderegger B, Kaput J. Perspective: a systems approach to diabetes research. Front Genet 2013; 4:205. [PMID: 24187547 PMCID: PMC3807566 DOI: 10.3389/fgene.2013.00205] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 04/10/2013] [Accepted: 09/24/2013] [Indexed: 12/17/2022]
Abstract
We review here the status of human type 2 diabetes studies from a genetic, epidemiological, and clinical (intervention) perspective. Most studies limit analyses to one or a few omic technologies providing data of components of physiological processes. Since all chronic diseases are multifactorial and arise from complex interactions between genetic makeup and environment, type 2 diabetes mellitus (T2DM) is a collection of sub-phenotypes resulting in high fasting glucose. The underlying gene–environment interactions that produce these classes of T2DM are imperfectly characterized. Based on assessments of the complexity of T2DM, we propose a systems biology approach to advance the understanding of origin, onset, development, prevention, and treatment of this complex disease. This systems-based strategy is based on new study design principles and the integrated application of omics technologies: we pursue longitudinal studies in which each subject is analyzed at both homeostasis and after (healthy and safe) challenges. Each enrolled subject functions thereby as their own case and control and this design avoids assigning the subjects a priori to case and control groups based on limited phenotyping. Analyses at different time points along this longitudinal investigation are performed with a comprehensive set of omics platforms. These data sets are generated in a biological context, rather than biochemical compound class-driven manner, which we term “systems omics.”
Affiliation(s)
- Martin Kussmann
- Nestlé Institute of Health Sciences SA Lausanne, Switzerland ; Faculty of Life Sciences, Ecole Polytechnique Fédérale Lausanne, Switzerland ; Faculty of Science, Aarhus University Aarhus, Denmark
5
Li Z, Zhang H. Analyzing Interaction of μ-, δ- and κ-opioid Receptor Gene Variants on Alcohol or Drug Dependence Using a Pattern Discovery-based Method. JOURNAL OF ADDICTION RESEARCH & THERAPY 2013; Suppl 7:007. [PMID: 24533225 PMCID: PMC3921888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND Polymorphisms in the μ-, δ- and κ-opioid receptor genes (OPRM1, OPRD1 and OPRK1) have been reported to be associated with substance (alcohol or drug) dependence. The influence of an individual gene on a disease trait should be more evident when analyzed in the context of gene-gene interactions. Thus, we assessed the joint effect of variants in these three opioid receptor genes on alcohol, cocaine, or opioid dependence. METHODS Genotype data for 13 OPRM1 Single Nucleotide Polymorphisms (SNPs), 11 OPRD1 SNPs and seven OPRK1 SNPs were obtained from 382 European Americans (EAs) affected with substance dependence [among them, 318 with Alcohol Dependence (AD), 171 with Cocaine Dependence (CD), and 91 with Opioid Dependence (OD)] and 338 EA control subjects. We assessed the joint effect of OPRM1, OPRD1 and OPRK1 variants on AD, CD, or OD using a pattern discovery-based association test. Specific marker patterns (consisting of alleles of OPRM1, OPRD1 and OPRK1) that were significantly more frequent in AD, CD, or OD cases than in controls were identified. RESULTS 12 significant patterns in the AD dataset, four significant patterns in the CD dataset, and 18 significant patterns in the OD dataset were identified. Moreover, the significance of most marker patterns was due primarily to OPRM1 variants and, to a lesser degree, OPRD1 variants. CONCLUSION Our findings suggest that variation in the above three opioid receptor genes can jointly influence the vulnerability of individuals to alcohol or drug dependence. Evidence provided by this study also supports previous biological findings that the interaction of the three opioid receptors can modulate the action of opioid and non-opioid drugs and alcohol.
Affiliation(s)
- Zhong Li
- Department of Computational Genetics, High Throughput Biology Inc., Summit, NJ, USA
- Huiping Zhang
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven Campus, CT, USA. Corresponding author: Huiping Zhang, Department of Psychiatry, Yale University School of Medicine, VA Medical Center/116A2, 950 Campbell Avenue, West Haven, CT 06516, USA. Tel: (203) 932-5711 ext. 5245; Fax: (203) 937-4741
6
Cornelis MC, Tchetgen EJT, Liang L, Qi L, Chatterjee N, Hu FB, Kraft P. Gene-environment interactions in genome-wide association studies: a comparative study of tests applied to empirical studies of type 2 diabetes. Am J Epidemiol 2012; 175:191-202. [PMID: 22199026 PMCID: PMC3261439 DOI: 10.1093/aje/kwr368] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.8] [Received: 12/21/2010] [Accepted: 06/08/2011] [Indexed: 12/28/2022]
Abstract
The question of which statistical approach is the most effective for investigating gene-environment (G-E) interactions in the context of genome-wide association studies (GWAS) remains unresolved. By using 2 case-control GWAS (the Nurses' Health Study, 1976-2006, and the Health Professionals Follow-up Study, 1986-2006) of type 2 diabetes, the authors compared 5 tests for interactions: standard logistic regression-based case-control; case-only; semiparametric maximum-likelihood estimation of an empirical-Bayes shrinkage estimator; and 2-stage tests. The authors also compared 2 joint tests of genetic main effects and G-E interaction. Elevated body mass index was the exposure of interest and was modeled as a binary trait to avoid an inflated type I error rate that the authors observed when the main effect of continuous body mass index was misspecified. Although both the case-only and the semiparametric maximum-likelihood estimation approaches assume that the tested markers are independent of exposure in the general population, the authors did not observe any evidence of inflated type I error for these tests in their studies with 2,199 cases and 3,044 controls. Both joint tests detected markers with known marginal effects. Loci with the most significant G-E interactions using the standard, empirical-Bayes, and 2-stage tests were strongly correlated with the exposure among controls. Study findings suggest that methods exploiting G-E independence can be efficient and valid options for investigating G-E interactions in GWAS.
Affiliation(s)
- Marilyn C Cornelis
- Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, USA.
7
Weinberg CR, Shi M, Umbach DM. A sibling-augmented case-only approach for assessing multiplicative gene-environment interactions. Am J Epidemiol 2011; 174:1183-9. [PMID: 22021562 DOI: 10.1093/aje/kwr231] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
Family-based designs protect analyses of genetic effects from bias that is due to population stratification. Investigators have assumed that this robustness extends to assessments of gene-environment interaction. Unfortunately, this assumption fails for the common scenario in which the genotyped variant is related to risk through linkage with a causative allele. Bias also plagues other methods of assessment of gene-environment interaction. When testing against multiplicative joint effects, the case-only design offers excellent power, but it is invalid if genotype and exposure are correlated in the population. The authors describe 4 mechanisms that produce genotype-exposure dependence: exposure-related genetic population stratification, effects of family history on behavior, genotype effects on exposure, and selective attrition. They propose a sibling-augmented case-only (SACO) design that protects against the former 2 mechanisms and is therefore valid for studying young-onset disease in which genotype does not influence exposure. A SACO design allows the ascertainment of genotype and exposure for cases and exposure for 1 or more unaffected siblings selected randomly. Conditional logistic regression permits assessment of exposure effects and gene-environment interactions. Via simulations, the authors compare the likelihood-based inference on interactions using the SACO design with that based on other designs. They also show that robust analyses of interactions using tetrads or disease-discordant sibling pairs are equivalent to analyses using the SACO design.
Affiliation(s)
- Clarice R Weinberg
- National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
8
Basu S, Pan W, Oetting WS. A dimension reduction approach for modeling multi-locus interaction in case-control studies. Hum Hered 2011; 71:234-45. [PMID: 21734407 DOI: 10.1159/000328842] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/05/2010] [Accepted: 04/12/2011] [Indexed: 01/01/2023]
Abstract
Studying one locus or one single nucleotide polymorphism (SNP) at a time may not be sufficient to understand complex diseases because they are unlikely to result from the effect of only one SNP. Each SNP alone may have little or no effect on the risk of the disease, but together they may increase the risk substantially. Analyses focusing on individual SNPs ignore the possibility of interaction among SNPs. In this paper, we propose a parsimonious model to assess the joint effect of a group of SNPs in a case-control study. The model implements a data reduction strategy within a likelihood framework and uses a test to assess the statistical significance of the effect of the group of SNPs on the binary trait. The primary advantage of the proposed approach is that the dimension reduction technique produces a test statistic with degrees of freedom significantly lower than a multiple logistic regression with only main effects of the SNPs, and our parsimonious model can incorporate the possibility of interaction among the SNPs. Moreover, the proposed approach estimates the direction of association of each SNP with the disease and provides an estimate of the average effect of the group of SNPs positively and negatively associated with the disease in the given SNP set. We illustrate the proposed model on simulated and real data, and compare its performance with a few other existing approaches. Our proposed approach appeared to outperform the other approaches for independent SNPs in our simulation studies.
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, USA. saonli @ umn.edu
9
Nicodemus KK, Callicott JH, Higier RG, Luna A, Nixon DC, Lipska BK, Vakkalanka R, Giegling I, Rujescu D, St Clair D, Muglia P, Shugart YY, Weinberger DR. Evidence of statistical epistasis between DISC1, CIT and NDEL1 impacting risk for schizophrenia: biological validation with functional neuroimaging. Hum Genet 2011; 127:441-52. [PMID: 20084519 DOI: 10.1007/s00439-009-0782-y] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 09/29/2009] [Accepted: 12/24/2009] [Indexed: 02/05/2023]
Abstract
The etiology of schizophrenia likely involves genetic interactions. DISC1, a promising candidate susceptibility gene, encodes a protein which interacts with many other proteins, including CIT, NDEL1, NDE1, FEZ1 and PAFAH1B1, some of which also have been associated with psychosis. We tested for epistasis between these genes in a schizophrenia case-control study using machine learning algorithms (MLAs: random forest, generalized boosted regression and Monte Carlo logic regression). Convergence of MLAs revealed a subset of seven SNPs that were subjected to 2-SNP interaction modeling using likelihood ratio tests for nested unconditional logistic regression models. Of the 7C2 = 21 interactions, four were significant at the α = 0.05 level: DISC1 rs1411771-CIT rs10744743 OR = 3.07 (1.37, 6.98) p = 0.007; CIT rs3847960-CIT rs203332 OR = 2.90 (1.45, 5.79) p = 0.003; CIT rs3847960-CIT rs440299 OR = 2.16 (1.04, 4.46) p = 0.038; one survived Bonferroni correction (NDEL1 rs4791707-CIT rs10744743 OR = 4.44 (2.22, 8.88) p = 0.00013). Three of four interactions were validated via functional magnetic resonance imaging (fMRI) in an independent sample of healthy controls; risk associated alleles at both SNPs predicted prefrontal cortical inefficiency during the N-back task, a schizophrenia-linked intermediate biological phenotype: rs3847960-rs440299; rs1411771-rs10744743, rs4791707-rs10744743 (SPM5 p < 0.05, corrected), although we were unable to statistically replicate the interactions in other clinical samples. Interestingly, the CIT SNPs are proximal to exons that encode the DISC1 interaction domain. In addition, the 3' UTR DISC1 rs1411771 is predicted to be an exonic splicing enhancer and the NDEL1 SNP is ~3,000 bp from the exon encoding the region of NDEL1 that interacts with the DISC1 protein, giving a plausible biological basis for epistasis signals validated by fMRI.
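The arithmetic behind the pairwise count and the Bonferroni result can be checked with a short sketch. It assumes that the correction was applied across all 21 pairwise tests at α = 0.05, which the abstract implies but does not spell out; the p-values are the four reported above.

```python
import math

n_snps = 7
n_tests = math.comb(n_snps, 2)          # number of unordered SNP pairs
alpha = 0.05
bonferroni_threshold = alpha / n_tests  # assumed: correction over all pairs

# p-values for the four nominally significant interactions, as reported.
p_values = {
    "DISC1 rs1411771-CIT rs10744743": 0.007,
    "CIT rs3847960-CIT rs203332": 0.003,
    "CIT rs3847960-CIT rs440299": 0.038,
    "NDEL1 rs4791707-CIT rs10744743": 0.00013,
}

survivors = [pair for pair, p in p_values.items() if p < bonferroni_threshold]
print(n_tests, round(bonferroni_threshold, 5), survivors)
```

Under that assumption the threshold is about 0.0024, so only the NDEL1 rs4791707-CIT rs10744743 pair passes, in line with the single surviving interaction reported.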
Affiliation(s)
- Kristin K Nicodemus
- Genes, Cognition and Psychosis Program, Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA.
10
Shriner D, Vaughan LK. A unified framework for multi-locus association analysis of both common and rare variants. BMC Genomics 2011; 12:89. [PMID: 21281506 PMCID: PMC3040731 DOI: 10.1186/1471-2164-12-89] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/30/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022]
Abstract
Background Common, complex diseases are hypothesized to result from a combination of common and rare genetic variants. We developed a unified framework for the joint association testing of both types of variants. Within the framework, we developed a union-intersection test suitable for genome-wide analysis of single nucleotide polymorphisms (SNPs), candidate gene data, as well as medical sequencing data. The union-intersection test is a composite test of association of genotype frequencies and differential correlation among markers. Results We demonstrated by computer simulation that the false positive error rate was controlled at the expected level. We also demonstrated scenarios in which the multi-locus test was more powerful than traditional single marker analysis. To illustrate use of the union-intersection test with real data, we analyzed a publicly available data set of 319,813 autosomal SNPs genotyped for 938 cases of Parkinson disease and 863 neurologically normal controls for which no genome-wide significant results were found by traditional single marker analysis. We also analyzed an independent follow-up sample of 183 cases and 248 controls for replication. Conclusions We identified a single risk haplotype with a directionally consistent effect in both samples in the gene GAK, which is involved in clathrin-mediated membrane trafficking. We also found suggestive evidence that directionally inconsistent marginal effects from single marker analysis appeared to result from risk being driven by different haplotypes in the two samples for the genes SYN3 and NGLY1, which are involved in neurotransmitter release and proteasomal degradation, respectively. These results illustrate the utility of our unified framework for genome-wide association analysis of common, complex diseases.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, Bethesda, MD 20892, USA.
11
Longmate JA, Larson GP, Krontiris TG, Sommer SS. Three ways of combining genotyping and resequencing in case-control association studies. PLoS One 2010; 5:e14318. [PMID: 21187953 PMCID: PMC3004857 DOI: 10.1371/journal.pone.0014318] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/22/2010] [Accepted: 11/15/2010] [Indexed: 11/18/2022]
Abstract
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.
Affiliation(s)
- Jeffrey A Longmate
- Division of Biostatistics, City of Hope, Duarte, California, United States of America.
12
Wu J, Devlin B, Ringquist S, Trucco M, Roeder K. Screen and clean: a tool for identifying interactions in genome-wide association studies. Genet Epidemiol 2010; 34:275-85. [PMID: 20088021 PMCID: PMC2915560 DOI: 10.1002/gepi.20459] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 01/20/2023]
Abstract
Epistasis could be an important source of risk for disease. How interacting loci might be discovered is an open question for genome-wide association studies (GWAS). Most researchers limit their statistical analyses to testing individual pairwise interactions (i.e., marginal tests for association). A more effective means of identifying important predictors is to fit models that include many predictors simultaneously (i.e., higher-dimensional models). We explore a procedure called screen and clean (SC) for identifying liability loci, including interactions, by using the lasso procedure, which is a model selection tool for high-dimensional regression. We approach the problem by using a varying dictionary consisting of terms to include in the model. In the first step the lasso dictionary includes only main effects. The most promising single-nucleotide polymorphisms (SNPs) are identified using a screening procedure. Next the lasso dictionary is adjusted to include these main effects and the corresponding interaction terms. Again, promising terms are identified using lasso screening. Then significant terms are identified through the cleaning process. Implementation of SC for GWAS requires algorithms to explore the complex model space induced by the many SNPs genotyped and their interactions. We propose and explore a set of algorithms and find that SC successfully controls Type I error while yielding good power to identify risk loci and their interactions. When the method is applied to data obtained from the Wellcome Trust Case Control Consortium study of Type 1 Diabetes it uncovers evidence supporting interaction within the HLA class II region as well as within Chromosome 12q24.
Affiliation(s)
- Jing Wu
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
- Steven Ringquist
- Division of Immunogenetics, Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA 15201
- Massimo Trucco
- Division of Immunogenetics, Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA 15201
- Kathryn Roeder
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213
13
Kim S, Morris NJ, Won S, Elston RC. Single-marker and two-marker association tests for unphased case-control genotype data, with a power comparison. Genet Epidemiol 2010; 34:67-77. [PMID: 19557751 DOI: 10.1002/gepi.20436] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/09/2023]
Abstract
In case-control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy Weinberg Disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi-marker tests.
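For orientation, the classic retrospective allelic test can be sketched as a 1-degree-of-freedom Pearson chi-square on allele counts. This is not the prospective logistic-regression formulation developed in the paper; the function name `allelic_test`, the genotype-count input format and the counts are all hypothetical.

```python
import math

def allelic_test(case_genotypes, control_genotypes):
    """Pearson chi-square test (1 df) comparing allele counts.

    Each argument is a tuple (n_AA, n_Aa, n_aa) of genotype counts.
    Returns the chi-square statistic and its p-value.
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa  # (A alleles, a alleles)

    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for illustration only.
chi2, p_value = allelic_test((30, 50, 20), (50, 40, 10))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Counting alleles rather than genotypes treats each chromosome as an independent observation, which is one reason the genotypic test can be preferable for non-additive disease models, as the abstract's power comparison suggests.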
Affiliation(s)
- Sulgi Kim
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106-7281, USA
14
Tomoyasu Y, Yamaguchi T, Tajima A, Nakajima T, Inoue I, Maki K. Further evidence for an association between mandibular height and the growth hormone receptor gene in a Japanese population. Am J Orthod Dentofacial Orthop 2009; 136:536-41. [DOI: 10.1016/j.ajodo.2007.10.054] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/01/2007] [Revised: 10/01/2007] [Accepted: 10/01/2007] [Indexed: 01/01/2023]
15
Lindström S, Yen YC, Spiegelman D, Kraft P. The impact of gene-environment dependence and misclassification in genetic association studies incorporating gene-environment interactions. Hum Hered 2009; 68:171-81. [PMID: 19521099 DOI: 10.1159/000224637] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 10/20/2008] [Accepted: 02/17/2009] [Indexed: 11/19/2022]
Abstract
The possibility of gene-environment interaction can be exploited to identify genetic variants associated with disease using a joint test of genetic main effect and gene-environment interaction. We consider how exposure misclassification and dependence between the true exposure E and the tested genetic variant G affect this joint test in absolute terms and relative to three other tests: the marginal test (G), the standard test for multiplicative gene-environment interaction (GE), and the case-only test for interaction (GE-CO). All tests can have an inflated Type I error rate when E and G are correlated in the underlying population. For the GE and G-GE tests this inflation is only noticeable when the gene-environment dependence is unusually strong; the inflation can be large for the GE-CO test even for modest correlation. The joint G-GE test has greater power than the GE test generally, and greater power than the G test when there is no genetic main effect and the measurement error is small to moderate. The joint G-GE test is an attractive test for assessing genetic association when there is limited knowledge about causal mechanisms a priori, even in the presence of misclassification in environmental exposure measurement and correlation between exposure and genetic variants.
Affiliation(s)
- Sara Lindström
- Department of Epidemiology, Harvard School of Public Health, Boston, Mass, USA.
|
16
|
Fry AE, Ghansa A, Small KS, Palma A, Auburn S, Diakite M, Green A, Campino S, Teo YY, Clark TG, Jeffreys AE, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Griffiths MJ, Peshu N, Williams TN, Newton CR, Marsh K, Molyneux ME, Taylor TE, Koram KA, Oduro AR, Rogers WO, Rockett KA, Sabeti PC, Kwiatkowski DP. Positive selection of a CD36 nonsense variant in sub-Saharan Africa, but no association with severe malaria phenotypes. Hum Mol Genet 2009; 18:2683-92. [PMID: 19403559 PMCID: PMC2701331 DOI: 10.1093/hmg/ddp192] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
Abstract
The prevalence of CD36 deficiency in East Asian and African populations suggests that the causal variants are under selection by severe malaria. Previous analysis of data from the International HapMap Project indicated that a CD36 haplotype bearing a nonsense mutation (T1264G; rs3211938) had undergone recent positive selection in the Yoruba of Nigeria. To investigate the global distribution of this putative selection event, we genotyped T1264G in 3420 individuals from 66 populations. We confirmed the high frequency of 1264G in the Yoruba (26%). However, the 1264G allele is less common in other African populations and absent from all non-African populations without recent African admixture. Using long-range linkage disequilibrium, we studied two West African groups in depth. Evidence for recent positive selection at the locus was demonstrable in the Yoruba, although not in Gambians. We screened 70 variants from across CD36 for an association with severe malaria phenotypes, employing a case–control study of 1350 subjects and a family study of 1288 parent–offspring trios. No marker was significantly associated with severe malaria. We focused on T1264G, genotyping 10 922 samples from four African populations. The nonsense allele was not associated with severe malaria (pooled allelic odds ratio 1.0; 95% confidence interval 0.89–1.12; P = 0.98). These results suggest a range of possible explanations including the existence of alternative selection pressures on CD36, co-evolution between host and parasite or confounding caused by allelic heterogeneity of CD36 deficiency.
Affiliation(s)
- Andrew E Fry
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
|
17
|
Clark TG, Fry AE, Auburn S, Campino S, Diakite M, Green A, Richardson A, Teo YY, Small K, Wilson J, Jallow M, Sisay-Joof F, Pinder M, Sabeti P, Kwiatkowski DP, Rockett KA. Allelic heterogeneity of G6PD deficiency in West Africa and severe malaria susceptibility. Eur J Hum Genet 2009; 17:1080-5. [PMID: 19223928 DOI: 10.1038/ejhg.2009.8] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Several lines of evidence link glucose-6-phosphate dehydrogenase (G6PD) deficiency to protection from severe malaria. Early reports suggested most G6PD deficiency in sub-Saharan Africa was because of the 202A/376G G6PD A- allele, and recent association studies of G6PD deficiency have employed genotyping as a convenient way to determine enzyme status. However, further work has suggested that other G6PD deficiency alleles are relatively common in some regions of West Africa. To investigate the consequences of unrecognized allelic heterogeneity on association studies, in particular studies of G6PD deficiency and malaria, we carried out a case-control analysis of 2488 Gambian children with severe malaria and 3875 controls. No significant association was found between severe malaria and the 202A/376G G6PD A- allele when analyzed alone, but pooling 202A/376G with other deficiency alleles revealed the signal of protection (male odds ratio (OR) 0.77, 95% CI 0.62-0.95, P=0.016; female OR 0.71, 95% CI 0.56-0.89, P=0.004). We have identified the 968C mutation as the most common G6PD A- allele in The Gambia. Our results highlight some of the consequences of allelic heterogeneity, particularly the increased type I error. They also suggest that G6PD-deficient male hemizygotes and female heterozygotes are protected from severe malaria.
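The odds ratios and 95% confidence intervals quoted in the abstract above are the standard summary for case-control data. A minimal sketch of how such a figure can be obtained from a 2 x 2 table is given below, using the usual Wald interval on the log odds ratio. The counts are invented for illustration and are not the Gambian data; the function name is likewise hypothetical, and the study itself may have used a different estimator (for example, one adjusted for covariates).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2 x 2 table.

    a, b: cases with / without the allele of interest
    c, d: controls with / without the allele of interest
    z:    normal quantile (1.96 gives an approximate 95% interval)
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 100 of 1000 cases and 150 of 1000 controls carry
# a deficiency allele.
or_, lo, hi = odds_ratio_ci(100, 900, 150, 850)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that lies entirely below 1, as it does for these made-up counts, is what a "signal of protection" looks like in this notation.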
Affiliation(s)
- Taane G Clark
- Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
|
18
|
Li M, Li C. Assessing departure from Hardy-Weinberg equilibrium in the presence of disease association. Genet Epidemiol 2009; 32:589-99. [PMID: 18449919 DOI: 10.1002/gepi.20335] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Assessing Hardy-Weinberg equilibrium (HWE) is often employed as an important initial step for genotype data quality checking in genetics studies. Tests for HWE often assume that the genotypes are randomly sampled from the general population. However, in many human genetics studies, subjects are ascertained through their disease status, and affected individuals (and their relatives in family-based studies) are overrepresented in the ascertained sample relative to the general population. As a result, when a marker is associated with the disease, the type I error rate in the HWE tests can be inflated, leading to false exclusion of associated markers from future analysis. Here we develop a general likelihood framework that allows assessment of departure from HWE while taking into account potential association with the disease. Our method can differentiate HWE departure caused by disease association from departure caused by other reasons, such as genotyping errors. The framework can be used for various data structures, including unrelated cases and controls, nuclear families with one or more offspring, or a mixture of them. The type I error rate of our test is under control for a broad range of scenarios. For case-control data, compared to the traditional HWE test that uses only controls, our test is more powerful to detect HWE departure for common diseases and has comparable power for rare diseases. For case-parents trios, our test is more powerful than the traditional HWE test that uses parents only.
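For orientation, the "traditional HWE test" that the abstract above uses as a baseline is commonly a 1-df goodness-of-fit chi-square on genotype counts, typically applied to controls only. The sketch below shows that baseline under this assumption; it is not the likelihood framework the authors propose, and the function name and counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes genotype counts for a biallelic marker and returns
    (statistic, p_value) with 1 degree of freedom.
    Both alleles must be observed, otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical control genotype counts showing a heterozygote deficit.
stat, p_value = hwe_chi2(40, 20, 40)
print(stat, p_value)
```

Running this check on cases, or on a pooled sample, is where the inflation described in the abstract can arise when the marker is truly associated with disease.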
Affiliation(s)
- Mingyao Li
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.
|
19
|
Bacanu SA, Nelson MR, Ehm MG. Comparison of association methods for dense marker data. Genet Epidemiol 2008; 32:791-9. [DOI: 10.1002/gepi.20347] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
20
|
Cheung CL, Sham PC, Chan V, Paterson AD, Luk KDK, Kung AWC. Identification of LTBP2 on chromosome 14q as a novel candidate gene for bone mineral density variation and fracture risk association. J Clin Endocrinol Metab 2008; 93:4448-55. [PMID: 18697872 DOI: 10.1210/jc.2007-2836] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/12/2023]
Abstract
CONTEXT Low bone mineral density (BMD) is a major risk factor for osteoporotic fracture. Chromosome 14q has previously been linked to BMD variation in several genome-wide linkage scans in Caucasian populations. OBJECTIVE Our objective was to replicate the quantitative trait locus (QTL) at chromosome 14q and to identify novel candidate genes within it. SUBJECTS AND METHODS Eighteen microsatellite markers were genotyped for a 117-cM interval in 306 Southern Chinese pedigrees with 1459 subjects. Successful replication of the QTL was confirmed within this region for trochanter and total hip BMD. Using a gene prioritization approach as implemented in the Endeavour program, we genotyped 65 single-nucleotide polymorphisms in the top five ranking candidate genes within the linkage peak in 706 and 760 case-control subject pairs with extremely high and low trochanter and total hip BMD, respectively. RESULTS Single-marker and haplotype analyses revealed that ESR2 and latent TGF-beta binding protein 2 (LTBP2) had significant associations with trochanter and total hip BMD. Multiple logistic regression revealed a strong genetic association between the LTBP2 gene locus and total hip BMD variation (P=0.0004) and prevalent fracture (P=0.01). A preliminary in vitro study showed differential expression of the LTBP2 gene in MC3T3-E1 mouse preosteoblastic cells in culture. CONCLUSIONS Apart from ESR2, LTBP2 is a novel positional candidate gene in the chromosome 14q QTL for BMD variation and fracture.
Affiliation(s)
- Ching-Lung Cheung
- Department of Medicine, The University of Hong Kong, Pokfulam, Hong Kong, China
|
21
|
Wei Z, Li M, Rebbeck T, Li H. U-statistics-based tests for multiple genes in genetic association studies. Ann Hum Genet 2008; 72:821-33. [PMID: 18691161 DOI: 10.1111/j.1469-1809.2008.00473.x] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive U-statistics, can handle both qualitative data such as case-control data and quantitative continuous phenotype data. Simulations demonstrate that our proposed methods are more powerful than standard methods, especially when there are multiple risk loci each with small genetic effects. When the number of disease-predisposing genes is small, the data-adaptive weighting of the U-statistics over all the markers produces similar power to commonly used single marker tests. We further illustrate the potential merits of our proposed tests in the analysis of a data set from a pathway-based candidate gene association study of breast cancer and hormone metabolism pathways. Finally, potential applications of the proposed tests to genome-wide association studies are also discussed.
Affiliation(s)
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
|
22
|
Fisher SA, Lewis CM. Power of genetic association studies in the presence of linkage disequilibrium and allelic heterogeneity. Hum Hered 2008; 66:210-22. [PMID: 18612206 DOI: 10.1159/000143404] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2007] [Accepted: 10/30/2007] [Indexed: 12/21/2022] Open
Abstract
OBJECTIVES The calculation of the power and sample size required for association studies is essential, particularly for follow-up of genome-wide association studies, where much genotyping is required to replicate the original finding and identify the true disease susceptibility mutation. METHODS In this paper, we derive equations for estimation of sample sizes for the transmission disequilibrium test (TDT) and for case-control studies, in the presence of allelic heterogeneity and indirect association - where the genotyped tagging SNP is in linkage disequilibrium (LD) with the true mutation. Using data from NOD2 and PTPN22, we show that the true sample sizes required to detect association may be incorrect when calculated under the assumption of a single mutation and complete LD with the genotyped marker. RESULTS The true sample sizes may be lower when allelic heterogeneity acts in a recessive model across mutations, or increased when mutations lie on different alleles of a common tagging SNP. CONCLUSION Calculating power and sample size under a range of realistic models of LD and allelic heterogeneity is essential to ensure that association studies have sufficient power to detect mutations.
Affiliation(s)
- Sheila A Fisher
- Division of Genetics and Molecular Medicine, Institute of Psychiatry, King's College London, London, UK
|
23
|
Abstract
Sudden cardiac arrest (SCA) due to ventricular arrhythmias is a major cause of mortality in western populations with up to 450,000 deaths in the United States each year. Although environmental factors clearly contribute to the determinants of SCA, familial aggregation studies and advances in the molecular genetics of inherited arrhythmias suggest that genetic factors confer susceptibility to SCA in the general population. Research in this area typically has focused on association of common genetic variants with intermediate phenotypes that predispose to SCA risk, such as QT interval, but few studies have examined genetic risk factors for SCA. We review the evidence for genetic susceptibility to SCA in the general population and focus on the studies published to date that have explored genetic risk factors.
|
24
|
Grinyó J, Vanrenterghem Y, Nashan B, Vincenti F, Ekberg H, Lindpaintner K, Rashford M, Nasmyth-Miller C, Voulgari A, Spleiss O, Truman M, Essioux L. Association of four DNA polymorphisms with acute rejection after kidney transplantation. Transpl Int 2008; 21:879-91. [PMID: 18444945 DOI: 10.1111/j.1432-2277.2008.00679.x] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]
Abstract
Renal transplant outcomes exhibit large inter-individual variability, possibly on account of genetic variation in immune-response mediators and genes influencing the pharmacodynamics/pharmacokinetics of immunosuppressants. We examined 21 polymorphisms from 10 genes in 237 de novo renal transplant recipients participating in an open-label, multicenter study [Cyclosporine Avoidance Eliminates Serious Adverse Renal-toxicity (CAESAR)] investigating renal function and biopsy-proven acute rejection (BPAR) with different cyclosporine A regimens and mycophenolate mofetil. Genes were selected for their immune response and pharmacodynamic/pharmacokinetic relevance and were tested for association with BPAR. Four polymorphisms were significantly associated with BPAR. The ABCB1 2677T allele tripled the odds of developing BPAR (OR: 3.16, 95% CI [1.50-6.67]; P=0.003), as did the presence of at least one IMPDH2 3757C allele (OR: 3.39, 95% CI [1.42-8.09]; P=0.006). BPAR was almost fivefold more likely in patients homozygous for IL-10 -592A (OR: 4.71, 95% CI [1.52-14.55]; P=0.007) and twice as likely in patients with at least one A allele of TNF-alpha G-308A (OR: 2.18, 95% CI [1.08-4.41]; P=0.029). There were no statistically significant interactions between polymorphisms, or the different treatment regimens. Variation in genes of immune response and pharmacodynamic/pharmacokinetic relevance may be important in understanding acute rejection after renal transplant.
Affiliation(s)
- Josep Grinyó
- Department of Nephrology, Hospital de Bellvitge, University of Barcelona, Barcelona, Spain.
|
25
|
Li C, Zhang G, Li X, Rao S, Gong B, Jiang W, Hao D, Wu P, Wu C, Du L, Xiao Y, Wang Y. A systematic method for mapping multiple loci: An application to construct a genetic network for rheumatoid arthritis. Gene 2008; 408:104-11. [DOI: 10.1016/j.gene.2007.10.028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2007] [Revised: 10/01/2007] [Accepted: 10/19/2007] [Indexed: 01/04/2023]
|
26
|
Rice TK, Schork NJ, Rao D. Methods for Handling Multiple Testing. GENETIC DISSECTION OF COMPLEX TRAITS 2008; 60:293-308. [DOI: 10.1016/s0065-2660(07)00412-9] [Citation(s) in RCA: 127] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
|
27
|
Power to detect risk alleles using genome-wide tag SNP panels. PLoS Genet 2007; 3:1827-37. [PMID: 17922574 PMCID: PMC2000969 DOI: 10.1371/journal.pgen.0030170] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2007] [Accepted: 08/21/2007] [Indexed: 12/02/2022] Open
Abstract
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls depending on the sample population and disease model. If multiple loci are involved, the power increases significantly to detect at least one locus such that relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which is a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
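The abstract's point that power to detect at least one locus rises with the number of loci can be seen with simple arithmetic. The sketch below assumes, as a simplification not stated in this form by the authors, that the loci are independent and share the same per-locus power; the function name and the per-locus value of 0.5 are purely illustrative.

```python
def power_at_least_one(per_locus_power, n_loci):
    """Probability of detecting at least one of n independent loci,

    assuming every locus is detected with the same probability.
    """
    return 1.0 - (1.0 - per_locus_power) ** n_loci


# Illustrative per-locus power of 0.5 for one to four independent loci.
for k in range(1, 5):
    print(k, power_at_least_one(0.5, k))
```

Under these assumptions a per-locus power of 0.5 already gives 0.75 with two loci and 0.9375 with four, which is why a fixed overall target such as 80% can be met with smaller per-locus effects.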
|
28
|
Saito YA, Locke GR, Zimmerman JM, Holtmann G, Slusser JP, de Andrade M, Petersen GM, Talley NJ. A genetic association study of 5-HTT LPR and GNbeta3 C825T polymorphisms with irritable bowel syndrome. Neurogastroenterol Motil 2007; 19:465-70. [PMID: 17564628 DOI: 10.1111/j.1365-2982.2007.00905.x] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
A pharmacogenetic study suggests the 5-HTT LPR polymorphism predicts response to alosetron, and another study describes a possible association of the GNbeta3 C825T polymorphism with IBS in patients with dyspepsia. We performed a case-control association study to determine whether these polymorphisms are associated with irritable bowel syndrome (IBS). The study aim was to compare allele and genotype frequencies between cases and controls for the 5-HTT LPR and the GNbeta3 C825T polymorphism. Cases were 50 GI outpatients; controls were 53 General Medicine outpatients matched to cases for age, gender and race at a major medical centre. Participants completed a questionnaire and donated blood. DNA was genotyped using polymerase chain reaction based assays. Eighty-two per cent of cases met Rome II criteria for IBS: 12% constipation-, 46% diarrhoea-, and 42% mixed-IBS. Genotype and allele frequencies for both polymorphisms did not differ between cases and controls. However, the allele frequency of the short (S) allele of the 5-HTT LPR polymorphism was greater in those with mixed-IBS compared with controls (68% vs 45%, P < 0.05). This study suggests that the 5-HTT LPR polymorphism may be associated with mixed-IBS, but not IBS overall. No association was observed for the GNbeta3 C825T polymorphism with IBS overall or subtypes.
Affiliation(s)
- Y A Saito
- Division of Gastroenterology and Hepatology, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
|
29
|
Chen J, Yu K, Hsing A, Therneau TM. A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects. Genet Epidemiol 2007; 31:238-51. [PMID: 17266115 DOI: 10.1002/gepi.20205] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a nonparametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal "pruned" tree nested within the tree resulting from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.
Affiliation(s)
- Jinbo Chen
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.
|
30
|
Guedj M, Della-Chiesa E, Picard F, Nuel G. Computing power in case-control association studies through the use of quadratic approximations: application to meta-statistics. Ann Hum Genet 2007; 71:262-70. [PMID: 17032289 DOI: 10.1111/j.1469-1809.2006.00316.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
In the framework of case-control studies many different test statistics are available to measure the association of a marker with a given disease. Nevertheless, choosing one particular statistic can lead to very different conclusions. In the absence of a consensus for this choice, a tempting option is to evaluate the power of these different statistics before making any decision. We review the available methods dedicated to power computation and assess their respective reliability in treating a wide range of tests on a wide range of alternative models. Considering Monte-Carlo, non-central chi-square and Delta-Method estimates, we evaluate empirical, asymptotic and numerical approaches. Additionally we introduce the use of the Delta-Method, extended to order 2, intended to provide better results than the traditional order-1 Delta-Method. Supplementary data can be found at: http://stat.genopole.cnrs.fr/software/dm2.
Affiliation(s)
- M Guedj
- Laboratoire Statistique et Genome, 523 place des terrasses de l'Agora, 91000 Evry, France.
|
31
|
Methodological aspects of the assessment of gene-nutrient interactions at the population level. Nutr Metab Cardiovasc Dis 2007; 17:82-8. [PMID: 17306733 DOI: 10.1016/j.numecd.2006.01.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/16/2005] [Revised: 01/04/2006] [Accepted: 01/09/2006] [Indexed: 12/21/2022]
Abstract
Nutrition-related diseases are the result of complex interactions between genes and diet. Understanding these interactions will provide the rationale for dietary interventions based on the individual's genetic constitution. However, this kind of study is not easy, because the complexity of the interactions exponentially increases the dimensionality of the problem. The aim of this review is to analyze the major problems that arise in approaching complex interactions at the population level. Furthermore, several statistical tools available for this type of analysis are discussed. In conclusion, although analytic techniques able to reduce the dimensionality of the problem are suggested, the sample size requirement seems to remain an inescapable challenge for the researcher. A synergy between traditional and nontraditional statistical approaches could be useful.
|
32
|
Malison RT, Kranzler HR, Yang BZ, Gelernter J. Human clock, PER1 and PER2 polymorphisms: lack of association with cocaine dependence susceptibility and cocaine-induced paranoia. Psychiatr Genet 2007; 16:245-9. [PMID: 17106427 DOI: 10.1097/01.ypg.0000242198.59020.ca] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Considerable research points to the importance of genetic mechanisms in psychostimulant addiction. Behavioral sensitization, a well-documented response to repeated stimulant exposure, may be mechanistically important in clinical features of the disorder, including escalating patterns of drug use, craving and drug-induced paranoia. Basic studies in both Drosophila melanogaster and mice have suggested the importance of circadian rhythm genes in locomotor sensitization and reward. The primary objective of the current study was to assess the potential involvement of three human orthologs (CLOCK, PER1 and PER2) in clinical phenotypes of the disorder. Allelic associations of three single nucleotide polymorphisms (SNPs) were assessed for both cocaine dependence and cocaine-induced paranoia in 186 cases and 273 controls. Potential population stratification biases were controlled for by means of within-population comparisons, and by structured association methods (using all populations). No differences in allele frequencies were found for any of the three single nucleotide polymorphisms studied between cocaine dependent and control subjects or between paranoid and nonparanoid cocaine users. These results do not support the involvement of genetic variation in these three circadian gene SNPs for influencing risks for either of these cocaine phenotypes.
Affiliation(s)
- Robert T Malison
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut 06519, USA.
|
33
|
Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene-environment interaction to detect genetic associations. Hum Hered 2007; 63:111-9. [PMID: 17283440 DOI: 10.1159/000099183] [Citation(s) in RCA: 327] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
Complex disease by definition results from the interplay of genetic and environmental factors. However, it is currently unclear how gene-environment interaction can best be used to locate complex disease susceptibility loci, particularly in the context of studies where between 1,000 and 1,000,000 markers are scanned for association with disease. We present a joint test of marginal association and gene-environment interaction for case-control data. We compare the power and sample size requirements of this joint test to other analyses: the marginal test of genetic association, the standard test for gene-environment interaction based on logistic regression, and the case-only test for interaction that exploits gene-environment independence. Although for many penetrance models the joint test of genetic marginal effect and interaction is not the most powerful, it is nearly optimal across all penetrance models we considered. In particular, it generally has better power than the marginal test when the genetic effect is restricted to exposed subjects and much better power than the tests of gene-environment interaction when the genetic effect is not restricted to a particular exposure level. This makes the joint test an attractive tool for large-scale association scans where the true gene-environment interaction model is unknown.
Affiliation(s)
- Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
|
34
|
Schork NJ, Greenwood TA, Braff DL. Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research. Schizophr Bull 2007; 33:95-104. [PMID: 17035359 PMCID: PMC2632283 DOI: 10.1093/schbul/sbl045] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023]
Abstract
Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.
Affiliation(s)
- Nicholas J Schork
- Department of Psychiatry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0603, USA.
|
35
|
Wang H, Stram DO. Optimal two-stage genome-wide association designs based on false discovery rate. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2006.04.034] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023]
|
36
|
Sillanpää MJ, Bhattacharjee M. Association mapping of complex trait loci with context-dependent effects and unknown context variable. Genetics 2006; 174:1597-611. [PMID: 17028339 PMCID: PMC1667093 DOI: 10.1534/genetics.106.061275] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 05/24/2006] [Accepted: 08/28/2006] [Indexed: 11/18/2022]
Abstract
A novel method for Bayesian analysis of genetic heterogeneity and multilocus association in random population samples is presented. The method is valid for quantitative and binary traits as well as for multiallelic markers. In the method, individuals are stochastically assigned into two etiological groups that can have both their own, and possibly different, subsets of trait-associated (disease-predisposing) loci or alleles. The method is favorable especially in situations when etiological models are stratified by factors that are unknown or went unmeasured, that is, if genetic heterogeneity is due to, for example, unknown genes x environment or genes x gene interactions. Additionally, a heterogeneity structure for the phenotype does not need to follow the structure of the general population; it can have a distinct selection history. The performance of the method is illustrated with a simulated example of genes x environment interaction (quantitative trait with loosely linked markers) and compared to the results of single-group analysis in the presence of missing data. Additionally, example analyses with previously analyzed cystic fibrosis and type 2 diabetes data sets (binary traits with closely linked markers) are presented. The implementation (written in WinBUGS) is freely available for research purposes from http://www.rni.helsinki.fi/~mjs/.
|
37
|
Nicodemus KK, Kolachana BS, Vakkalanka R, Straub RE, Giegling I, Egan MF, Rujescu D, Weinberger DR. Evidence for statistical epistasis between catechol-O-methyltransferase (COMT) and polymorphisms in RGS4, G72 (DAOA), GRM3, and DISC1: influence on risk of schizophrenia. Hum Genet 2006; 120:889-906. [PMID: 17006672 DOI: 10.1007/s00439-006-0257-3] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 04/11/2006] [Accepted: 08/31/2006] [Indexed: 10/24/2022]
Abstract
Catechol-O-methyltransferase (COMT) regulates dopamine degradation and is located in a genomic region that is deleted in a syndrome associated with psychosis, making it a promising candidate gene for schizophrenia. COMT also has been shown to influence prefrontal cortex processing efficiency. Prefrontal processing dysfunction is a common finding in schizophrenia, and a background of inefficient processing may modulate the effect of other candidate genes. Using the NIMH sibling study (SS), a non-independent case-control set, and an independent German (G) case-control set, we performed conditional/unconditional logistic regression to test for epistasis between SNPs in COMT (rs2097603, Val158Met (rs4680), rs165599) and polymorphisms in other schizophrenia susceptibility genes. Evidence for interaction was evaluated using a likelihood ratio test (LRT) between nested models. SNPs in RGS4, G72, GRM3, and DISC1 showed evidence for significant statistical epistasis with COMT. A striking result was found in RGS4: three of five SNPs showed a significant increase in risk [LRT P-values: 90387 = 0.05 (SS); SNP4 = 0.02 (SS), 0.02 (G); SNP18 = 0.04 (SS), 0.008 (G)] in interaction with COMT; main effects for RGS4 SNPs were null. Significant results for SNP4 and SNP18 were also found in the German study. We were able to detect statistical interaction between COMT and polymorphisms in candidate genes for schizophrenia, many of which had no significant main effect. In addition, we were able to replicate other studies, including allelic directionality. The use of epistatic models may improve replication of psychiatric candidate gene studies.
Affiliation(s)
- Kristin K Nicodemus
- Clinical Brain Disorders Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
|
38
|
Guedj M, Wojcik J, Della-Chiesa E, Nuel G, Forner K. A fast, unbiased and exact allelic test for case-control association studies. Hum Hered 2006; 61:210-21. [PMID: 16877868 DOI: 10.1159/000094776] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 02/12/2006] [Accepted: 06/01/2006] [Indexed: 11/19/2022]
Abstract
Association studies are traditionally performed in the case-control framework. As a first step in the analysis process, allele frequencies are often compared using Pearson's chi-square statistic. However, such an approach assumes the independence of alleles under the hypothesis of no association, which may not always be the case. Consequently, this method introduces a bias that shifts the type I error rate away from its expected value. In this article we first propose an unbiased and exact test as an alternative to the biased allelic test. Because available data require thousands of such tests to be performed, we focused on its fast execution. Since the biased allelic test is still widely used in the community, we illustrate its pitfalls in the context of genome-wide association studies and particularly in the case of low-level tests. Finally, we compare the unbiased and exact test with the Cochran-Armitage test for trend and show that it performs similarly in terms of power. The fast, unbiased and exact allelic test code is available in R, C++ and Perl at: http://stat.genopole.cnrs.fr/software/fueatest.
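To make the contrast in this abstract concrete, the sketch below computes the two standard statistics it refers to, the Pearson chi-square on allele counts and the Cochran-Armitage trend statistic, from genotype counts. It does not implement the authors' exact test. The function names, the additive scores (0, 1, 2) and the genotype counts are hypothetical choices for illustration.

```python
import math


def allelic_chi2(cases, controls):
    """Pearson chi-square on the 2x2 allele-count table.

    cases and controls are genotype counts (aa, Aa, AA). Each person contributes
    two alleles, which this test treats as independent observations.
    """
    r_alt = cases[1] + 2 * cases[2]
    s_alt = controls[1] + 2 * controls[2]
    r_tot, s_tot = 2 * sum(cases), 2 * sum(controls)
    n = r_tot + s_tot
    alt = r_alt + s_alt
    num = n * (r_alt * (s_tot - s_alt) - s_alt * (r_tot - r_alt)) ** 2
    den = r_tot * s_tot * alt * (n - alt)
    return num / den


def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic; the unit of analysis is the person."""
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [c + d for c, d in zip(cases, controls)]
    sx_cases = sum(x * c for x, c in zip(scores, cases))
    sx = sum(x * t for x, t in zip(scores, totals))
    sxx = sum(x * x * t for x, t in zip(scores, totals))
    num = n * (n * sx_cases - r * sx) ** 2
    den = r * s * (n * sxx - sx ** 2)
    return num / den


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Pooled genotypes in Hardy-Weinberg proportions: the two statistics coincide.
hwe_cases, hwe_controls = (20, 50, 30), (30, 50, 20)
# Pooled genotypes with an excess of homozygotes: the allelic statistic is larger.
hom_cases, hom_controls = (20, 10, 70), (40, 10, 50)

for label, ca, co in (("HWE", hwe_cases, hwe_controls),
                      ("excess homozygotes", hom_cases, hom_controls)):
    a, t = allelic_chi2(ca, co), trend_chi2(ca, co)
    print(f"{label}: allelic = {a:.3f} (p = {p_value_1df(a):.4g}), "
          f"trend = {t:.3f} (p = {p_value_1df(t):.4g})")
```

The second data set shows the kind of discrepancy the abstract warns about: when the pooled sample departs from Hardy-Weinberg proportions, counting alleles as independent changes the statistic relative to the genotype-based trend test.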
Affiliation(s)
- M Guedj
- Statistique et Genome Laboratory, CNRS UMR 8071, Evry, France
|
39
|
Eberle MA, Rieder MJ, Kruglyak L, Nickerson DA. Allele frequency matching between SNPs reveals an excess of linkage disequilibrium in genic regions of the human genome. PLoS Genet 2006; 2:e142. [PMID: 16965180 PMCID: PMC1560400 DOI: 10.1371/journal.pgen.0020142] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 01/24/2006] [Accepted: 07/25/2006] [Indexed: 02/04/2023]
Abstract
Significant interest has emerged in mapping genetic susceptibility for complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association between SNP pairs and can be used as a metric when designing maximally informative panels of SNPs for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r² both empirically and theoretically. We show that average r² values between SNPs unmatched for allele frequency are always limited to much less than 1 (theoretical maximum of approximately 0.46 to 0.57 for this dataset). Frequency matching of SNP pairs provides a more sensitive measure for assessing the average decay of LD and generates average r² values across nearly the entire informative range (from 0 to 0.89 through 0.95). Additionally, we analyzed the extent of perfect LD (r² = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD in genic regions versus intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias for derived, nonancestral alleles, providing evidence for positive natural selection in the human genome.
One of the primary goals for geneticists is isolating regions of the genome that convey increased risk of disease through the association of genetic polymorphisms with phenotypic traits. The recent availability of genome-wide polymorphism data (i.e., single nucleotide polymorphisms [SNPs]) has made association studies possible on an unprecedented scale, and the characterization and selection of these polymorphisms for these studies has been a topic of major interest. One method for choosing informative SNPs has been to compare the correlation between SNPs (a term called linkage disequilibrium), but this can create confounding problems when comparing SNPs of different frequencies. In this study, the authors show that if SNPs are compared to other SNPs of equal or near equal frequency, the correlation between them more accurately represents the true correlation. This also produces a more sensitive method for determining linkage disequilibrium. Using this method, SNPs were compared both within and outside of gene regions to examine the overall correlation between SNPs in each region. Matching SNPs according to their frequency greatly increased the maximum possible correlation and showed significantly higher correlations between SNPs within genes (intragenic) versus between genes (intergenic). Using the recently completed chimpanzee sequence, a larger fraction of high frequency human specific SNPs was found within the perfectly correlated SNP pairs in genic regions compared to intergenic regions. These observations suggest that regions of the genome around genes have been under selective pressure, leading to a greater correlation between SNPs. Genes found in regions with the highest correlations between SNPs will be of particular interest for future genotype-phenotype association studies.
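The dependence of the LD correlation on allele frequency can be seen from the standard definition of the squared correlation between two biallelic SNPs and the bounds on the disequilibrium coefficient D. The sketch below computes the correlation for one pair of SNPs and the largest value it can reach given the two allele frequencies. It is a per-pair illustration only, not the genome-wide averaging described in the abstract, and the function names and example frequencies are hypothetical.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared LD correlation from haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def max_r_squared(p_a, p_b):
    """Largest squared correlation attainable for the given allele frequencies.

    The haplotype frequency p_ab lies between max(0, p_a + p_b - 1) and
    min(p_a, p_b), which bounds D on both sides.
    """
    d_pos = min(p_a * (1 - p_b), p_b * (1 - p_a))  # largest positive D
    d_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))  # largest magnitude of negative D
    d_max = max(d_pos, d_neg)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Frequency-matched pair: perfect correlation is attainable.
print(max_r_squared(0.3, 0.3))
# Unmatched pair: the correlation is capped well below 1.
print(max_r_squared(0.1, 0.5))
```

For matched frequencies the bound reaches 1, while for a rare allele paired with a common one it falls far below 1 however tightly the two sites are linked, which is the confounding the summary above describes.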
Affiliation(s)
- Michael A Eberle
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America.
|
40
|
Ionita I, Man M. Optimal two-stage strategy for detecting interacting genes in complex diseases. BMC Genet 2006; 7:39. [PMID: 16776843 PMCID: PMC1523196 DOI: 10.1186/1471-2156-7-39] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 11/02/2005] [Accepted: 06/15/2006] [Indexed: 11/25/2022]
Abstract
Background The mapping of complex diseases is one of the most important problems in human genetics today. The rapid development of technology for genetic research has led to the discovery of millions of polymorphisms across the human genome, making it possible to conduct genome-wide association studies with hundreds of thousands of markers. Given the large number of markers to be tested in such studies, a two-stage strategy may be a reasonable and powerful approach: in the first stage, a small subset of promising loci is identified using single-locus testing, and, in the second stage, multi-locus methods are used while taking into account the loci selected in the first stage. In this report, we investigate and compare two possible two-stage strategies for genome-wide association studies: a conditional approach and a simultaneous approach. Results We investigate the power of both the conditional and the simultaneous approach to detect the disease loci for a range of two-locus disease models in a case-control study design. Our results suggest that, overall, the conditional approach is more robust and more powerful than the simultaneous approach; the conditional approach can greatly outperform the simultaneous approach when one of the two disease loci has weak marginal effect, but interacts strongly with the other, stronger locus (easily detectable using single-locus methods in the first stage). Conclusion Genome-wide association studies hold the promise of finding new genes implicated in complex diseases. Two-stage strategies are likely to be employed in these large-scale studies. Therefore we compared two natural two-stage approaches: the conditional approach and the simultaneous approach. Our power studies suggest that, when doing genome-wide association studies, a two-stage conditional approach is likely to be more powerful than a two-stage simultaneous approach.
Affiliation(s)
- Iuliana Ionita
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY, 10012, USA
- Michael Man
- Nonclinical Statistics, Pfizer PGRD, 2800 Plymouth Rd, Ann Arbor, MI, 48105, USA
|
41
|
Abstract
Evaluation of the association of haplotypes with either quantitative traits or disease status is common practice, and under some situations provides greater power than the evaluation of individual marker loci. The focus on haplotype analyses will increase as more single nucleotide polymorphisms (SNPs) are discovered, either because of interest in candidate gene regions, or because of interest in genome-wide association studies. However, there is little guidance on the determination of the sample size needed to achieve the desired power for a study, particularly when linkage phase of the haplotypes is unknown, and when a subset of tag-SNP markers is measured. There is a growing wealth of information on the distribution of haplotypes in different populations, and it is not unusual for investigators to measure genetic markers in pilot studies in order to gain knowledge of the distribution of haplotypes in the target population. Starting with this basic information on the distribution of haplotypes, we derive analytic methods to determine sample size or power to test the association of haplotypes with either a quantitative trait or disease status (e.g., a case-control study design), assuming that all subjects are unrelated. Our derivations cover both phase-known and phase-unknown haplotypes, allowing evaluation of the loss of efficiency due to unknown phase. We also extend our methods to when a subset of tag-SNPs is chosen, allowing investigators to explore the impact of tag-SNPs on power. Simulations illustrate that the theoretical power predictions are quite accurate over a broad range of conditions. Our theoretical formulae should provide useful guidance when planning haplotype association studies.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
|
42
|
Millstein J, Conti DV, Gilliland FD, Gauderman WJ. A testing framework for identifying susceptibility genes in the presence of epistasis. Am J Hum Genet 2006; 78:15-27. [PMID: 16385446 PMCID: PMC1380213 DOI: 10.1086/498850] [Citation(s) in RCA: 152] [Impact Index Per Article: 8.4] [Received: 07/06/2005] [Accepted: 10/05/2005] [Indexed: 01/17/2023]
Abstract
An efficient testing strategy called the "focused interaction testing framework" (FITF) was developed to identify susceptibility genes involved in epistatic interactions for case-control studies of candidate genes. In the FITF approach, likelihood-ratio tests are performed in stages that increase in the order of interaction considered. Joint tests of main effects and interactions are performed conditional on significant lower-order effects. A reduction in the number of tests performed is achieved by prescreening gene combinations with a goodness-of-fit χ² statistic that depends on association among candidate genes in the pooled case-control group. Multiple testing is accounted for by controlling false-discovery rates. Simulation analysis demonstrated that the FITF approach is more powerful than marginal tests of candidate genes. FITF also outperformed multifactor dimensionality reduction when interactions involved additive, dominant, or recessive genes. In an application to asthma case-control data from the Children's Health Study, FITF identified a significant multilocus effect between the nicotinamide adenine dinucleotide (phosphate) reduced:quinone oxidoreductase gene (NQO1), myeloperoxidase gene (MPO), and catalase gene (CAT) (unadjusted P = .00026), three genes that are involved in the oxidative stress pathway. In an independent data set consisting primarily of African American and Asian American children, these three genes also showed a significant association with asthma status (P = .0008).
Affiliation(s)
- Joshua Millstein
- National Oceanic and Atmospheric Administration/National Marine Fisheries Service, Alaska Fisheries Science Center, Seattle, WA 98115, USA.
|
43
|
Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM, Thibodeau SN. Nonparametric tests of association of multiple genes with human disease. Am J Hum Genet 2005; 76:780-93. [PMID: 15786018 PMCID: PMC1199368 DOI: 10.1086/429838] [Citation(s) in RCA: 108] [Impact Index Per Article: 5.7] [Received: 09/07/2004] [Accepted: 02/21/2005] [Indexed: 11/03/2022]
Abstract
The genetic basis of many common human diseases is expected to be highly heterogeneous, with multiple causative loci and multiple alleles at some of the causative loci. Analyzing the association of disease with one genetic marker at a time can have weak power, because of relatively small genetic effects and the need to correct for multiple testing. Testing the simultaneous effects of multiple markers by multivariate statistics might improve power, but they too will not be very powerful when there are many markers, because of the many degrees of freedom. To overcome some of the limitations of current statistical methods for case-control studies of candidate genes, we develop a new class of nonparametric statistics that can simultaneously test the association of multiple markers with disease, with only a single degree of freedom. Our approach, which is based on U-statistics, first measures a score over all markers for pairs of subjects and then compares the averages of these scores between cases and controls. Genetic scoring for a pair of subjects is measured by a "kernel" function, which we allow to be fairly general. However, we provide guidelines on how to choose a kernel for different types of genetic effects. Our global statistic has the advantage of having only one degree of freedom and achieves its greatest power advantage when the contrasts of average genotype scores between cases and controls are in the same direction across multiple markers. Simulations illustrate that our proposed methods have the anticipated type I-error rate and that they can be more powerful than standard methods. Application of our methods to a study of candidate genes for prostate cancer illustrates their potential merits, and offers guidelines for interpretation.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
|
44
|
Abstract
Disappointments in replicating initial findings in gene mapping for complex traits are often attributed to small sample sizes and inadequate techniques to determine the threshold value. This is clearly not the whole truth. More fundamental reasons lie in the inherent heterogeneity related to disease, including genetic heterogeneity, differences in allele frequencies, and context-dependency in genetic architecture. There are also other reasons related to the data collection and analysis. Replication may remain a source of frustration unless more emphasis is put on controlling these sources of heterogeneity between studies.
Affiliation(s)
- M J Sillanpää
- Rolf Nevanlinna Institute, Department of Mathematics and Statistics, P.O. Box 68, FIN-00014 University of Helsinki, Finland.
|
45
|
North BV, Curtis D, Sham PC. Application of logistic regression to case-control association studies involving two causative loci. Hum Hered 2005; 59:79-87. [PMID: 15838177 DOI: 10.1159/000085222] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 07/29/2004] [Accepted: 11/14/2004] [Indexed: 11/19/2022]
Abstract
Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated assuming different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not exactly correspond with classical definitions of epistasis. We have particularly examined the issue of the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.
Affiliation(s)
- Bernard V North
- Academic Department of Psychiatry, Queen Mary's School of Medicine and Dentistry, London E1 1BB, UK
|
46
|
Shen H, Liu Y, Liu P, Recker RR, Deng HW. Nonreplication in genetic studies of complex diseases--lessons learned from studies of osteoporosis and tentative remedies. J Bone Miner Res 2005; 20:365-76. [PMID: 15746981 DOI: 10.1359/jbmr.041129] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Received: 06/16/2004] [Revised: 08/29/2004] [Accepted: 10/15/2004] [Indexed: 12/17/2022]
Abstract
Inconsistent results have accumulated in genetic studies of complex diseases/traits over the past decade. Using osteoporosis as an example, we address major potential factors for the nonreplication results and propose some potential remedies. Over the past decade, numerous linkage and association studies have been performed to search for genes predisposing to complex human diseases. However, relatively little success has been achieved, and inconsistent results have accumulated. We argue that those nonreplication results are not unexpected, given the complicated nature of complex diseases and a number of confounding factors. In this article, based on our experience in genetic studies of osteoporosis, we discuss major potential factors for the inconsistent results and propose some potential remedies. We believe that one of the main reasons for this lack of reproducibility is overinterpretation of nominally significant results from studies with insufficient statistical power. We indicate that the power of a study is not only influenced by the sample size, but also by genetic heterogeneity, the extent and degree of linkage disequilibrium (LD) between the markers tested and the causal variants, and the allele frequency differences between them. We also discuss the effects of other confounding factors, including population stratification, phenotype difference, genotype and phenotype quality control, multiple testing, and genuine biological differences. In addition, we note that with low statistical power, even a "replicated" finding is still likely to be a false positive. We believe that with rigorous control of study design and interpretation of different outcomes, inconsistency will be largely reduced, and the chances of successfully revealing genetic components of complex diseases will be greatly improved.
Affiliation(s)
- Hui Shen
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
|
47
|
Kelly PJ, Stallard N, Whittaker JC. Statistical design and analysis of pharmacogenetic trials. Stat Med 2005; 24:1495-508. [PMID: 15706636 DOI: 10.1002/sim.2052] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
Pharmacogenetic trials investigate the effect of genotype on treatment response. When there are two or more treatment groups and two or more genetic groups, investigation of gene-treatment interactions is of key interest. However, calculation of the power to detect such interactions is complicated because this depends not only on the treatment effect size within each genetic group, but also on the number of genetic groups, the size of each genetic group, and the type of genetic effect that is both present and tested for. The scale chosen to measure the magnitude of an interaction can also be problematic, especially for the binary case. Elston et al. proposed a test for detecting the presence of gene-treatment interactions for binary responses, and gave appropriate power calculations. This paper shows how the same approach can also be used for normally distributed responses. We also propose a method for analysing and performing sample size calculations based on a generalized linear model (GLM) approach. The power of the Elston et al. approach is compared with that of the GLM approach for the binary and normal cases using several illustrative examples. While more sensitive to errors in model specification than the Elston et al. approach, the GLM approach is much more flexible and in many cases more powerful.
Affiliation(s)
- Patrick J Kelly
- Medical and Pharmaceutical Statistics Research Unit, The University of Reading, Reading RG6 6FN, UK.
|
48
|
Zill P, Baghai TC, Zwanzger P, Schüle C, Eser D, Rupprecht R, Möller HJ, Bondy B, Ackenheil M. SNP and haplotype analysis of a novel tryptophan hydroxylase isoform (TPH2) gene provide evidence for association with major depression. Mol Psychiatry 2004; 9:1030-6. [PMID: 15124006 DOI: 10.1038/sj.mp.4001525] [Citation(s) in RCA: 244] [Impact Index Per Article: 12.2] [Indexed: 12/12/2022]
Abstract
Tryptophan hydroxylase (TPH), the rate-limiting enzyme in the biosynthesis of serotonin, plays a major role as a candidate gene in several psychiatric disorders. Recently, a second TPH isoform (TPH2) was identified in mice, which was exclusively present in the brain. In a previous post-mortem study by our group, we demonstrated that TPH2 is also expressed in the human brain, but not in peripheral tissues. This is the first report of an association study between polymorphisms in the TPH2 gene and major depression (MD). We performed single-nucleotide polymorphism (SNP), haplotype and linkage disequilibrium studies on 300 depressed patients and 265 healthy controls with 10 SNPs in the TPH2 gene. Significant association was detected between one SNP (P=0.0012, global P=0.0051) and MD. Haplotype analysis produced additional support for association (P<0.0001, global P=0.0001). Our findings provide evidence for an involvement of genetic variants of the TPH2 gene in the pathogenesis of MD and may hint at the repeatedly discussed duality of the serotonergic system. These results may open up new research strategies for the analysis of the observed disturbances in the serotonergic system in patients suffering from several other psychiatric disorders.
Affiliation(s)
- P Zill
- Psychiatric Hospital of the Ludwig-Maximilians-University, Munich, Munich D-80336, Germany.
|
49
|
Qian D. Haplotype sharing correlation analysis using family data: a comparison with family-based association test in the presence of allelic heterogeneity. Genet Epidemiol 2004; 27:43-52. [PMID: 15185402 DOI: 10.1002/gepi.20005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
The haplotype-sharing correlation (HSC) method for association analysis using family data is revisited by introducing a permutation procedure for estimating region-wise significance at each marker on a study segment. In simulation studies, the HSC method has a correct type 1 error rate in both unstructured and structured populations. The HSC signals on disease segments occur in the vicinity of a true disease locus on a restricted region without recombination hotspots. However, the peak signal may not pinpoint the true disease location in a small region with dense markers. The HSC method is shown to have higher power than single- and multilocus family-based association test (FBAT) methods when the true disease locus is unobserved among the study markers, and especially under conditions of weak linkage disequilibrium and multiple ancestral disease alleles. These simulation results suggest that the HSC method has the capacity to identify true disease-associated segments under allelic heterogeneity that go undetected by the FBAT method that compares allelic or haplotypic frequencies.
Affiliation(s)
- Dajun Qian
- Department of Biostatistics, City of Hope National Medical Center, Duarte, California 91010-3000, USA.
|
50
|
Zill P, Baghai TC, Engel R, Zwanzger P, Schüle C, Eser D, Behrens S, Rupprecht R, Möller HJ, Ackenheil M, Bondy B. The dysbindin gene in major depression: an association study. Am J Med Genet B Neuropsychiatr Genet 2004; 129B:55-8. [PMID: 15274041 DOI: 10.1002/ajmg.b.30064] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
The pathophysiological mechanisms, as well as the molecular loci of antidepressant drug action, have not yet been established, but recent models propose that several adaptive mechanisms in signal transduction cascades beyond the receptor and reuptake systems are involved in antidepressant action and play an important role in the etiology of affective disorders. In this context, the dysbindin gene (dystrobrevin-binding-protein 1, DTNBP1), which was recently reported to be associated with schizophrenia, seems to be an interesting candidate gene for affective disorders. Dysbindin is widely expressed in the human brain and binds to the dystrophin-associated protein complex (DPC), which appears to be involved in signal transduction pathways that have been repeatedly investigated and described as altered or disturbed in affective disorders [McLeod et al. [2003: Psychopharmacol Bull 35:24-41]; Brambilla et al. [2003: Mol Psychiatry 8:721-737]]. Therefore, we investigated whether five SNPs in the dysbindin gene could be susceptibility factors in the etiology of major depression or for the response to antidepressant treatment in a sample of 293 patients compared to 220 healthy controls. Applying single SNP evaluation as well as haplotype analysis, we could not detect an association between the dysbindin polymorphisms and major depression or the response to antidepressant treatment. In conclusion, our results suggest that SNPs in the dysbindin gene are unlikely to play a major role in the pathophysiology of major depression or to be in linkage disequilibrium (LD) with a neighboring mutation or gene. Further analyses are needed to confirm these results.
Affiliation(s)
- Peter Zill
- Psychiatric Hospital of the Ludwig-Maximilians-University, Munich, Germany.
|