1
|
Yang ZY, Liu W, Yuan YX, Kong YF, Zhao PZ, Fung WK, Zhou JY. Robust association tests for quantitative traits on the X chromosome. Heredity (Edinb) 2022; 129:244-256. [PMID: 36085362 PMCID: PMC9519943 DOI: 10.1038/s41437-022-00560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/09/2022] Open
Abstract
The genome-wide association study is an elementary tool to assess the genetic contribution to complex human traits. However, such association tests are mainly proposed for autosomes, and less attention has been given to methods for identifying loci on the X chromosome due to their distinct biological features. In addition, the existing association tests for quantitative traits on the X chromosome either fail to incorporate the information of males or only detect variance heterogeneity. Therefore, we propose four novel methods, which are denoted as QXcat, QZmax, QMVXcat and QMVZmax. When using these methods, it is assumed that the risk alleles for females and males are the same and that the locus being studied satisfies the generalized genetic model for females. The first two methods are based on comparing the means of the trait value across different genotypes, while the latter two methods test for the difference of both means and variances. All four methods effectively incorporate the information of X chromosome inactivation. Simulation studies demonstrate that the proposed methods control the type I error rates well. Under the simulated scenarios, the proposed methods are generally more powerful than the existing methods. We also apply our proposed methods to data from the Minnesota Center for Twin and Family Research and find 10 single nucleotide polymorphisms that are statistically significantly associated with at least two traits at the significance level of 1 × 10⁻³.
Affiliation(s)
- Zi-Ying Yang
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Wei Liu
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Yu-Xin Yuan
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Yi-Fan Kong
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
| | - Pei-Zhen Zhao
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
| | - Wing Kam Fung
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China.
| | - Ji-Yuan Zhou
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China.
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China.
| |
|
2
|
CMAX3: A Robust Statistical Test for Genetic Association Accounting for Covariates. Genes (Basel) 2021; 12:genes12111723. [PMID: 34828328 PMCID: PMC8622598 DOI: 10.3390/genes12111723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2021] [Revised: 10/21/2021] [Accepted: 10/26/2021] [Indexed: 12/13/2022] Open
Abstract
The additive genetic model as implemented in logistic regression has been widely used in genome-wide association studies (GWASs) for binary outcomes. Unfortunately, for many complex diseases, the underlying genetic models are generally unknown and a mis-specification of the genetic model can result in a substantial loss of power. To address this issue, the MAX3 test (the maximum of three separate test statistics) has been proposed as a robust test that performs plausibly regardless of the underlying genetic model. However, the original implementation of MAX3 utilizes the trend test so it cannot adjust for any covariates such as age and gender. This drawback has significantly limited the application of the MAX3 in GWASs, as covariates account for a considerable amount of variability in these disorders. In this paper, we extended the MAX3 and proposed the CMAX3 (covariate-adjusted MAX3) based on logistic regression. The proposed test yielded a similar robust efficiency as the original MAX3 while easily adjusting for any covariate based on the likelihood framework. The asymptotic formula to calculate the p-value of the proposed test was also developed in this paper. The simulation results showed that the proposed test performed desirably under both the null and alternative hypotheses. For the purpose of illustration, we applied the proposed test to re-analyze a case-control GWAS dataset from the Collaborative Studies on Genetics of Alcoholism (COGA). The R code to implement the proposed test is also introduced in this paper and is available for free download.
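As a rough illustration of the original (unadjusted) MAX3 idea described in this abstract, the sketch below computes a Cochran-Armitage trend statistic under recessive, additive and dominant genotype scores and takes the largest absolute value. It is not the covariate-adjusted CMAX3 and not the authors' R code; the function names `trend_z` and `max3`, the score coding and the example counts are assumptions made for illustration. Obtaining a p-value for the maximum requires the joint distribution of the three statistics, which is not attempted here.

```python
import math

# Genotype scores for (aa, Aa, AA) under three classical genetic models.
# This coding is an illustrative assumption.
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend Z statistic for a 2 x 3 genotype table."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    if n == 0:
        return 0.0
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * m for x, m in zip(scores, totals))
    sum_x2n = sum(x * x * m for x, m in zip(scores, totals))
    u = n * sum_xr - n_cases * sum_xn
    var = (n_cases * n_controls / n) * (n * sum_x2n - sum_xn ** 2)
    # A degenerate table (no variation in the scores) carries no signal.
    return u / math.sqrt(var) if var > 0 else 0.0


def max3(cases, controls):
    """Return max |Z| over the three models and the per-model Z values."""
    z = {model: trend_z(cases, controls, scores)
         for model, scores in MODEL_SCORES.items()}
    return max(abs(v) for v in z.values()), z


# Hypothetical genotype counts (aa, Aa, AA) for cases and controls.
stat, per_model = max3((10, 20, 30), (30, 20, 10))
print(stat, per_model)
```

Taking the maximum protects against choosing the wrong model in advance, at the cost of a null distribution that is no longer standard normal.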
|
3
|
Moore CM, Jacobson SA, Fingerlin TE. Power and Sample Size Calculations for Genetic Association Studies in the Presence of Genetic Model Misspecification. Hum Hered 2020; 84:256-271. [PMID: 32721961 DOI: 10.1159/000508558] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/09/2020] [Accepted: 05/11/2020] [Indexed: 01/02/2023] Open
Abstract
INTRODUCTION: When analyzing data from large-scale genetic association studies, such as targeted or genome-wide resequencing studies, it is common to assume a single genetic model, such as dominant or additive, for all tests of association between a given genetic variant and the phenotype. However, for many variants, the chosen model will result in poor model fit and may lack statistical power due to model misspecification. OBJECTIVE: We develop power and sample size calculations for tests of gene and gene × environment interaction, allowing for misspecification of the true mode of genetic susceptibility. METHODS: The power calculations are based on a likelihood ratio test framework and are implemented in an open-source R package ("genpwr"). RESULTS: We use these methods to develop an analysis plan for a resequencing study in idiopathic pulmonary fibrosis and show that using a 2-degree of freedom test can increase power to detect recessive genetic effects while maintaining power to detect dominant and additive effects. CONCLUSIONS: Understanding the impact of model misspecification can aid in study design and developing analysis plans that maximize power to detect a range of true underlying genetic effects. In particular, these calculations help identify when a multiple degree of freedom test or other robust test of association may be advantageous.
Affiliation(s)
- Camille M Moore
- Center for Genes, Environment, and Health, National Jewish Health, Denver, Colorado, USA
| | - Sean A Jacobson
- Center for Genes, Environment, and Health, National Jewish Health, Denver, Colorado, USA
| | - Tasha E Fingerlin
- Center for Genes, Environment, and Health, National Jewish Health, Denver, Colorado, USA
| |
|
4
|
Chen Z, Liu Q, Wang K. A genetic association test through combining two independent tests. Genomics 2018; 111:1152-1159. [PMID: 30009923 DOI: 10.1016/j.ygeno.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2018] [Revised: 06/25/2018] [Accepted: 07/11/2018] [Indexed: 12/21/2022]
Abstract
Gene- and pathway-based variant association tests are important tools in finding genetic variants that are associated with phenotypes of interest. Although some methods have been proposed in the literature, powerful and robust statistical tests are still desirable in this area. In this study, we propose a statistical test based on decomposing the genotype data into orthogonal parts from which powerful and robust independent p-value combination approaches can be utilized. Through a comprehensive simulation study, we compare the proposed test with some existing popular ones. Our simulation results show that the new test has great performance in terms of controlling type I error rate and statistical power. Real data applications are also conducted to illustrate the performance and usefulness of the proposed test.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th street, Bloomington, IN 47405, USA.
| | - Qingzhong Liu
- Department of Computer Science, Sam Houston State University, 1803 Avenue I, Huntsville, TX 77341, USA
| | - Kai Wang
- Department of Biostatistics, College of Public Health, University of Iowa, 145 N. Riverside Drive, Iowa City, IA 52242, USA
| |
|
5
|
Chen Z, Lu Y, Lin T, Liu Q, Wang K. Gene-based genetic association test with adaptive optimal weights. Genet Epidemiol 2017; 42:95-103. [DOI: 10.1002/gepi.22098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/19/2017] [Accepted: 10/22/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics; School of Public Health; Indiana University Bloomington; Bloomington Indiana United States of America
| | - Yan Lu
- Department of Mathematics and Statistics; University of New Mexico; Albuquerque New Mexico United States of America
| | - Tong Lin
- The Key Laboratory of Machine Perception (Ministry of Education); School of EECS; Peking University; Beijing China
| | - Qingzhong Liu
- Department of Computer Science; Sam Houston State University; Huntsville Texas United States of America
| | - Kai Wang
- Department of Biostatistics; College of Public Health; University of Iowa; Iowa City Iowa United States of America
| |
|
6
|
Gaye A, Davis SK. Genetic model misspecification in genetic association studies. BMC Res Notes 2017; 10:569. [PMID: 29115983 PMCID: PMC5678796 DOI: 10.1186/s13104-017-2911-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/01/2017] [Accepted: 11/01/2017] [Indexed: 02/08/2023] Open
Abstract
Objective: The underlying model of the genetic determinant of a trait is generally not known with certainty a priori. Hence, in genetic association studies, a dominant model might be erroneously modelled as additive, an error investigated previously. We explored this question, for candidate gene studies, by evaluating the sample size required to compensate for the misspecification and improve inference at the analysis stage. Power calculations were carried out with (1) the true dominant model and (2) the incorrect additive model. Empirical power, sample size and effect size were compared between scenarios (1) and (2). In each of the scenarios the estimates were evaluated for a rare (minor allele frequency < 0.01), low frequency (0.01 ≤ minor allele frequency < 0.05) and common (minor allele frequency ≥ 0.05) single nucleotide polymorphism. Results: The results confirm the detrimental effect of the misspecification error on power and effect size for any minor allele frequency. The implications of the error are not negligible; therefore, candidate gene studies should consider the more conservative sample size to compensate for the effect of error. When it is not possible to extend the sample size, methods that help mitigate the impact of the error should be systematically used.
Affiliation(s)
- Amadou Gaye
- Metabolic, Cardiovascular and Inflammatory Disease Genomics Branch, Social Epidemiology Research Unit, National Institutes of Health, National Human Genome Research Institute, Bethesda, USA.
| | - Sharon K Davis
- Metabolic, Cardiovascular and Inflammatory Disease Genomics Branch, Social Epidemiology Research Unit, National Institutes of Health, National Human Genome Research Institute, Bethesda, USA
| |
|
7
|
A Powerful Variant-Set Association Test Based on Chi-Square Distribution. Genetics 2017; 207:903-910. [PMID: 28912342 DOI: 10.1534/genetics.117.300287] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/04/2017] [Accepted: 09/10/2017] [Indexed: 01/19/2023] Open
Abstract
Detecting the association between a set of variants and a given phenotype has attracted a large amount of attention in the scientific community, although it is a difficult task. Recently, several related statistical approaches have been proposed in the literature; powerful statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful test that combines information from each individual single nucleotide polymorphism (SNP) based on principal component analysis without relying on the eigenvalues associated with the principal components. We compare the proposed approach with some popular tests through a simulation study and real data applications. Our results show that, in general, the new test is more powerful than its competitors considered in this study; the gain in detecting power can be substantial in many situations.
|
8
|
A gene-based test of association through an orthogonal decomposition of genotype scores. Hum Genet 2017; 136:1385-1394. [PMID: 28864915 DOI: 10.1007/s00439-017-1839-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/30/2017] [Accepted: 08/26/2017] [Indexed: 10/18/2022]
Abstract
The burden test and the sequence kernel association test (SKAT) are two popular methods for detecting association with rare variants. Treated as two different sources of association information, they are adaptively combined to form an optimal SKAT (SKAT-O) method for optimal power. We show that the burden test is part of rather than independent of the SKAT. We introduce a new test statistic that is the sum of the burden statistic and a statistic asymptotically independent of the burden statistic. The performance of this new test statistic is demonstrated through extensive simulation studies and applications to a Genetic Analysis Workshop 17 data set and the Ocular Hypertension Treatment Study data.
|
9
|
Chen Z, Han S, Wang K. Genetic association test based on principal component analysis. Stat Appl Genet Mol Biol 2017; 16:189-198. [DOI: 10.1515/sagmb-2016-0061] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Abstract
Many gene- and pathway-based association tests have been proposed in the literature. Among them, the SKAT is widely used, especially for rare variants association studies. In this paper, we investigate the connection between SKAT and a principal component analysis. This investigation leads to a procedure that encompasses SKAT as a special case. Through simulation studies and real data applications, we compare the proposed method with some existing tests.
|
10
|
Pan DD, Li ZB, Li QZ, Kam Fung W. A Novel Powerful Joint Analysis with Data Fusion in Two-stage Case–Control Genome-wide Association Studies. COMMUN STAT-SIMUL C 2016. [DOI: 10.1080/03610918.2014.901360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
11
|
Chen Z, Yang W, Liu Q, Yang JY, Li J, Yang M. A new statistical approach to combining p-values using gamma distribution and its application to genome-wide association study. BMC Bioinformatics 2014; 15 Suppl 17:S3. [PMID: 25559433 PMCID: PMC4304193 DOI: 10.1186/1471-2105-15-s17-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/04/2023] Open
Abstract
Background: Combining information from different studies is an important and useful practice in bioinformatics, including genome-wide association studies, rare variant data analysis and other set-based analyses. Many statistical methods have been proposed to combine p-values from independent studies. However, it is known that there is no uniformly most powerful test under all conditions; therefore, finding a powerful test for a specific situation is important and desirable. Results: In this paper, we propose a new statistical approach to combining p-values based on the gamma distribution, which uses the inverse of the p-value as the shape parameter in the gamma distribution. Conclusions: A simulation study and a real data application demonstrate that the proposed method performs well in some situations.
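For readers unfamiliar with p-value combination, the sketch below shows the classical Fisher method as a baseline. It is not the gamma-based approach proposed in this paper, whose details are not given in the abstract. The function names `chi2_sf_even_df` and `fisher_combine` and the example p-values are assumptions made for illustration. Under the null hypothesis, with independent p-values, the statistic -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


# Hypothetical p-values from two independent studies.
stat, combined_p = fisher_combine([0.05, 0.05])
print(stat, combined_p)
```

Two studies that are each borderline at 0.05 combine to a smaller overall p-value, which is the behaviour such methods are designed to deliver when the evidence points the same way.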
|
12
|
Chen Z, Ng HKT, Li J, Liu Q, Huang H. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat Methods Med Res 2014; 26:567-582. [PMID: 25253574 DOI: 10.1177/0962280214551815] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Abstract
In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed and only a few significant associated single-nucleotide polymorphisms from the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that combines the information of single-nucleotide polymorphisms on the X chromosome from both males and females in an efficient way. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only makes the assumption that the risk allele is the same for both females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association study data.
Affiliation(s)
- Zhongxue Chen
- 1 Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN, USA
| | - Hon Keung Tony Ng
- 2 Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
| | - Jing Li
- 1 Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN, USA
| | - Qingzhong Liu
- 3 Department of Computer Science, Sam Houston State University, Huntsville, TX, USA
| | - Hanwen Huang
- 4 Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, USA
| |
|
13
|
|
14
|
Chen Z, Huang H, Liu Q. Detecting differentially methylated loci for multiple treatments based on high-throughput methylation data. BMC Bioinformatics 2014; 15:142. [PMID: 24884464 PMCID: PMC4026834 DOI: 10.1186/1471-2105-15-142] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/10/2013] [Accepted: 05/06/2014] [Indexed: 12/31/2022] Open
Abstract
Background: Because of its important effects, as an epigenetic factor, on gene expression and disease development, DNA methylation has drawn much attention from researchers. Detecting differentially methylated loci is an important but challenging step in studying the regulatory roles of DNA methylation in a broad range of biological processes and diseases. Several statistical approaches have been proposed to detect significant methylated loci; however, most of them were designed specifically for case-control studies. Results: Noting that age is associated with methylation level and that methylation data are not normally distributed, we propose a nonparametric method to detect differentially methylated loci under multiple conditions with trend for Illumina Array Methylation data. The nonparametric Cuzick test is used to detect differences with trend among treatment groups within each age group; an overall p-value is then calculated by combining the independent p-values, one from each age group. Conclusions: We compare the new approach with other methods using simulated and real data. Our study shows that the proposed method outperforms the other methods considered in this paper in terms of power: it detected more biologically meaningful differentially methylated loci than the others.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, PH C104, Bloomington, IN 47405, USA.
| |
|
15
|
Chen Z. A new association test based on disease allele selection for case-control genome-wide association studies. BMC Genomics 2014; 15:358. [PMID: 24886381 PMCID: PMC4059871 DOI: 10.1186/1471-2164-15-358] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/05/2014] [Accepted: 05/06/2014] [Indexed: 12/20/2022] Open
Abstract
Background: Current robust association tests for case–control genome-wide association study (GWAS) data are mainly based on the assumption of some specific genetic models. Due to the richness of the genetic models, this assumption may not be appropriate. Therefore, robust but powerful association approaches are desirable. Results: In this paper, we propose a new approach to testing for the association between the genotype and phenotype for case–control GWAS. This method assumes a generalized genetic model and is based on the selected disease allele to obtain a p-value from the more powerful one-sided test. Through a comprehensive simulation study we assess the performance of the new test by comparing it with existing methods. Some real data applications are also used to illustrate the use of the proposed test. Conclusions: Based on the simulation results and real data application, the proposed test is powerful and robust.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, PH C104, Bloomington, IN 47405, USA.
| |
|
16
|
Chen Z, Nadarajah S. On the optimally weighted z-test for combining probabilities from independent studies. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.09.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/22/2022]
|
17
|
Lee D, Bacanu SA. Association testing strategy for data from dense marker panels. PLoS One 2013; 8:e80540. [PMID: 24265830 PMCID: PMC3827222 DOI: 10.1371/journal.pone.0080540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/27/2013] [Accepted: 10/14/2013] [Indexed: 01/31/2023] Open
Abstract
Genome wide association studies have been usually analyzed in a univariate manner. The commonly used univariate tests have one degree of freedom and assume an additive mode of inheritance. The experiment-wise significance of these univariate statistics is obtained by adjusting for multiple testing. Next generation sequencing studies, which assay 10-20 million variants, are beginning to come online. For these studies, the strategy of additive univariate testing and multiple testing adjustment is likely to result in a loss of power due to (1) the substantial multiple testing burden and (2) the possibility of a non-additive causal mode of inheritance. To reduce the power loss we propose: a new method (1) to summarize in a single statistic the strength of the association signals coming from all not-very-rare variants in a linkage disequilibrium block and (2) to incorporate, in any linkage disequilibrium block statistic, the strength of the association signals under multiple modes of inheritance. The proposed linkage disequilibrium block test consists of the sum of squares of nominally significant univariate statistics. We compare the performance of this method to the performance of existing linkage disequilibrium block/gene-based methods. Simulations show that (1) extending methods to combine testing for multiple modes of inheritance leads to substantial power gains, especially for a recessive mode of inheritance, and (2) the proposed method has a good overall performance. Based on simulation results, we provide practical advice on choosing suitable methods for applied analyses.
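The block statistic described in this abstract, a sum of squares of nominally significant univariate statistics, can be sketched as follows. This is only the statistic under stated assumptions, not the authors' full procedure: the function name `nominal_sum_of_squares`, the two-sided nominal level of 0.05 and the example z-scores are hypothetical, and calibrating the statistic (its null distribution depends on the linkage disequilibrium among variants) is not attempted.

```python
from statistics import NormalDist


def nominal_sum_of_squares(z_scores, alpha=0.05):
    """Sum z^2 over variants whose two-sided test is nominally significant.

    z_scores: univariate association z statistics for variants in one block.
    alpha: nominal two-sided significance level (an illustrative default).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return sum(z * z for z in z_scores if abs(z) >= critical)


# Hypothetical z-scores for four variants in one block.
print(nominal_sum_of_squares([0.5, -2.5, 3.0, 1.0]))
```

Dropping the variants that are not nominally significant keeps many null variants in a dense block from diluting the signal of a few associated ones.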
Affiliation(s)
- Donghyung Lee
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Silviu-Alin Bacanu
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
| |
|
18
|
|
19
|
Huang H, Chen Z, Huang X. Age-adjusted nonparametric detection of differential DNA methylation with case-control designs. BMC Bioinformatics 2013; 14:86. [PMID: 23497201 PMCID: PMC3599607 DOI: 10.1186/1471-2105-14-86] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/13/2012] [Accepted: 02/20/2013] [Indexed: 12/31/2022] Open
Abstract
Background: DNA methylation profiles differ among disease types and, therefore, can be used in disease diagnosis. In addition, large-scale whole genome DNA methylation data offer tremendous potential in understanding the role of DNA methylation in normal development and function. However, due to the unique feature of the methylation data, powerful and robust statistical methods are very limited in this area. Results: In this paper, we proposed and examined a new statistical method to detect differentially methylated loci for case control designs that is fully nonparametric and does not depend on any assumption for the underlying distribution of the data. Moreover, the proposed method adjusts for the age effect that has been shown to be highly correlated with DNA methylation profiles. Using simulation studies and a real data application, we have demonstrated the advantages of our method over existing commonly used methods. Conclusions: Compared to existing methods, our method improved the detection power for differentially methylated loci for case control designs and controlled the type I error well. Its applications are not limited to methylation data; it can be extended to many other case–control studies.
Affiliation(s)
- Hanwen Huang
- Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA 30605, USA
| |
|
20
|
Chen Z, Huang H, Ng HKT. Testing for association in case-control genome-wide association studies with shared controls. Stat Methods Med Res 2013; 25:954-67. [DOI: 10.1177/0962280212474061] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/15/2022]
Abstract
The statistical analysis of genome-wide association studies (GWASs) with multiple diseases and shared controls (SCs) is discussed. The usual method for analyzing data from these studies is to compare each individual disease with either the SCs or the pooled controls which include other diseases. We observed that applying individual association tests can be problematic because these tests may suffer from power loss in detecting significant associations between diseases and single-nucleotide polymorphism or copy number variant. We propose here a two-stage procedure wherein we first apply an overall chi-square test for multiple diseases with SCs; if the overall test is rejected, then individual tests using the chi-square partition method will be applied to each disease against SCs. A real GWAS data set with SCs and a Monte Carlo simulation study are used to demonstrate that the proposed method is more effective and preferable than other existing methods for analyzing data from GWASs with multiple diseases and SCs.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN, USA
| | - Hanwen Huang
- Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Hon Keung Tony Ng
- Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
| |
|
21
|
Chen Z, Huang H, Liu J, Tony Ng HK, Nadarajah S, Huang X, Deng Y. Detecting differentially methylated loci for Illumina Array methylation data based on human ovarian cancer data. BMC Med Genomics 2013; 6 Suppl 1:S9. [PMID: 23369576 PMCID: PMC3552689 DOI: 10.1186/1755-8794-6-s1-s9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022] Open
Abstract
Background: It is well known that DNA methylation, as an epigenetic factor, has an important effect on gene expression and disease development. Detecting differentially methylated loci under different conditions, such as cancer types or treatments, is of great interest in current research because it is important in cancer diagnosis and classification. However, inappropriate testing approaches can result in large numbers of false positives and/or false negatives. Appropriate and powerful statistical methods are desirable but very limited in the literature. Results: In this paper, we propose a nonparametric method to detect differentially methylated loci under multiple conditions for Illumina Array Methylation data. We compare the new method with other methods using simulated and real data. Our study shows that the proposed method outperforms the other methods considered in this paper. Conclusions: Because of the unique features of Illumina Array Methylation data, commonly used statistical tests will lose power or give misleading results. Therefore, appropriate statistical methods are crucial for this type of data. Powerful statistical approaches remain to be developed. Availability: R code is available upon request.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, Bloomington, IN 47405, USA.
|
22
|
Yu Z, Gillen D, Li CF, Demetriou M. Incorporating parental information into family-based association tests. Biostatistics 2012; 14:556-72. [PMID: 23266418 DOI: 10.1093/biostatistics/kxs048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
Assumptions regarding the true underlying genetic model, or mode of inheritance, are necessary when quantifying genetic associations with disease phenotypes. Here we propose new methods to ascertain the underlying genetic model from parental data in family-based association studies. Specifically, for parental mating-type data, we propose a novel statistic to test whether the underlying genetic model is additive, dominant, or recessive; for parental genotype-phenotype data, we propose three strategies to determine the true mode of inheritance. We illustrate how to incorporate the information gleaned from these strategies into family-based association tests. Because family-based association tests are conducted conditional on parental genotypes, the type I error rate of these procedures is not inflated by the information learned from parental data. This result holds even if such information is weak or when the assumption of Hardy-Weinberg equilibrium is violated. Our simulations demonstrate that incorporating parental data into family-based association tests can improve power under common inheritance models. The application of our proposed methods to a candidate-gene study of type 1 diabetes successfully detects a recessive effect in MGAT5 that would otherwise be missed by conventional family-based association tests.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California at Irvine, Irvine, CA 92697, USA.
|
23
|
Chen Z, Huang H, Ng HKT. Design and analysis of multiple diseases genome-wide association studies without controls. Gene 2012; 510:87-92. [PMID: 22951808 PMCID: PMC3463729 DOI: 10.1016/j.gene.2012.07.089] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/21/2012] [Revised: 07/19/2012] [Accepted: 07/30/2012] [Indexed: 12/31/2022]
Abstract
In genome-wide association studies (GWAS), studying multiple diseases with shared controls is one of the available case-control designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages, such as improving statistical power to detect associations and reducing the time and cost of data collection. In this paper, we propose a study design for GWAS that involves multiple diseases but no controls, together with a corresponding statistical analysis strategy. Through a simulation study, we show that the statistical association test under the proposed study design is more powerful than the test of a single disease against shared controls, and that it has power comparable to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodology and the advantages of the proposed design. Some possible limitations of this study design and testing method, and their solutions, are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, Bloomington, IN 47405-7109, USA.
|
24
|
Chen Z, Liu Q, Nadarajah S. A new statistical approach to detecting differentially methylated loci for case control Illumina array methylation data. Bioinformatics 2012; 28:1109-13. [PMID: 22368244 DOI: 10.1093/bioinformatics/bts093] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/31/2022] Open
Abstract
Motivation: As an epigenetic alteration, DNA methylation plays an important role in the epigenetic control of gene transcription. Recent advances in genome-wide scans of DNA methylation provide great opportunities for studying the impact of DNA methylation on many human diseases, including various types of cancer. Because of the unique features of this type of data, applicable statistical methods are limited and new, sophisticated approaches are desirable. Results: In this article, we propose a new statistical test to detect differentially methylated loci for case-control methylation data generated by Illumina arrays. This new method utilizes the important finding that DNA methylation is highly correlated with age. The proposed method estimates the overall P-value by combining the P-values from independent individual tests, one for each age group. Through real data application and a simulation study, we show that the proposed test is robust and usually more powerful than other methods.
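The idea of pooling independent per-age-group P-values into one overall P-value can be illustrated with Fisher's combination method. This is an assumption made for illustration: the abstract does not say which combination rule the authors use, and `fisher_combine` is a hypothetical helper name. Under the null hypothesis, -2 times the sum of the log P-values follows a chi-square distribution with 2k degrees of freedom for k independent tests, and its upper tail has a closed form because the degrees of freedom are even.

```python
import math


def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method.

    Returns (statistic, combined_p), where the statistic is
    -2 * sum(log p_i) and is chi-square with 2k df under the null.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form upper tail for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


if __name__ == "__main__":
    # Made-up P-values, one per age group, for illustration only.
    per_age_group = [0.04, 0.20, 0.01]
    print(fisher_combine(per_age_group))
```

Testing within age strata and then combining avoids confounding by age, since cases and controls are compared only against subjects of similar age.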
Affiliation(s)
- Zhongxue Chen
- Center for Clinical and Translational Sciences, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA.
|