1
|
Thas O, Yuan A, Ng HKT, Zheng G. A proportional score test over the nuisance parameter space: Properties and applications. Stat Probab Lett 2015. [DOI: 10.1016/j.spl.2015.07.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
2
|
Qu L. Combining dependent F-tests for robust association of quantitative traits under genetic model uncertainty. Stat Appl Genet Mol Biol 2014; 13:123-39. [PMID: 24603842 DOI: 10.1515/sagmb-2013-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
In association mapping of quantitative traits, the F-test based on an assumed genetic model is a basic statistical tool for testing association of each candidate locus with the trait of interest. However, the true underlying genetic model is often unknown, and using an incorrect model may cause serious loss of power. For case-control studies, it is known that the combination of several tests that are optimal for different models is robust to model misspecification. In this paper, we extend the test combination approach to quantitative trait association. We first derive the exact correlations among transformed test statistics and discuss interesting special cases. We then propose and evaluate a multivariate normality based approximation to the joint distribution of test statistics, such that the marginal distributions and pairwise correlations among test statistics are accounted for. Through simulations, we show that the sizes of the resulting approximate combined tests are accurate for practical purposes under a variety of situations. We find that the combination of the tests from the additive model and the genotypic model performs well, because it demonstrates both robustness to incorrect models and satisfactory power. A mouse lipoprotein data set is used to demonstrate the method.
|
3
|
Zheng G, Li Q, Yuan A. Some Statistical Properties of Efficiency Robust Tests with Applications to Genetic Association Studies. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Affiliation(s)
- Gang Zheng
- Office of Biostatistics Research; National Heart, Lung and Blood Institute
- Qizhai Li
- Academy of Mathematics and Systems Science; Chinese Academy of Sciences
- Ao Yuan
- National Human Genome Center, Howard University
|
4
|
Qian M, Shao Y. A likelihood ratio test for goodness-of-fit of recessive and dominant models for case-control studies. CAN J STAT 2013. [DOI: 10.1002/cjs.11171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
5
|
Yu Z, Gillen D, Li CF, Demetriou M. Incorporating parental information into family-based association tests. Biostatistics 2012; 14:556-72. [PMID: 23266418 DOI: 10.1093/biostatistics/kxs048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
Assumptions regarding the true underlying genetic model, or mode of inheritance, are necessary when quantifying genetic associations with disease phenotypes. Here we propose new methods to ascertain the underlying genetic model from parental data in family-based association studies. Specifically, for parental mating-type data, we propose a novel statistic to test whether the underlying genetic model is additive, dominant, or recessive; for parental genotype-phenotype data, we propose three strategies to determine the true mode of inheritance. We illustrate how to incorporate the information gleaned from these strategies into family-based association tests. Because family-based association tests are conducted conditional on parental genotypes, the type I error rate of these procedures is not inflated by the information learned from parental data. This result holds even if such information is weak or when the assumption of Hardy-Weinberg equilibrium is violated. Our simulations demonstrate that incorporating parental data into family-based association tests can improve power under common inheritance models. The application of our proposed methods to a candidate-gene study of type 1 diabetes successfully detects a recessive effect in MGAT5 that would otherwise be missed by conventional family-based association tests.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California at Irvine, Irvine, CA 92697, USA.
|
6
|
Zheng G, Jinfeng X, Yuan A, Colin OW. Impact on modes of inheritance and relative risks of using extreme sampling when designing genetic association studies. Ann Hum Genet 2012; 77:80-4. [PMID: 23163532 DOI: 10.1111/j.1469-1809.2012.00733.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/10/2012] [Accepted: 08/28/2012] [Indexed: 11/29/2022]
Abstract
Using extreme phenotypes for association studies can improve statistical power. We study the impact of using samples with extremely high or low traits on the alternative model space, the genotype relative risks, and the genetic models in association studies. We prove the following results: when the risk allele causes high-trait values, the more extreme the high traits, the larger the genotype relative risks, which is not always true for using extreme low traits; we also prove that a genetic model theoretically changes with more extreme traits, except for the recessive or dominant models. Practically, however, the impact of deviations from the true genetic model at a functional locus due to selective sampling is virtually negligible. The implications of our findings are discussed. Numerical values are reported for illustrations.
Affiliation(s)
- Gang Zheng
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA.
|
7
|
Xu J, Yuan A, Zheng G. Bayes factor based on the trend test incorporating Hardy-Weinberg disequilibrium: more power to detect genetic association. Ann Hum Genet 2012; 76:301-11. [PMID: 22607017 DOI: 10.1111/j.1469-1809.2012.00714.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
In the analysis of case-control genetic association, the trend test and Pearson's test are the two most commonly used tests. In genome-wide association studies (GWAS), Bayes factor (BF) is a useful tool to support significant P-values, and a better measure than P-value when results are compared across studies with different sample sizes. When reporting the P-value of the trend test, we propose a BF directly based on the trend test. To improve the power to detect association under recessive or dominant genetic models, we propose a BF based on the trend test and incorporating Hardy-Weinberg disequilibrium in cases. When the true model is unknown, or both the trend test and Pearson's test or other robust tests are applied in genome-wide scans, we propose a joint BF, combining the previous two BFs. All three BFs studied in this paper have closed forms and are easy to compute without integrations, so they can be reported along with P-values, especially in GWAS. We discuss how to use each of them and how to specify priors. Simulation studies and applications to three GWAS are provided to illustrate their usefulness to detect nonadditive gene susceptibility in practice.
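For readers unfamiliar with the trend test named in this abstract, the following is a minimal sketch of the Cochran-Armitage trend statistic for a 2x3 case-control genotype table. It is an illustration only and does not reproduce the Bayes factors proposed in the paper. The function name `trend_test`, the argument layout, and the default additive scores (0, 0.5, 1) are assumptions made for this example.

```python
import math

def trend_test(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend statistic Z for a 2x3 genotype table.

    cases, controls: counts for the three genotypes (0, 1, 2 risk alleles).
    scores: hypothetical genotype scores; (0, 0.5, 1) is the additive choice,
            (0, 0, 1) recessive and (0, 1, 1) dominant.
    """
    r, s = list(cases), list(controls)
    n = [ri + si for ri, si in zip(r, s)]   # genotype column totals
    R, S = sum(r), sum(s)                    # numbers of cases and controls
    N = R + S
    # Score-weighted contrast between case and control genotype counts.
    t = sum(x * (S * ri - R * si) for x, ri, si in zip(scores, r, s))
    # Large-sample null variance of the contrast.
    sum_x2n = sum(x * x * ni for x, ni in zip(scores, n))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    var = (R * S / N) * (N * sum_x2n - sum_xn ** 2)
    z = t / math.sqrt(var)
    # Two-sided p-value from the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(round(z * z, 6), p)
```

Changing `scores` to the recessive or dominant coding gives the tests that are optimal under those models, which is the kind of model dependence the abstract refers to.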
Affiliation(s)
- Jinfeng Xu
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
|
8
|
Zheng G, Wu CO, Kwak M, Jiang W, Joo J, Lima JAC. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genet Epidemiol 2012; 36:263-73. [PMID: 22460626 DOI: 10.1002/gepi.21619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 10/12/2011] [Revised: 12/23/2011] [Accepted: 01/02/2012] [Indexed: 11/07/2022]
Abstract
We study the analysis of a joint association between a genetic marker with both binary (case-control) and quantitative (continuous) traits, where the quantitative trait values are only available for the cases due to data sharing and outcome-dependent sampling. Data sharing becomes common in genetic association studies, and the outcome-dependent sampling is the consequence of data sharing, under which a phenotype of interest is not measured for some subgroup. The trend test (or Pearson's test) and F-test are often, respectively, used to analyze the binary and quantitative traits. Because of the outcome-dependent sampling, the usual F-test can be applied using the subgroup with the observed quantitative traits. We propose a modified F-test by also incorporating the genotype frequencies of the subgroup whose traits are not observed. Further, a combination of this modified F-test and Pearson's test is proposed by Fisher's combination of their P-values as a joint analysis. Because of the correlation of the two analyses, we propose to use a Gamma (scaled chi-squared) distribution to fit the asymptotic null distribution for the joint analysis. The proposed modified F-test and the joint analysis can also be applied to test single trait association (either binary or quantitative trait). Through simulations, we identify the situations under which the proposed tests are more powerful than the existing ones. Application to a real dataset of rheumatoid arthritis is presented.
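As background for the Fisher combination mentioned in this abstract, here is a minimal sketch of the classical method for independent P-values. It is not the authors' procedure: the paper fits a Gamma (scaled chi-squared) null distribution because its two tests are correlated, and that fit is not reproduced here. The function name `fisher_combine` is hypothetical, and all P-values are assumed to lie in (0, 1].

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Assumes the p-values are independent and strictly positive.
    """
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(log p_i); under the null it is
    # chi-squared with 2k degrees of freedom. We work with X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k, the chi-squared survival function has
    # the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    tail = sum(half_x ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_x) * tail

print(fisher_combine([0.05, 0.05]))
```

When the component tests are positively correlated, as in the joint analysis described above, this independent-case reference distribution is generally anti-conservative, which is the motivation the abstract gives for fitting a different null distribution.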
Affiliation(s)
- Gang Zheng
- National Heart, Lung and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892, USA.
|
9
|
Zheng G, Yuan A, Jeffries N. Hybrid Bayes factors for genome-wide association studies when a robust test is used. Comput Stat Data Anal 2011. [DOI: 10.1016/j.csda.2011.03.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
|
10
|
Thompson JR, Attia J, Minelli C. The meta-analysis of genome-wide association studies. Brief Bioinform 2011; 12:259-69. [PMID: 21546449 DOI: 10.1093/bib/bbr020] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022] Open
Abstract
The pressure to publish novel genetic associations has meant that meta-analysis has been applied to genome-wide association studies without the time for a careful consideration of the methods that are used. This review distinguishes between the use of meta-analysis to validate previously reported genetic associations and its use for gene discovery, and advocates viewing gene discovery as an exploratory screen that requires independent replication instead of treating it as the application of hundreds of thousands of statistical tests. The review considers the use of fixed and random effects meta-analyses, the investigation of between-study heterogeneity, adjustment for confounding, assessing the combined evidence and genomic control, and comments on alternative approaches that have been used in the literature.
|
11
|
Vukcevic D, Hechter E, Spencer C, Donnelly P. Disease model distortion in association studies. Genet Epidemiol 2011; 35:278-90. [PMID: 21416505 PMCID: PMC3110308 DOI: 10.1002/gepi.20576] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 08/13/2010] [Revised: 12/15/2010] [Accepted: 01/12/2011] [Indexed: 12/27/2022]
Abstract
Most findings from genome-wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r^4, where r^2 is the usual correlation between the causal and marker loci, in contrast to the well-known result that power to detect a multiplicative effect decays as a function of r^2. We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely.
Affiliation(s)
- Damjan Vukcevic
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
12
|
Zaykin DV, Kozbur DO. P-value based analysis for shared controls design in genome-wide association studies. Genet Epidemiol 2011; 34:725-38. [PMID: 20976797 DOI: 10.1002/gepi.20536] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
An appealing genome-wide association study design compares one large control group against several disease samples. A pioneering study by the Wellcome Trust Case Control Consortium that employed such a design has identified multiple susceptibility regions, many of which have been independently replicated. While reusing a control sample provides effective utilization of data, it also creates correlation between association statistics across diseases. An observation of a large association statistic for one of the diseases may greatly increase chances of observing a spuriously large association for a different disease. Accounting for the correlation is also particularly important when screening for SNPs that might be involved in a set of diseases with overlapping etiology. We describe methods that correct association statistics for dependency due to shared controls, and we describe ways to obtain a measure of overall evidence and to combine association signals across multiple diseases. The methods we describe require no access to individual subject data, instead, they efficiently utilize information contained in P-values for association reported for individual diseases. P-value based combined tests for association are flexible and essentially as powerful as the approach based on aggregating the individual subject data.
Affiliation(s)
- Dmitri V Zaykin
- Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, North Carolina, USA.
|
13
|
Abstract
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among others, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.
Affiliation(s)
- Heping Zhang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034
|