2
Sun R, Weng H, Hu I, Guo J, Wu WKK, Zee BCY, Wang MH. A W-test collapsing method for rare-variant association testing in exome sequencing data. Genet Epidemiol 2016; 40:591-596. [PMID: 27531462 DOI: 10.1002/gepi.22000] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/14/2015] [Revised: 06/06/2016] [Accepted: 07/17/2016] [Indexed: 12/20/2022]
Abstract
Advancement in sequencing technology enables the study of associations between complex disorder phenotypes and single-nucleotide polymorphisms with rare mutations. However, rare genetic variants have extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W-test collapsing method to evaluate rare-variant association by measuring the distributional differences between cases and controls through combined log odds ratios within a genomic region. The method is model-free, follows a chi-squared distribution with degrees of freedom estimated from bootstrapped samples of the data, and allows fast and accurate P-value calculation without the need for permutations. The proposed method was compared with the Weighted-Sum Statistic and the Sequence Kernel Association Test on simulated datasets and showed good performance and significantly faster computing speed. Applied to a real next-generation sequencing dataset of hypertensive disorder, it identified genes with interesting biological functions related to metabolic disorders and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way to test rare genetic variants in whole exome sequencing datasets.
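The abstract above combines two mechanical pieces: a collapsed statistic built from per-variant log odds ratios, and a chi-squared reference distribution whose scale and degrees of freedom are estimated from bootstrapped data. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' W-test: it assumes the statistic is a sum of squared standardized log odds ratios of carrier status (0.5 continuity correction, Woolf variance), and that the scale `h` and degrees of freedom `f` come from Satterthwaite-style moment matching on null replicates that the caller supplies. All function names are hypothetical.

```python
import math
from statistics import mean, variance

_EPS = 1e-14      # relative convergence tolerance
_TINY = 1e-300    # guards against division by zero in the continued fraction
_MAX_ITER = 10_000


def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, _MAX_ITER):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * _EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (used when x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, _MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = _TINY if abs(d) < _TINY else d
        c = b + an / c
        c = _TINY if abs(c) < _TINY else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-squared with df > 0 (df need not be an integer)."""
    if x <= 0.0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    if y < a + 1.0:
        return 1.0 - _lower_gamma_series(a, y)
    return _upper_gamma_cf(a, y)


def collapsed_log_or_stat(case_carriers, n_cases, ctrl_carriers, n_ctrls):
    """Sum over variants of squared standardized log odds ratios (hypothetical collapsing rule).

    Each variant gives a 2x2 table (carrier vs non-carrier by case vs control);
    0.5 is added to every cell so rare variants with zero counts stay finite.
    """
    stat = 0.0
    for a, c in zip(case_carriers, ctrl_carriers):
        a1, b1 = a + 0.5, n_cases - a + 0.5
        c1, d1 = c + 0.5, n_ctrls - c + 0.5
        log_or = math.log((a1 * d1) / (b1 * c1))
        var_log_or = 1 / a1 + 1 / b1 + 1 / c1 + 1 / d1  # Woolf variance estimate
        stat += log_or ** 2 / var_log_or
    return stat


def moment_match(null_stats):
    """Choose h, f so that h * S has the mean f and variance 2f of a chi-squared(f) variable.

    null_stats: statistics recomputed on resampled data under the null
    (at least two values with nonzero spread); how they are resampled is left to the caller.
    """
    mu, var = mean(null_stats), variance(null_stats)
    h = 2.0 * mu / var
    f = h * mu
    return h, f


def w_test_pvalue(observed_stat, null_stats):
    """Return (scaled statistic, estimated df, p-value) under the moment-matched chi-squared."""
    h, f = moment_match(null_stats)
    w = h * observed_stat
    return w, f, chi2_sf(w, f)
```

Moment matching solves h * mean = f and h^2 * variance = 2f, the first two moments of a chi-squared variable with f degrees of freedom. Once `h` and `f` are estimated from a modest number of null replicates, each region's p-value is a closed-form tail probability rather than a permutation loop.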
Affiliation(s)
- Rui Sun
- Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Haoyi Weng
- Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Inchi Hu
- ISOM Department, Biomedical Engineering Division, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR
- Junfeng Guo
- Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China; Australian National University, Canberra, Australia
- William K K Wu
- Department of Anesthesia and Intensive Care, Chinese University of Hong Kong, Hong Kong, Hong Kong SAR
- Benny Chung-Ying Zee
- Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
- Maggie Haitian Wang
- Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Centre for Clinical Trials and Biostatistics, CUHK Shenzhen Research Institute, Shenzhen, China
3
Yan Q, Weeks DE, Tiwari HK, Yi N, Zhang K, Gao G, Lin WY, Lou XY, Chen W, Liu N. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples. Hum Hered 2016; 80:126-38. [PMID: 27161037 DOI: 10.1159/000445057] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/26/2015] [Accepted: 02/24/2016] [Indexed: 01/12/2023]
Abstract
OBJECTIVE The kernel machine (KM) test reportedly performs well in set-based association testing of rare variants. Many studies measure phenotypes at multiple time points, but the standard KM methodology has only been available for phenotypes measured at a single time point. In addition, family-based designs have been widely used in genetic association studies, so the analysis method must appropriately handle familial relatedness. No rare-variant test currently exists for longitudinal data from family samples. Therefore, in this paper, we aim to introduce an association test for rare variants that incorporates multiple longitudinal phenotype measurements for either population or family samples. METHODS This approach uses KM regression based on the linear mixed model framework and is applicable to longitudinal data from either population (L-KM) or family samples (LF-KM). RESULTS In our population-based simulation studies, L-KM had good control of the Type I error rate and increased power in all the scenarios we considered compared with other competing methods. In contrast, in the family-based simulation studies, we found an inflated Type I error rate when L-KM was applied directly to the family samples, whereas LF-KM retained the desired Type I error rate and had the best power performance overall. Finally, we illustrate the utility of our proposed LF-KM approach by analyzing data from an association study between rare variants and blood pressure from the Genetic Analysis Workshop 18 (GAW18). CONCLUSION We propose a method for rare-variant association testing in population and family samples using phenotypes measured at multiple time points for each subject. The proposed method has the best power performance compared with competing approaches in our simulation study.
Affiliation(s)
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa., USA
4
Datta AS, Biswas S. Comparison of haplotype-based statistical tests for disease association with rare and common variants. Brief Bioinform 2015; 17:657-71. [PMID: 26338417 DOI: 10.1093/bib/bbv072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/26/2015] [Indexed: 01/26/2023]
Abstract
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guiding practitioners. We consider nine methods: Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching, and Weighted Haplotype and Imputation-based Tests. These can be divided into two types, global tests and individual haplotype-specific tests, depending on whether there is one overall test for a haplotype region (global) or an individual test for each haplotype in the region (haplotype-specific). Haplo.score is the only method that provides both; Haplo.glm, Hapassoc, BhGLM, and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method, the Sequence Kernel Association Test (SKAT), and its two variants, SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to the GAW18 real hypertension data and the Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test), and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.
5
Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin WY, Lou XY, Cui X, Liu N. A Sequence Kernel Association Test for Dichotomous Traits in Family Samples under a Generalized Linear Mixed Model. Hum Hered 2015; 79:60-8. [PMID: 25791389 DOI: 10.1159/000375409] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/28/2014] [Accepted: 01/21/2015] [Indexed: 01/15/2023]
Abstract
OBJECTIVE Existing methods for identifying multiple rare variants underlying complex diseases in family samples are underpowered. Therefore, we aim to develop a new set-based method for association studies of dichotomous traits in family samples. METHODS We introduce a framework for testing the association of genetic variants with diseases in family samples based on a generalized linear mixed model. Our proposed method is based on kernel machine regression and can be viewed as an extension of the sequence kernel association test (SKAT and famSKAT) to family data with dichotomous traits (F-SKAT). RESULTS Our simulation studies show that the original SKAT has inflated type I error rates when applied directly to family data. By contrast, our proposed F-SKAT has the correct type I error rate. Furthermore, in all of the considered scenarios, F-SKAT, which uses all family data, has higher power than both SKAT, which uses only unrelated individuals from the family data, and another method that also uses all family data. CONCLUSION We propose a set-based association test that can be used to analyze family data with dichotomous phenotypes while handling genetic variants with the same or opposite directions of effect, as well as any type of family relationship.
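F-SKAT extends SKAT, whose score statistic for unrelated individuals under a weighted linear kernel can be written as Q = (y - mu)' G W G' (y - mu), where G holds minor-allele counts and W weights each variant. The minimal Python sketch below computes that Q, assuming an intercept-only null model (no covariates) and the Beta(1, 25) minor-allele-frequency weights commonly used with SKAT. SKAT itself obtains p-values analytically from a mixture of chi-squared distributions; this sketch swaps in a simple phenotype-permutation p-value instead. Permuting phenotypes assumes exchangeable individuals, an assumption that relatedness violates, so it is not a substitute for F-SKAT on family data. Function names are hypothetical.

```python
import random


def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency: 25 * (1 - maf) ** 24."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, y):
    """Weighted linear-kernel score statistic Q = r' G W G' r (up to a constant factor).

    genotypes: n rows of m minor-allele counts (0/1/2); y: n phenotype values.
    r = y - mean(y) are residuals from an intercept-only null model (no covariates).
    W = diag(w_j ** 2) with w_j the Beta(1, 25) weight, so Q = sum_j (w_j * score_j) ** 2.
    """
    n, m = len(y), len(genotypes[0])
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]
    q = 0.0
    for j in range(m):
        col = [row[j] for row in genotypes]
        freq = sum(col) / (2.0 * n)
        w = beta_1_25_weight(min(freq, 1.0 - freq))
        score = sum(g * r for g, r in zip(col, resid))  # G_j' r
        q += (w * score) ** 2
    return q


def skat_permutation_pvalue(genotypes, y, n_perm=1000, seed=1):
    """Permutation p-value for Q; assumes exchangeable (unrelated) individuals."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if skat_q(genotypes, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Squaring each weighted score before summing is what lets variants with opposite directions of effect contribute to Q instead of cancelling, which is the property the abstract highlights. The (hits + 1) / (n_perm + 1) form keeps the permutation p-value strictly positive.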
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Ala., USA
6
Paterson AD. Drinking from the Holy Grail: analysis of whole-genome sequencing from the Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S1-4. [PMID: 25112182 DOI: 10.1002/gepi.21818] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/29/2023]
Abstract
The Genetic Analysis Workshops distribute real and simulated human genetic data to allow the development and comparison of methods to detect genetic variants and genes related to biological traits; the results are then presented and discussed at a biennial meeting. The data made available for Genetic Analysis Workshop 18 (GAW18) included whole-genome sequence data for odd-numbered autosomes from 20 large Mexican American pedigrees selected through probands with type 2 diabetes. Real and simulated blood pressure phenotype data were provided to allow the comparison of methods to detect variants and genes associated with blood pressure. Sources of complexity in the data include related individuals, repeated quantitative trait outcomes, covariates, medication effects, pharmacokinetic effects, missing data, an admixed population, and imputed genotypes. A wide range of analytic approaches were applied to the data. Contributions that focused only on a subset of up to 155 unrelated subjects from the pedigrees faced low power. One recommendation for future analyses is to use the provided null phenotype to allow comparison of type I error across methods. Collaboration between statistical geneticists and molecular biologists or bioinformaticians would provide helpful input for assigning variants to genes in gene-based association tests.
Affiliation(s)
- Andrew D Paterson
- Genetics and Genome Biology Program, The Hospital for Sick Children Research Institute, Toronto, Ontario, Canada; Divisions of Epidemiology and Biostatistics, Dalla Lana School of Public Health, Department of Psychiatry, Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada