1
Juvinao-Quintero DL, Cardenas A, Perron P, Bouchard L, Lutz SM, Hivert MF. Associations between an integrated component of maternal glycemic regulation in pregnancy and cord blood DNA methylation. Epigenomics 2021; 13:1459-1472. [PMID: 34596421 DOI: 10.2217/epi-2021-0220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Background: Previous studies suggest that fetal programming to hyperglycemia in pregnancy is due to modulation of DNA methylation (DNAm), but they have been limited in their maternal glycemic characterization. Methods: In the Gen3G study, we used a principal component analysis to integrate multiple glucose and insulin values measured during the second trimester oral glucose tolerance test. We investigated associations between principal components and cord blood DNAm levels in an epigenome-wide analysis among 430 mother-child pairs. Results: The first principal component was robustly associated with lower DNAm at cg26974062 (TXNIP; p = 9.9 × 10⁻⁹) in cord blood. TXNIP is a well-known DNAm marker for type 2 diabetes in adults. Conclusion: We hypothesize that abnormal glucose metabolism in pregnancy may program dysregulation of TXNIP across the life course.
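The principal component step described above can be sketched as follows. This is a minimal illustration, assuming the OGTT glucose and insulin values are arranged as a subjects-by-measures matrix and z-scored before PCA; the function name, the six simulated measures and the noise level are hypothetical and are not the Gen3G data or the authors' pipeline.

```python
import numpy as np


def glycemic_principal_components(measures):
    """PCA of an (n_subjects x n_measures) matrix of OGTT glucose/insulin values.

    Columns are z-scored first so that glucose and insulin, measured in
    different units, contribute on a common scale. Returns per-subject PC
    scores, loadings (one row per component), and variance explained.
    """
    x = np.asarray(measures, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Thin SVD of the standardized data: z = U S Vt, rows of Vt are loadings.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # same as z @ vt.T
    explained = s**2 / np.sum(s**2)     # proportion of variance per component
    return scores, vt, explained


# Hypothetical example: 430 subjects, 6 measures driven by one shared factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(430, 1))
data = latent @ np.ones((1, 6)) + 0.5 * rng.normal(size=(430, 6))
scores, loadings, explained = glycemic_principal_components(data)
print("variance explained by PC1:", round(float(explained[0]), 3))
```

Component signs from an SVD are arbitrary, so in practice PC1 would be oriented (for example, so that higher scores mean higher glycemia) before being used as an exposure in an epigenome-wide model.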
Affiliation(s)
- Diana L Juvinao-Quintero
- Division of Chronic Disease Research Across the Life Course, Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA 02215, USA
- Andres Cardenas
- Division of Environmental Health Sciences, School of Public Health & Center for Computational Biology, University of California, Berkeley, CA 94720-7360, USA
- Patrice Perron
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada; Department of Medicine, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada
- Luigi Bouchard
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada; Department of Medical Biology, Centre Intégré Universitaire en Santé et Services Sociaux Saguenay-Lac-Saint-Jean, Hôpital Universitaire de Chicoutimi, Saguenay, QC, G7H 5H6, Canada; Department of Biochemistry & Functional Genomics, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
- Sharon M Lutz
- Division of Chronic Disease Research Across the Life Course, Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02215, USA
- Marie-France Hivert
- Division of Chronic Disease Research Across the Life Course, Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Medicine, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada; Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2
Davenport CA, Maity A, Sullivan PF, Tzeng JY. A Powerful Test for SNP Effects on Multivariate Binary Outcomes using Kernel Machine Regression. Stat Biosci 2018; 10:117-138. [PMID: 30420901 PMCID: PMC6226013 DOI: 10.1007/s12561-017-9189-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/05/2016] [Revised: 12/20/2016] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
Abstract
Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual, and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for nonlinear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure requires estimation only under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components while accounting for the correlation among the outcomes. We evaluate the finite-sample performance of our test in a simulation study and demonstrate our method using the CATIE antibody study data and the CoLaus Study data.
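A kernel machine score test compares phenotype residuals with pairwise genotype similarity. The sketch below is a simplified, single-outcome version, assuming an intercept-only null model with no covariates; the multivariate GEE-based test described above instead stacks the correlated outcomes and uses a working covariance, which is not shown. Up to a constant scaling convention, the statistic is Q = (y - mu0)' K (y - mu0), and here its p-value comes from permutation rather than from an analytic null distribution. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Genotype = Sequence[int]  # minor-allele counts (0, 1, 2), one per SNP in the set


def linear_kernel(g1: Genotype, g2: Genotype) -> float:
    return float(sum(a * b for a, b in zip(g1, g2)))


def ibs_kernel(g1: Genotype, g2: Genotype) -> float:
    """Identity-by-state similarity scaled to [0, 1]."""
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def kernel_matrix(genotypes: List[Genotype],
                  kernel: Callable[[Genotype, Genotype], float]) -> List[List[float]]:
    return [[kernel(gi, gj) for gj in genotypes] for gi in genotypes]


def score_statistic(y: Sequence[int], K: List[List[float]]) -> float:
    """Q = (y - mu0)' K (y - mu0), with mu0 fitted under an intercept-only null."""
    n = len(y)
    mu0 = sum(y) / n
    r = [yi - mu0 for yi in y]
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_p_value(y: Sequence[int], K: List[List[float]],
                        n_perm: int = 999, seed: int = 1) -> float:
    """Permuting y is valid here only because the null model has no covariates."""
    rng = random.Random(seed)
    observed = score_statistic(y, K)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(y_perm, K) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 4 subjects, 2 SNPs.
y = [1, 0, 1, 0]
G = [[0, 1], [2, 1], [1, 0], [0, 2]]
K_ibs = kernel_matrix(G, ibs_kernel)
print(score_statistic(y, K_ibs), permutation_p_value(y, K_ibs, n_perm=99))
```

With the linear kernel, Q equals the squared norm of G'(y - mu0), which connects the kernel form back to a sum of per-SNP score contributions.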
Affiliation(s)
- Clemontina A Davenport
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27707, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Patrick F Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jung-Ying Tzeng
- Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan; Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
3
Ayati M, Koyutürk M. PoCos: Population Covering Locus Sets for Risk Assessment in Complex Diseases. PLoS Comput Biol 2016; 12:e1005195. [PMID: 27835645 PMCID: PMC5105987 DOI: 10.1371/journal.pcbi.1005195] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/19/2016] [Accepted: 10/11/2016] [Indexed: 12/17/2022]
Abstract
Susceptibility loci identified by GWAS generally account for a limited fraction of heritability. Predictive models based on identified loci also have modest success in risk assessment and are therefore of limited practical use. Many methods have been developed to overcome these limitations by incorporating prior biological knowledge. However, most of the information used by these methods is at the level of genes, limiting analyses to variants that are in or near coding regions. We propose a new method that integrates protein-protein interaction (PPI) and expression quantitative trait loci (eQTL) data to identify sets of functionally related loci that are collectively associated with a trait of interest. We call such sets of loci "population covering locus sets" (PoCos). The contributions of the proposed approach are three-fold: 1) We consider all possible genotype models for each locus, thereby enabling identification of combinatorial relationships between multiple loci. 2) We develop a framework for integrating PPI and eQTL data into a heterogeneous network model, enabling efficient identification of functionally related variants that are associated with the disease. 3) We develop a novel method to integrate the genotypes of multiple loci in a PoCo into a representative genotype to be used in risk assessment. We test the proposed framework in the context of risk assessment for seven complex diseases: type 1 diabetes (T1D), type 2 diabetes (T2D), psoriasis (PS), bipolar disorder (BD), coronary artery disease (CAD), hypertension (HT), and multiple sclerosis (MS). Our results show that the proposed method significantly outperforms individual-variant-based risk assessment models as well as the state-of-the-art polygenic score. We also show that incorporating eQTL data improves the performance of the identified PoCos in risk assessment. We also assess the biological relevance of PoCos for three diseases that have similar biological mechanisms and identify novel candidate genes. The resulting software is publicly available at http://compbio.case.edu/pocos/.
Several studies have tried to predict individual disease risk using genetic data obtained from genome-wide association studies (GWAS). Earlier studies focused only on individual genetic variants. However, studies of disease mechanisms suggest that aggregates of genomic variants may contribute to disease. For this reason, researchers commonly use prior biological knowledge to identify genetic variants that are functionally related. These approaches, however, are often limited to variants in the coding regions of genes, whereas several risk variants lie in regulatory regions. Here, we incorporate known regulatory and functional interactions to find sets of genetic variants that are informative features for risk assessment. Our results on seven complex diseases show that our method outperforms individual-variant-based risk assessment models, as well as other methods that integrate multiple genetic variants.
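The per-locus genotype models mentioned above can be made concrete with a small recoding helper. This sketch assumes three common models (additive, dominant, recessive) and, purely for illustration, collapses a locus set into one indicator that is 1 when any locus is in its risk state. That OR rule and all names are hypothetical assumptions, not the PoCos representative-genotype construction.

```python
from typing import Dict, List, Sequence

BINARY_MODELS = ("dominant", "recessive")


def encode_genotype(count: int) -> Dict[str, int]:
    """Recode a minor-allele count (0, 1, 2) under three common genetic models."""
    if count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the minor allele")
    return {
        "additive": count,               # 0, 1, 2
        "dominant": int(count >= 1),     # carries at least one minor allele
        "recessive": int(count == 2),    # homozygous for the minor allele
    }


def locus_set_indicator(genotypes: Sequence[int], models: Sequence[str]) -> int:
    """Hypothetical OR rule: 1 if any locus is in its risk state under its model."""
    for m in models:
        if m not in BINARY_MODELS:
            raise ValueError(f"unsupported model for a binary indicator: {m}")
    return int(any(encode_genotype(g)[m] for g, m in zip(genotypes, models)))


def coverage(group: List[Sequence[int]], models: Sequence[str]) -> float:
    """Fraction of individuals in a group 'covered' by the locus set."""
    return sum(locus_set_indicator(g, models) for g in group) / len(group)


# Hypothetical two-locus set: locus 1 dominant, locus 2 recessive.
cases = [[0, 2], [1, 0], [0, 0]]
controls = [[0, 1], [0, 0], [0, 0]]
models = ["dominant", "recessive"]
print(coverage(cases, models), coverage(controls, models))
```

Comparing coverage in cases against controls gives a simple sense of how a combined locus set can separate groups even when each locus alone is weak.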
Affiliation(s)
- Marzieh Ayati
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, Ohio, United States of America
- Mehmet Koyutürk
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, Ohio, United States of America
- Center of Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, Ohio, United States of America
4
Lin X, Barton S, Holbrook JD. How to make DNA methylome wide association studies more powerful. Epigenomics 2016; 8:1117-29. [PMID: 27052998 PMCID: PMC5066141 DOI: 10.2217/epi-2016-0017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 02/23/2016] [Accepted: 03/23/2016] [Indexed: 02/06/2023]
Abstract
Genome-wide association studies had a troublesome adolescence, during which researchers increased statistical power, in part by increasing subject numbers. Interrogating the interaction of genetic and environmental influences raised new challenges of statistical power that were not easily overcome by adding subjects. Screening the DNA methylome offers an attractive alternative, as methylation can be thought of as a proxy for the combined influences of genetics and environment. DNA methylome data pose unique statistical challenges but also have multiple features that can be exploited to increase power. We anticipate the development of DNA methylome association study designs and new analytical methods, together with integration of data from other molecular species and other studies, which will boost statistical power and tackle causality. In this way, the molecular trajectories that underlie disease development will be uncovered.
Affiliation(s)
- Xinyi Lin
- Singapore Institute for Clinical Sciences (SICS), Agency for Science & Technology Research (A*STAR), Brenner Centre for Molecular Medicine, 30 Medical Drive, 117609, Singapore
- Sheila Barton
- MRC Lifecourse Epidemiology Unit, Faculty of Medicine, University of Southampton, Southampton, SO16 6YD, UK
- Joanna D Holbrook
- Singapore Institute for Clinical Sciences (SICS), Agency for Science & Technology Research (A*STAR), Brenner Centre for Molecular Medicine, 30 Medical Drive, 117609, Singapore
5
Zhang Q, Zhao Y, Zhang R, Wei Y, Yi H, Shao F, Chen F. A Comparative Study of Five Association Tests Based on CpG Set for Epigenome-Wide Association Studies. PLoS One 2016; 11:e0156895. [PMID: 27258058 PMCID: PMC4892473 DOI: 10.1371/journal.pone.0156895] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/13/2016] [Accepted: 05/20/2016] [Indexed: 11/19/2022]
Abstract
An epigenome-wide association study (EWAS) is a large-scale study of disease-associated epigenetic variation in humans, specifically variation in DNA methylation. High-throughput technologies enable simultaneous epigenetic profiling of DNA methylation at hundreds of thousands of CpGs across the genome. The clustering of correlated DNA methylation across CpGs is reportedly similar to the linkage disequilibrium (LD) correlation among single nucleotide polymorphisms (SNPs). However, current analysis methods, such as the t-test and rank-sum test, may be underpowered to detect differentially methylated markers. We propose to test the association between the outcome (e.g., case or control) and a set of CpG sites jointly. Here, we compared the performance of five CpG set analysis approaches, namely principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), the sequence kernel association test (SKAT), and sliced inverse regression (SIR), with Hotelling's T² test and the t-test with Bonferroni correction. The simulation results revealed that the first six methods control the type I error at the nominal significance level, while the Bonferroni-corrected t-test is conservative. SPCA and SKAT performed better than the other approaches when the correlation among CpG sites was strong. For illustration, these methods were also applied to a real methylation dataset.
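To show how a joint CpG-set test differs from per-site testing, here is a minimal two-sample Hotelling's T² sketch. It assumes methylation values arranged as subjects-by-CpGs matrices for cases and controls, with more subjects than CpG sites so the pooled covariance is invertible. The F transform uses the standard result F = (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) × T² with (p, n1 + n2 - p - 1) degrees of freedom for p CpG sites; computing p-values would need an F distribution function, which is omitted. The per-site alternative would run p separate t-tests and compare each p-value with alpha/p (Bonferroni). Function names are hypothetical.

```python
import numpy as np


def hotelling_t2(cases, controls):
    """Two-sample Hotelling's T^2 for a set of p CpG sites.

    cases: (n1 x p) methylation values; controls: (n2 x p).
    Returns (T^2, F statistic, (df1, df2)). Assumes n1 + n2 - 2 > p.
    """
    x1 = np.asarray(cases, dtype=float)
    x2 = np.asarray(controls, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled within-group covariance (np.cov uses ddof=1 by default).
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)  # keeps the p == 1 case a 1 x 1 matrix
    t2 = (n1 * n2) / (n1 + n2) * float(diff @ np.linalg.solve(s_pooled, diff))
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (df1 * (n1 + n2 - 2)) * t2
    return t2, f_stat, (df1, df2)


def pooled_t(a, b):
    """Per-CpG pooled-variance two-sample t statistic."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


# Hypothetical data: 3 correlated CpGs, 40 cases vs 35 controls.
rng = np.random.default_rng(2)
cov = 0.6 * np.ones((3, 3)) + 0.4 * np.eye(3)
cases = rng.multivariate_normal([0.3, 0.3, 0.3], cov, size=40)
controls = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=35)
print(hotelling_t2(cases, controls))
print([float(pooled_t(cases[:, k], controls[:, k])) for k in range(3)])
```

With a single CpG, T² reduces to the square of the pooled two-sample t statistic, which is a quick sanity check on the implementation.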
Affiliation(s)
- Qiuyi Zhang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Ruyang Zhang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Honggang Yi
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Feng Chen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
6
Yi H, Wo H, Zhao Y, Zhang R, Dai J, Jin G, Ma H, Wu T, Hu Z, Lin D, Shen H, Chen F. Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares. J Biomed Res 2015; 29:298-307. [PMID: 26243516 PMCID: PMC4547378 DOI: 10.7555/jbr.29.20140043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/11/2014] [Revised: 09/29/2014] [Accepted: 01/15/2015] [Indexed: 12/18/2022]
Abstract
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating the type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still unclear, especially in GWAS. We conducted simulations and a real-data application to compare the type I error and power of PC-LR, PLS-LR and LR applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR reasonably control the type I error under the null hypothesis. In contrast, Bonferroni-corrected LR was more conservative in all simulation settings. In particular, PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped SNPs and had a small effect size in simulation. Using SNP set analysis, we applied all three methods to non-small cell lung cancer GWAS data.
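PC-LR, as summarized above, replaces the SNPs in a set with their leading principal components and tests them jointly in one logistic model. The following is a minimal sketch, assuming genotypes coded 0/1/2, no covariates, and a likelihood-ratio test against an intercept-only model, referred to a chi-square distribution with as many degrees of freedom as retained components. The Newton-Raphson fitter and all names are illustrative, not the authors' code, and PLS-LR is not shown.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column. Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * w[:, None])        # Fisher information for the logit link
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))
    return beta, loglik


def pc_lr_statistic(genotypes, y, n_components=2):
    """Likelihood-ratio statistic for the top principal components of a SNP set.

    Under H0 it is approximately chi-square with n_components degrees of freedom.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]   # PC scores
    ones = np.ones((len(y), 1))
    _, ll_null = fit_logistic(ones, y)
    _, ll_full = fit_logistic(np.hstack([ones, pcs]), y)
    return 2.0 * (ll_full - ll_null)


# Hypothetical data: 500 subjects, 5 SNPs, risk driven by the first SNP.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(500, 5)).astype(float)
p_case = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * G[:, 0])))
y = rng.binomial(1, p_case).astype(float)
print("LR statistic with 2 PCs:", pc_lr_statistic(G, y, n_components=2))
```

Keeping all components reproduces the ordinary joint logistic test on every SNP in the set; the dimension reduction pays off when a few components capture most of the correlated (LD) signal, so fewer degrees of freedom are spent.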
Affiliation(s)
- Honggang Yi
- Department of Epidemiology and Biostatistics, School of Public Health
- Hongmei Wo
- Department of Public Service Management, School of KangDa
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health
- Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health
- Junchen Dai
- Department of Epidemiology and Biostatistics, School of Public Health
- Guangfu Jin
- Department of Epidemiology and Biostatistics, School of Public Health; Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center
- Hongxia Ma
- Department of Epidemiology and Biostatistics, School of Public Health; Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center
- Tangchun Wu
- Institute of Occupational Medicine and Ministry of Education, Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Zhibin Hu
- Department of Epidemiology and Biostatistics, School of Public Health; Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center; State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Dongxin Lin
- State Key Laboratory of Molecular Oncology and Department of Etiology and Carcinogenesis, Cancer Institute and Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Hongbing Shen
- Department of Epidemiology and Biostatistics, School of Public Health; Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center; State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health
7
Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014; 29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022]
Abstract
In the past few years, genome-wide association studies (GWAS) have been highly successful in identifying genetic susceptibility loci underlying many complex diseases and traits. The findings provide important genetic insights into the pathogenesis of diseases. In this paper, we present an overview of widely used approaches and strategies for the analysis of GWAS and offer general considerations for handling GWAS data. Issues regarding data quality control, population structure, association analysis, multiple comparisons and visual presentation of GWAS results are discussed; other advanced topics, including missing heritability, meta-analysis, set-based association analysis, copy number variation analysis and GWAS cohort analysis, are also briefly introduced.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu 221004, China
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Cheng Qian
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jianwei Gou
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jin Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liya Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
8
Dai H, Zhao Y, Qian C, Cai M, Zhang R, Chu M, Dai J, Hu Z, Shen H, Chen F. Weighted SNP set analysis in genome-wide association study. PLoS One 2013; 8:e75897. [PMID: 24098741 PMCID: PMC3786949 DOI: 10.1371/journal.pone.0075897] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/03/2013] [Accepted: 08/19/2013] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) are popular for identifying genetic variants associated with disease risk. Given the disadvantages of single-locus association analysis, many approaches have been proposed to test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously. Kernel machine-based SNP set analysis, which borrows information from SNPs correlated with causal or tag SNPs, is more powerful than single-locus analysis. Four types of kernel functions and a principal component-based approach (PCA) were also compared. However, because low minor allele frequencies (MAF) cause a loss of power, we extended PCA to a new method called weighted PCA (wPCA). We compared wPCA, the logistic kernel machine-based test (LKM) and PCA, each applied to SNP sets, under different MAFs and linkage disequilibrium (LD) structures. We also applied the three methods to two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, wPCA and the weighted IBS kernel (wIBS) are more powerful than PCA and the other kernel functions across different LD structures and numbers of causal SNPs. Application of the three methods to the real GWAS dataset indicates that wPCA and wIBS perform better than the linear kernel, the IBS kernel and PCA.
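Both wPCA and the weighted IBS kernel rest on upweighting rare variants. The abstract does not state the weighting function, so the sketch below assumes the Beta(1, 25) density evaluated at the sample MAF, a choice popularized by SKAT, inside an IBS kernel normalized by the total weight. The weights used by the authors may differ, and all names are hypothetical.

```python
from typing import List, Sequence


def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density at the MAF, 25 * (1 - maf)^24; larger for rarer SNPs."""
    return 25.0 * (1.0 - maf) ** 24


def sample_mafs(genotypes: List[Sequence[int]]) -> List[float]:
    """Minor allele frequency of each SNP from 0/1/2 genotype counts."""
    n, p = len(genotypes), len(genotypes[0])
    mafs = []
    for k in range(p):
        freq = sum(g[k] for g in genotypes) / (2.0 * n)
        mafs.append(min(freq, 1.0 - freq))  # fold so the value is <= 0.5
    return mafs


def weighted_ibs_kernel(g1: Sequence[int], g2: Sequence[int],
                        weights: Sequence[float]) -> float:
    """Weighted IBS similarity scaled to [0, 1]: identical genotypes give 1."""
    total = sum(weights)
    return sum(w * (2 - abs(a - b)) for a, b, w in zip(g1, g2, weights)) / (2.0 * total)


# Hypothetical SNP set: 5 subjects, 3 SNPs (SNPs 1 and 3 rarer than SNP 2).
G = [[0, 1, 0], [1, 2, 0], [0, 1, 1], [0, 0, 0], [0, 2, 0]]
weights = [beta_1_25_weight(m) for m in sample_mafs(G)]
K = [[weighted_ibs_kernel(gi, gj, weights) for gj in G] for gi in G]
print([round(w, 3) for w in weights])
```

Because the common SNP receives a much smaller weight, pairwise similarity is driven mainly by agreement at the rare SNPs, which is the intended effect when the causal variant has low MAF.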
Affiliation(s)
- Hui Dai
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Cheng Qian
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Min Cai
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Minjie Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Juncheng Dai
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Zhibin Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Hongbing Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
9
SNP set association analysis for genome-wide association studies. PLoS One 2013; 8:e62495. [PMID: 23658731 PMCID: PMC3643925 DOI: 10.1371/journal.pone.0062495] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/23/2012] [Accepted: 03/22/2013] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with diseases on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by excessive correction for multiple comparisons in single-locus association studies, several methods have been proposed that group SNPs into a SNP set based on genomic features and then test the joint effect of the SNP set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1, and ≥2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA is always more powerful than the other methods across the different linkage disequilibrium structures and minor allele frequencies of the simulated datasets. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
10
Wu MC, Maity A, Lee S, Simmons EM, Harmon QE, Lin X, Engel SM, Molldrem JJ, Armistead PM. Kernel machine SNP-set testing under multiple candidate kernels. Genet Epidemiol 2013; 37:267-75. [PMID: 23471868 PMCID: PMC3769109 DOI: 10.1002/gepi.21715] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 10/04/2012] [Revised: 01/15/2013] [Accepted: 02/05/2013] [Indexed: 11/10/2022]
Abstract
Joint testing for the cumulative effect of multiple single-nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large-scale genetic association studies. The kernel machine (KM) testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects with pairwise similarity in genotype, where genotype similarity is defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions about the underlying model and can improve power. In practice, it is difficult to know a priori which kernel to use, because the choice depends on the unknown underlying trait architecture, and selecting the kernel that gives the lowest P-value can inflate the type I error. Therefore, we propose practical strategies for KM testing with multiple candidate kernels, based on constructing composite kernels and on efficient perturbation procedures. We demonstrate through simulations and real data applications that these procedures protect the type I error rate and can substantially improve power over poor kernel choices, with only modest differences in power compared with using the best candidate kernel.
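One way to combine candidate kernels, in the spirit of the composite-kernel strategy above, is to rescale each kernel to a common trace and average them before computing a single score statistic. This sketch assumes equal weights, an intercept-only null model, and linear and IBS kernels as the candidates; the authors' composite construction may weight kernels differently, and the perturbation-based procedure is not shown. All names are hypothetical.

```python
import numpy as np


def linear_kernel_matrix(G):
    G = np.asarray(G, dtype=float)
    return G @ G.T


def ibs_kernel_matrix(G):
    """IBS similarity: average over SNPs of (2 - |g_ik - g_jk|) / 2."""
    G = np.asarray(G, dtype=float)
    p = G.shape[1]
    mismatches = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2.0 * p - mismatches) / (2.0 * p)


def trace_normalize(K):
    """Rescale K so that its average diagonal entry is 1."""
    return K * (K.shape[0] / np.trace(K))


def composite_kernel(kernels):
    """Equal-weight average of trace-normalized candidate kernels (an assumption)."""
    return sum(trace_normalize(K) for K in kernels) / len(kernels)


def score_statistic(y, K):
    """Q = (y - mean(y))' K (y - mean(y)) under an intercept-only null."""
    r = np.asarray(y, dtype=float)
    r = r - r.mean()
    return float(r @ K @ r)


# Hypothetical data: 40 subjects, 6 SNPs, binary outcome.
rng = np.random.default_rng(3)
G = rng.binomial(2, 0.25, size=(40, 6))
y = rng.binomial(1, 0.5, size=40)
K_comp = composite_kernel([linear_kernel_matrix(G), ibs_kernel_matrix(G)])
print("composite-kernel Q:", score_statistic(y, K_comp))
```

Because Q is linear in the kernel, the composite statistic is the average of the statistics from the rescaled candidate kernels, so the test is fixed in advance rather than depending on which kernel happens to give the smallest p-value.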
Affiliation(s)
- Michael C Wu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, USA.