1
Shibata M, Terada A, Kawaguchi T, Kamatani Y, Okada D, Nagashima K, Ohmura K, Matsuda F, Kawaguchi S, Sese J, Yamada R. Identification of epistatic SNP combinations in rheumatoid arthritis using LAMPLINK and Japanese cohorts. J Hum Genet 2024; 69:541-547. [PMID: 39014190 DOI: 10.1038/s10038-024-01269-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 06/16/2024] [Accepted: 06/20/2024] [Indexed: 07/18/2024]
Abstract
Genome-wide association studies have enabled the identification of important genetic factors in many trait studies. However, only a fraction of the heritability can be explained by known genetic factors, even in the most common diseases. Genetic loci combinations, or epistatic contributions expressed by combinations of single nucleotide polymorphisms (SNPs), have been argued to be one of the critical factors explaining some of the missing heritability, especially in oligogenic/polygenic diseases. Rheumatoid arthritis (RA) is a complex disease with more than 100 reported SNP associations, as well as various HLA haplotypes and amino acids; however, many associations between RA and inter-chromosomal SNP combinations are unknown. To discover novel associations of epistatic interactions with high odds ratios in RA, we applied the LAMPLINK method, a systematic enumerative procedure for identifying high-order SNP combinations, to a Japanese RA cohort (discovery cohort; 4024 patients with RA and 7731 controls). We validated the identified associations in a different Japanese cohort (validation cohort; 810 RA patients and 6303 controls). In this study, we identified 90 significant genetic associations in the discovery cohort. Among these, 74 (82.2%) associations were replicated in the validation cohort, and eight combinations were inter-chromosomal, all of which comprised rs7765379 or rs35265698 located in the HLA region. These two SNPs exhibited strong correlations with valine at amino acid position 11 in HLA-DRB1 (HLA-DRB1-11-Val). Finally, we discovered that rs9624 showed an association with RA through an epistatic interaction with HLA-DRB1-11-Val. Overall, LAMPLINK showed high reliability for identifying epistatic genetic contributions hidden in complex traits.
Affiliation(s)
- Mio Shibata
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Kyoto-McGill International Collaborative School in Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Takahisa Kawaguchi
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yoichiro Kamatani
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Daigo Okada
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Kazuhisa Nagashima
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Koichiro Ohmura
  - Department of Rheumatology and Clinical Immunology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Fumihiko Matsuda
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Kyoto-McGill International Collaborative School in Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Shuji Kawaguchi
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Kyoto-McGill International Collaborative School in Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Jun Sese
  - Humanome Lab. Inc., Tokyo, Japan
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Ryo Yamada
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
2
Associations of HSP90AA2 gene polymorphisms with disease susceptibility, glucocorticoids efficacy and health-related quality of life in Chinese systemic lupus erythematosus patients. Genes Genomics 2018; 40:1069-1079. [PMID: 29907909 DOI: 10.1007/s13258-018-0714-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/31/2018] [Accepted: 06/07/2018] [Indexed: 01/09/2023]
Abstract
Although current glucocorticoid (GC) treatment for systemic lupus erythematosus (SLE) is effective to a certain extent, variation in therapeutic effect between patients is still a widespread problem. Some patients can have repeated attacks that greatly diminish their quality of life. This study was conducted to investigate the relationship between HSP90AA2 polymorphisms and disease susceptibility, GC efficacy and health-related quality of life (HRQoL) in Chinese SLE patients. A case-control study was performed in 470 SLE patients and 470 normal controls. Then, 444 patients in the case group were followed up for 12 weeks to observe the efficacy of GCs and improvement of HRQoL. Two single nucleotide polymorphisms (SNPs) of HSP90AA2 were selected for genotyping: rs1826330 and rs6484340. HRQoL was assessed using the SF-36 questionnaire. The minor T allele of rs1826330 and the TT haplotype formed by rs1826330 and rs6484340 showed associations with decreased SLE risk (T allele: PBH = 0.022; TT haplotype: PBH = 0.033). A significant association between rs6484340 and improvement of HRQoL was revealed in the follow-up study. Five subscales of SF-36 appeared to be influenced by rs6484340: total score of SF-36 (additive model: PBH = 0.026), physical function (additive model: PBH = 0.026), role-physical (recessive model: PBH = 0.041), mental health (dominant model: PBH = 0.047), and physical component summary (additive model: PBH = 0.026). No statistically significant association was found between HSP90AA2 gene polymorphisms and GC efficacy. These results revealed a genetic association between HSP90AA2 and SLE. Remarkably, HSP90AA2 has an impact on the improvement of HRQoL in the Chinese population with SLE.
3
Yuan Z, Zhang X, Li F, Zhao J, Xue F. Comparing partial least square approaches in a gene- or region-based association study for multiple quantitative phenotypes. Hum Biol 2014; 86:51-8. [PMID: 25401986 DOI: 10.3378/027.086.0106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 10/31/2013] [Indexed: 11/05/2022]
Abstract
When complex diseases are considered quantitatively, there are at least three statistical strategies for association studies: one single-nucleotide polymorphism (SNP) on a single trait, a gene or region (with multiple SNPs) on a single trait, and a gene or region on multiple traits. The third approach is the most general for dissecting the genetic mechanisms underlying complex diseases that underpin multiple quantitative traits. Gene or region association methods based on partial least square (PLS) approaches have been shown to have an apparent power advantage. However, few approaches have been developed for multiple quantitative phenotypes or traits underlying a condition or disease, and the performance of the various PLS approaches used in association studies for multiple quantitative traits has not been assessed. Here we exploit the association between multiple SNPs and multiple phenotypes or traits, from a regression perspective, through exhaustive scan statistics (sliding window) using PLS and sparse PLS regressions. Simulations were conducted to assess the performance of the proposed scan statistics and compare them with existing methods. The proposed methods were applied to 12 regions of genome-wide association study data from the European Prospective Investigation of Cancer-Norfolk study.
Affiliation(s)
- Zhongshang Yuan
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Xiaoshuai Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Fangyu Li
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Jinghua Zhao
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Fuzhong Xue
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
4
Fan R, Lo SH. A robust model-free approach for rare variants association studies incorporating gene-gene and gene-environmental interactions. PLoS One 2013; 8:e83057. [PMID: 24358248 PMCID: PMC3866272 DOI: 10.1371/journal.pone.0083057] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/24/2013] [Accepted: 10/30/2013] [Indexed: 11/19/2022] [Open access]
Abstract
Recently, more and more evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare-variant association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and the aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants and do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G×G) and gene-environmental (G×E) interactions in rare-variant association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Second, it does not sacrifice marginal detection power; when rare variants only have marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can readily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.
Affiliation(s)
- Ruixue Fan
  - Department of Statistics, Columbia University, New York, New York, United States of America
- Shaw-Hwa Lo
  - Department of Statistics, Columbia University, New York, New York, United States of America
5
Summarizing techniques that combine three non-parametric scores to detect disease-associated 2-way SNP-SNP interactions. Gene 2013; 533:304-12. [PMID: 24076437 DOI: 10.1016/j.gene.2013.09.041] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/24/2013] [Revised: 08/30/2013] [Accepted: 09/09/2013] [Indexed: 10/26/2022]
Abstract
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods (Gini, absolute probability difference (APD), and entropy) to develop two novel summary scores, namely the principal component score (PCS) and the Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare the performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type I errors (<0.05), whereas the Gini, APD and entropy scores (GS, APDS and ES) did not (>0.05). A real data study using the Genetic Analysis Workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions.
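To make the idea of combining Gini, APD and entropy scores concrete, here is a minimal Python sketch. It uses textbook definitions (weighted Gini impurity, weighted binary entropy, and summed absolute difference of cell frequencies between cases and controls), which are an assumption and may differ from the exact definitions in the paper. The function names `cell_scores` and `z_sum`, the orientation of the scores, and the toy counts are all hypothetical.

```python
import math
from statistics import mean, pstdev


def cell_scores(cases, controls):
    """Score one SNP pair from per-genotype-cell case and control counts.

    `cases[i]` and `controls[i]` are counts in the i-th two-locus genotype
    cell. Both groups must contain at least one individual.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    total = n_case + n_ctrl
    gini = entropy = apd = 0.0
    for a, u in zip(cases, controls):
        n = a + u
        if n == 0:
            continue  # empty cell contributes nothing to any score
        p = a / n      # fraction of cases within this cell
        w = n / total  # share of all individuals falling in this cell
        gini += w * 2 * p * (1 - p)
        if 0 < p < 1:
            entropy += w * -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        apd += abs(a / n_case - u / n_ctrl)
    return {"gini": gini, "entropy": entropy, "apd": apd}


def z_sum(score_rows):
    """Sum z-standardized scores across SNP pairs (assumed ZSS-like summary).

    Gini and entropy are negated so that a larger value always means a
    stronger case/control separation before the z-scores are added.
    """
    oriented = {
        "gini": [-r["gini"] for r in score_rows],
        "entropy": [-r["entropy"] for r in score_rows],
        "apd": [r["apd"] for r in score_rows],
    }
    totals = [0.0] * len(score_rows)
    for values in oriented.values():
        mu, sd = mean(values), pstdev(values)
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd if sd > 0 else 0.0
    return totals


# Toy data: the first pair separates cases from controls, the second does not.
separating = cell_scores(cases=[10, 0], controls=[0, 10])
null_pair = cell_scores(cases=[5, 5], controls=[5, 5])
summary = z_sum([separating, null_pair])
```

Standardizing before summing keeps any single score from dominating because of its scale, which is the general motivation for a summary score of this kind.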
6
SNP set association analysis for genome-wide association studies. PLoS One 2013; 8:e62495. [PMID: 23658731 PMCID: PMC3643925 DOI: 10.1371/journal.pone.0062495] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/23/2012] [Accepted: 03/22/2013] [Indexed: 11/29/2022] [Open access]
Abstract
The genome-wide association study (GWAS) is a promising approach for identifying common genetic variants associated with diseases on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by excessive correction for multiple comparisons in single-locus association studies, some methods group SNPs into a SNP set based on genomic features and then test the joint effect of the SNP set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1 and ≥2 causal SNPs. Our simulation results show that all of these methods can control the type I error at the nominal significance level. SPCA is always more powerful than the other methods at different settings of linkage disequilibrium structure and minor allele frequency in the simulated datasets. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
7
Wang KS, Liu X, Zheng S, Zeng M, Pan Y, Callahan K. A novel locus for body mass index on 5p15.2: A meta-analysis of two genome-wide association studies. Gene 2012; 500:80-4. [DOI: 10.1016/j.gene.2012.03.046] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/22/2011] [Accepted: 03/08/2012] [Indexed: 12/13/2022]
8
Gao Q, He Y, Yuan Z, Zhao J, Zhang B, Xue F. Gene- or region-based association study via kernel principal component analysis. BMC Genet 2011; 12:75. [PMID: 21871061 PMCID: PMC3176196 DOI: 10.1186/1471-2156-12-75] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 05/08/2011] [Accepted: 08/26/2011] [Indexed: 11/12/2022] [Open access]
Abstract
Background: In genetic association studies, especially GWAS, gene- or region-based methods have become more popular for detecting associations between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been successfully used in classifying gene expression data. Nevertheless, the purpose of an association study is to detect the correlation between genetic variations and disease rather than to classify samples, and genomic data are categorical rather than numerical. Although a kernel-based logistic regression model that projects the original nonlinear SNP data into a linear feature space has recently been proposed for association studies, it is still affected by multicollinearity between the projections, which may lead to loss of power. We therefore proposed a KPCA-LRT model to avoid the multicollinearity. Results: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) at different sample sizes, different significance levels and different relative risks, especially at the gene-wide level (1E-5) and lower relative risks (RR = 1.2, 1.3). Application to the four gene regions of the rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT. Conclusions: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels.
Affiliation(s)
- Qingsong Gao
  - Department of Epidemiology and Health Statistics, School of Public Health, Shandong University, Jinan 250012, China
9
Beyene J, Tritchler D, Asimit JL, Hamid JS. Gene- or region-based analysis of genome-wide association studies. Genet Epidemiol 2010; 33 Suppl 1:S105-10. [PMID: 19924708 DOI: 10.1002/gepi.20481] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
With rapid advances in genotyping technologies in recent years and the growing number of available markers, genome-wide association studies are emerging as promising approaches for the study of complex diseases and traits. However, there are several challenges with analysis and interpretation of such data. First, there is a massive multiple testing problem, due to the large number of markers that need to be analyzed, leading to an increased risk of false positives and decreased ability for association studies to detect truly associated markers. In particular, the ability to detect modest genetic effects can be severely compromised. Second, a genetic association of a given single-nucleotide polymorphism as determined by univariate statistical analyses does not typically explain biologically interesting features, and often requires subsequent interpretation using a higher unit, such as a gene or region, for example, as defined by haplotype blocks. Third, missing genotypes in the data set and other data quality issues can pose challenges when comparisons across platforms and replications are planned. Finally, depending on the type of univariate analysis, computational burden can arise as the number of markers continues to grow into the millions. One way to deal with these and related challenges is to consider higher units for the analysis, such as genes or regions. This article summarizes analytical methods and strategies that have been proposed and applied by Group 16 to two genome-wide association data sets made available through the Genetic Analysis Workshop 16.
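The multiple testing burden described above can be illustrated with a short arithmetic sketch in Python. The abstract does not name a specific correction; Bonferroni is used here only as a familiar stand-in. The family-wise error formula assumes independent tests, and the counts of one million markers and twenty thousand genes are hypothetical round numbers, not figures from the article.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def familywise_error_rate(alpha_per_test: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


# Hypothetical unit counts: single markers versus genes as the unit of analysis.
marker_level = bonferroni_threshold(0.05, 1_000_000)
gene_level = bonferroni_threshold(0.05, 20_000)

# Without any correction, 100 independent tests at 0.05 almost surely
# produce at least one false positive.
uncorrected = familywise_error_rate(0.05, 100)
```

Under these assumed counts the gene-level threshold is fifty times less stringent than the marker-level one, which is one way to see why analysing higher units can help recover modest effects.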
Affiliation(s)
- Joseph Beyene
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada