Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Amos CI, Chen WV, Seldin MF, Remmers EF, Taylor KE, Criswell LA, Lee AT, Plenge RM, Kastner DL, Gregersen PK. Data for Genetic Analysis Workshop 16 Problem 1, association analysis of rheumatoid arthritis data. BMC Proc 2009;3 Suppl 7:S2. [PMID: 20018009 PMCID: PMC2795916 DOI: 10.1186/1753-6561-3-s7-s2] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 01/28/2023]

Cited by Other Article(s)

Hu X, Meng Z. Using potential variable to study gene-gene and gene-environment interaction effects with genetic model uncertainty. Ann Hum Genet 2022;86:257-267. [PMID: 35582845 DOI: 10.1111/ahg.12470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 03/02/2022] [Accepted: 04/08/2022] [Indexed: 11/28/2022]

Gola D, König IR. Empowering individual trait prediction using interactions for precision medicine. BMC Bioinformatics 2021;22:74. [PMID: 33602124 PMCID: PMC7890638 DOI: 10.1186/s12859-021-04011-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2019] [Accepted: 02/08/2021] [Indexed: 11/11/2022]

Abstract

Background

One component of precision medicine is to construct prediction models with their predictive ability as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases like coronary artery disease, rheumatoid arthritis, and type 2 diabetes have a polygenic basis, and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal and machine learning methods are appealing to make individual predictions. However, most of these algorithms focus on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in the realm of genetic epidemiology. One big class of algorithms to detect interacting features is based on the multifactor dimensionality reduction (MDR). Here, we further develop the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction empowered individual prediction.

Results

Using a comprehensive simulation study we show that our new algorithm (median AUC: 0.66) can use information hidden in interactions and outperforms two other state-of-the-art algorithms, namely the Random Forest (median AUC: 0.54) and Elastic Net (median AUC: 0.50), if interactions are present in a scenario of two pairs of two features having small effects. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. As our new algorithm is not only applicable to biological/genetic data but to all datasets with discrete features, it may have practical implications in other research fields where interactions between features have to be considered as well, and we made our method available as an R package (https://github.com/imbs-hl/MBMDRClassifieR).

Conclusions

The explicit use of interactions between features can improve the prediction performance and thus should be included in further attempts to move precision medicine forward.

Kwak M. Genome-wide association study using truncated likelihood with incomplete information for stratum specific missingness. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-020-00064-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Wright MN, König IR. Splitting on categorical predictors in random forests. PeerJ 2019;7:e6339. [PMID: 30746306 PMCID: PMC6368971 DOI: 10.7717/peerj.6339] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/26/2018] [Accepted: 12/22/2018] [Indexed: 11/20/2022]

Abstract

One reason for the widespread success of random forests (RFs) is their ability to analyze most datasets without preprocessing. For example, in contrast to many other statistical methods and machine learning approaches, no recoding such as dummy coding is required to handle ordinal and nominal predictors. The standard approach for nominal predictors is to consider all 2^{k − 1} − 1 2-partitions of the k predictor categories. However, this exponential relationship produces a large number of potential splits to be evaluated, increasing computational complexity and restricting the possible number of categories in most implementations. For binary classification and regression, it was shown that ordering the predictor categories in each split leads to exactly the same splits as the standard approach. This reduces computational complexity because only k − 1 splits have to be considered for a nominal predictor with k categories. For multiclass classification and survival prediction no ordering method producing equivalent splits exists. We therefore propose to use a heuristic which orders the categories according to the first principal component of the weighted covariance matrix in multiclass classification and by log-rank scores in survival prediction. This ordering of categories can be done either in every split or a priori, that is, just once before growing the forest. With this approach, the nominal predictor can be treated as ordinal in the entire RF procedure, speeding up the computation and avoiding category limits. We compare the proposed methods with the standard approach, dummy coding and simply ignoring the nominal nature of the predictors in several simulation settings and on real data in terms of prediction performance and computational efficiency. We show that ordering the categories a priori is at least as good as the standard approach of considering all 2-partitions in all datasets considered, while being computationally faster. We recommend using this approach as the default in RFs.
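
The split counts described in this abstract can be made concrete with a small sketch. The Python below is an illustration only and is not taken from the paper or from any random forest package: the function names (count_two_partitions, enumerate_two_partitions, ordered_splits) are hypothetical, and ordering categories by their mean response is assumed here as the ordering rule for a binary or numeric outcome.

```python
from itertools import combinations
from statistics import mean


def count_two_partitions(k):
    """Number of ways to split k categories into two non-empty groups."""
    return 2 ** (k - 1) - 1


def enumerate_two_partitions(categories):
    """List every 2-partition (left, right) of the categories exactly once."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    partitions = []
    # Fix the first category on the left side so mirror images are not counted twice.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            if right:  # skip the case where every category is on the left
                partitions.append((left, right))
    return partitions


def ordered_splits(x, y):
    """Order categories by mean response (assumed rule), then return the k - 1 threshold splits."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    order = sorted(groups, key=lambda c: mean(groups[c]))
    return [(set(order[:i]), set(order[i:])) for i in range(1, len(order))]


if __name__ == "__main__":
    for k in (4, 10, 20):
        print(k, "categories:", count_two_partitions(k), "partitions vs", k - 1, "ordered splits")
```

For k = 20 the exhaustive approach already has more than half a million candidate splits per predictor, while the ordered approach has 19, which is the gap the abstract refers to.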

Xue Y, Wang J, Ding J, Zhang S, Li Q. A powerful test for ordinal trait genetic association analysis. Stat Appl Genet Mol Biol 2019;18:/j/sagmb.ahead-of-print/sagmb-2017-0066/sagmb-2017-0066.xml. [PMID: 30685746 DOI: 10.1515/sagmb-2017-0066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]

Saad MN, Mabrouk MS, Eldeib AM, Shaker OG. Studying the effects of haplotype partitioning methods on the RA-associated genomic results from the North American Rheumatoid Arthritis Consortium (NARAC) dataset. J Adv Res 2019;18:113-126. [PMID: 30891314 PMCID: PMC6403413 DOI: 10.1016/j.jare.2019.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/05/2018] [Revised: 01/03/2019] [Accepted: 01/14/2019] [Indexed: 12/16/2022]

Abstract

• Haplotype block methods play a complementary role to the single-SNP approaches.
• CIT, FGT, SSLD, and single-SNP methods should be applied to discover the markers.
• Selection of the method used for the association has an impact on the biomarkers.
• SSLD method detected more significant SNPs than CIT, FGT, and single-SNP methods.
• The 383 SNPs discovered by all methods are significantly associated with RA.

The human genome, which includes thousands of genes, represents a big data challenge. Rheumatoid arthritis (RA) is a complex autoimmune disease with a genetic basis. Many single-nucleotide polymorphism (SNP) association methods partition a genome into haplotype blocks. The aim of this genome wide association study (GWAS) was to select the most appropriate haplotype block partitioning method for the North American Rheumatoid Arthritis Consortium (NARAC) dataset. The methods used for the NARAC dataset were the individual SNP approach and the following haplotype block methods: the four-gamete test (FGT), confidence interval test (CIT), and solid spine of linkage disequilibrium (SSLD). The measured parameters that reflect the strength of the association between the biomarker and RA were the P-value after Bonferroni correction and other parameters used to compare the output of each haplotype block method. This work presents a comparison among the individual SNP approach and the three haplotype block methods to select the method that can detect all the significant SNPs when applied alone. The GWAS results from the NARAC dataset obtained with the different methods are presented. The individual SNP, CIT, FGT, and SSLD methods detected 541, 1516, 1551, and 1831 RA-associated SNPs respectively, and the individual SNP, FGT, CIT, and SSLD methods detected 65, 156, 159, and 450 significant SNPs respectively, that were not detected by the other methods. Three hundred eighty-three SNPs were discovered by the haplotype block methods and the individual SNP approach, while 1021 SNPs were discovered by all three haplotype block methods. The 383 SNPs detected by all the methods are promising candidates for studying RA susceptibility. A hybrid technique involving all four methods should be applied to detect the significant SNPs associated with RA in the NARAC dataset, but the SSLD method may be preferred because of its advantages when only one method was used.

Saad MN, Mabrouk MS, Eldeib AM, Shaker OG. Comparative study for haplotype block partitioning methods - Evidence from chromosome 6 of the North American Rheumatoid Arthritis Consortium (NARAC) dataset. PLoS One 2019;13:e0209603. [PMID: 30596705 PMCID: PMC6312333 DOI: 10.1371/journal.pone.0209603] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/08/2018] [Accepted: 12/07/2018] [Indexed: 11/19/2022]

Abstract

Haplotype-based methods compete with “one-SNP-at-a-time” approaches on being preferred for association studies. Chromosome 6 contains most of the known genetic biomarkers for rheumatoid arthritis (RA) disease. Therefore, chromosome 6 serves as a benchmark for the haplotype methods testing. The aim of this study is to test the North American Rheumatoid Arthritis Consortium (NARAC) dataset to find out if haplotype block methods or single-locus approaches alone can sufficiently provide the significant single nucleotide polymorphisms (SNPs) associated with RA. In addition, could we be satisfied with only one method of the haplotype block methods for partitioning chromosome 6 of the NARAC dataset? In the NARAC dataset, chromosome 6 comprises 35,574 SNPs for 2,062 individuals (868 cases, 1,194 controls). Individual SNP approach and three haplotype block methods were applied to the NARAC dataset to identify the RA biomarkers. We employed three haplotype partitioning methods which are confidence interval test (CIT), four gamete test (FGT), and solid spine of linkage disequilibrium (SSLD). P-values after stringent Bonferroni correction for multiple testing were measured to assess the strength of association between the genetic variants and RA susceptibility. Moreover, the block size (in base pairs (bp) and number of SNPs included), number of blocks, percentage of uncovered SNPs by the block method, percentage of significant blocks from the total number of blocks, number of significant haplotypes and SNPs were used to compare among the three haplotype block methods. Individual SNP, CIT, FGT, and SSLD methods detected 432, 1,086, 1,099, and 1,322 associated SNPs, respectively. Each method identified significant SNPs that were not detected by any other method (Individual SNP: 12, FGT: 37, CIT: 55, and SSLD: 189 SNPs). 916 SNPs were discovered by all the three haplotype block methods. 367 SNPs were discovered by the haplotype block methods and the individual SNP approach. The P-values of these 367 SNPs were lower than those of the SNPs uniquely detected by only one method. The 367 SNPs detected by all the methods represent promising candidates for RA susceptibility. They should be further investigated for the European population. A hybrid technique including the four methods should be applied to detect the significant SNPs associated with RA for chromosome 6 of the NARAC dataset. Moreover, SSLD method may be preferred for its favored benefits in case of selecting only one method.
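
As a rough illustration of the Bonferroni correction mentioned in this abstract, the sketch below computes a per-test threshold and an adjusted P-value. It assumes a family-wise significance level of 0.05, which the abstract does not state, and takes the 35,574 chromosome 6 SNPs as the number of tests; the haplotype block methods would involve a different number of tests. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


N_SNPS = 35_574  # chromosome 6 SNPs reported in the abstract
ALPHA = 0.05     # assumed family-wise level, not given in the abstract

if __name__ == "__main__":
    threshold = bonferroni_threshold(ALPHA, N_SNPS)
    print(f"per-SNP threshold: {threshold:.3e}")
    # A raw P-value is declared significant when its adjusted value stays below ALPHA.
    print(bonferroni_adjust(1e-8, N_SNPS) < ALPHA)
```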

Bao M, Wang K. Genome-wide association studies using a penalized moving-window regression. Bioinformatics 2017;33:3887-3894. [PMID: 28961706 PMCID: PMC5860090 DOI: 10.1093/bioinformatics/btx522] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/20/2017] [Revised: 07/31/2017] [Accepted: 08/15/2017] [Indexed: 11/14/2022]

Friedrichs S, Manitz J, Burger P, Amos CI, Risch A, Chang-Claude J, Wichmann HE, Kneib T, Bickeböller H, Hofner B. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies. Comput Math Methods Med 2017;2017:6742763. [PMID: 28785300 PMCID: PMC5530424 DOI: 10.1155/2017/6742763] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/10/2017] [Revised: 04/15/2017] [Accepted: 05/10/2017] [Indexed: 01/24/2023]

Power Calculation of Multi-step Combined Principal Components with Applications to Genetic Association Studies. Sci Rep 2016;6:26243. [PMID: 27189724 PMCID: PMC4870571 DOI: 10.1038/srep26243] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/12/2016] [Accepted: 04/28/2016] [Indexed: 12/03/2022]

Xu J, Yuan Z, Ji J, Zhang X, Li H, Wu X, Xue F, Liu Y. A powerful score-based test statistic for detecting gene-gene co-association. BMC Genet 2016;17:31. [PMID: 26822525 PMCID: PMC4731962 DOI: 10.1186/s12863-016-0331-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/18/2015] [Accepted: 01/13/2016] [Indexed: 11/10/2022]

Zhang W, Li H, Li Z, Li Q. A two-phase procedure for non-normal quantitative trait genetic association study. BMC Bioinformatics 2016;17:52. [PMID: 26821800 PMCID: PMC4730615 DOI: 10.1186/s12859-016-0888-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022]

Zhang W, Li Q. Incorporating Hardy-Weinberg Equilibrium Law to Enhance the Association Strength for Ordinal Trait Genetic Study. Ann Hum Genet 2015;80:102-12. [DOI: 10.1111/ahg.12142] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/14/2015] [Accepted: 10/21/2015] [Indexed: 11/25/2022]

Zhang W, Zhang Z, Li X, Li Q. Fitting Proportional Odds Model to Case-Control data with Incorporating Hardy-Weinberg Equilibrium. Sci Rep 2015;5:17286. [PMID: 26607176 PMCID: PMC4660314 DOI: 10.1038/srep17286] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/14/2015] [Accepted: 10/28/2015] [Indexed: 01/07/2023]

Zhang W, Li Q. Nonparametric Risk and Nonparametric Odds in Quantitative Genetic Association Studies. Sci Rep 2015;5:12105. [PMID: 26174851 PMCID: PMC5378889 DOI: 10.1038/srep12105] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/22/2014] [Accepted: 06/17/2015] [Indexed: 12/30/2022]

Li Z, Yuan A, Han G, Gao G, Li Q. Rank-based tests for identifying multiple genetic variants associated with quantitative traits. Ann Hum Genet 2015;78:306-10. [PMID: 24942081 DOI: 10.1111/ahg.12067] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]

Freytag S, Manitz J, Schlather M, Kneib T, Amos CI, Risch A, Chang-Claude J, Heinrich J, Bickeböller H. A network-based kernel machine test for the identification of risk pathways in genome-wide association studies. Hum Hered 2014;76:64-75. [PMID: 24434848 DOI: 10.1159/000357567] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 08/26/2013] [Accepted: 11/26/2013] [Indexed: 02/06/2023]

Li Q, Hu J, Ding J, Zheng G. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations. Biostatistics 2013;15:284-95. [PMID: 24174580 DOI: 10.1093/biostatistics/kxt045] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]

Liu J, Huang J, Ma S, Wang K. Incorporating group correlations in genome-wide association studies using smoothed group Lasso. Biostatistics 2013;14:205-19. [PMID: 22988281 PMCID: PMC3590928 DOI: 10.1093/biostatistics/kxs034] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 10/03/2011] [Revised: 05/30/2012] [Accepted: 08/21/2012] [Indexed: 12/22/2022]

Li Q, Li Z, Zheng G, Gao G, Yu K. Rank-based robust tests for quantitative-trait genetic association studies. Genet Epidemiol 2013;37:358-65. [PMID: 23526350 DOI: 10.1002/gepi.21723] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/27/2012] [Revised: 02/18/2013] [Accepted: 02/20/2013] [Indexed: 11/06/2022]

Wu CO, Zheng G, Kwak M. A Joint Regression Analysis for Genetic Association Studies with Outcome Stratified Samples. Biometrics 2013;69:417-26. [DOI: 10.1111/biom.12012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/01/2011] [Revised: 10/01/2012] [Accepted: 11/01/2012] [Indexed: 11/30/2022]

Freytag S, Bickeböller H, Amos CI, Kneib T, Schlather M. A novel kernel for correcting size bias in the logistic kernel machine test with an application to rheumatoid arthritis. Hum Hered 2013;74:97-108. [PMID: 23466369 DOI: 10.1159/000347188] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/20/2012] [Accepted: 01/17/2013] [Indexed: 01/07/2023]

Liu J, Wang K, Ma S, Huang J. Accounting for linkage disequilibrium in genome-wide association studies: A penalized regression method. Stat Interface 2013;6:99-115. [PMID: 25258655 PMCID: PMC4172344 DOI: 10.4310/sii.2013.v6.n1.a10] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 06/03/2023]

Kruppa J, Ziegler A, König IR. Risk estimation and risk prediction using machine-learning methods. Hum Genet 2012;131:1639-54. [PMID: 22752090 PMCID: PMC3432206 DOI: 10.1007/s00439-012-1194-y] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 02/28/2012] [Accepted: 06/14/2012] [Indexed: 01/02/2023]

Zheng G, Wu CO, Kwak M, Jiang W, Joo J, Lima JAC. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genet Epidemiol 2012;36:263-73. [PMID: 22460626 DOI: 10.1002/gepi.21619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 10/12/2011] [Revised: 12/23/2011] [Accepted: 01/02/2012] [Indexed: 11/07/2022]

He Y, Li C, Amos CI, Xiong M, Ling H, Jin L. Accelerating haplotype-based genome-wide association study using perfect phylogeny and phase-known reference data. PLoS One 2011;6:e22097. [PMID: 21789217 PMCID: PMC3137625 DOI: 10.1371/journal.pone.0022097] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/17/2010] [Accepted: 06/17/2011] [Indexed: 11/18/2022]

Alekseyenko AV, Lytkin NI, Ai J, Ding B, Padyukov L, Aliferis CF, Statnikov A. Causal graph-based analysis of genome-wide association data in rheumatoid arthritis. Biol Direct 2011;6:25. [PMID: 21592391 PMCID: PMC3118953 DOI: 10.1186/1745-6150-6-25] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 12/13/2010] [Accepted: 05/18/2011] [Indexed: 01/27/2023]

Abstract

BACKGROUND

GWAS owe their popularity to the expectation that they will make a major impact on diagnosis, prognosis and management of disease by uncovering genetics underlying clinical phenotypes. The dominant paradigm in GWAS data analysis so far consists of extensive reliance on methods that emphasize contribution of individual SNPs to statistical association with phenotypes. Multivariate methods, however, can extract more information by considering associations of multiple SNPs simultaneously. Recent advances in other genomics domains pinpoint multivariate causal graph-based inference as a promising principled analysis framework for high-throughput data. Designed to discover biomarkers in the local causal pathway of the phenotype, these methods lead to accurate and highly parsimonious multivariate predictive models. In this paper, we investigate the applicability of causal graph-based method TIE* to analysis of GWAS data. To test the utility of TIE*, we focus on anti-CCP positive rheumatoid arthritis (RA) GWAS datasets, where there is a general consensus in the community about the major genetic determinants of the disease.

RESULTS

Application of TIE* to the North American Rheumatoid Arthritis Cohort (NARAC) GWAS data results in six SNPs, mostly from the MHC locus. Using these SNPs we develop two predictive models that can classify cases and disease-free controls with an accuracy of 0.81 area under the ROC curve, as verified in independent testing data from the same cohort. The predictive performance of these models generalizes reasonably well to Swedish subjects from the closely related but not identical Epidemiological Investigation of Rheumatoid Arthritis (EIRA) cohort with 0.71-0.78 area under the ROC curve. Moreover, the SNPs identified by the TIE* method render many other previously known SNP associations conditionally independent of the phenotype.

CONCLUSIONS

Our experiments demonstrate that application of TIE* captures maximum amount of genetic information about RA in the data and recapitulates the major consensus findings about the genetic factors of this disease. In addition, TIE* yields reproducible markers and signatures of RA. This suggests that principled multivariate causal and predictive framework for GWAS analysis empowers the community with a new tool for high-quality and more efficient discovery.

REVIEWERS

This article was reviewed by Prof. Anthony Almudevar, Dr. Eugene V. Koonin, and Prof. Marianthi Markatou.

Yang C, Wan X, Yang Q, Xue H, Tang NLS, Yu W. A hidden two-locus disease association pattern in genome-wide association studies. BMC Bioinformatics 2011;12:156. [PMID: 21569557 PMCID: PMC3116488 DOI: 10.1186/1471-2105-12-156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/30/2010] [Accepted: 05/14/2011] [Indexed: 12/16/2022]

Abstract

BACKGROUND

Recent association analyses in genome-wide association studies (GWAS) mainly focus on single-locus association tests (marginal tests) and two-locus interaction detections. These analysis methods have provided strong evidence of associations between genetic variants and complex diseases. However, there exists a type of association pattern, which often occurs within local regions in the genome and is unlikely to be detected by either marginal tests or interaction tests. This association pattern involves a group of correlated single-nucleotide polymorphisms (SNPs). The correlation among SNPs can lead to weak marginal effects and the interaction does not play a role in this association pattern. This phenomenon is due to the existence of unfaithfulness: the marginal effects of correlated SNPs do not express their significant joint effects faithfully due to the correlation cancelation.

RESULTS

In this paper, we develop a computational method to detect this association pattern masked by unfaithfulness. We have applied our method to analyze seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). The analysis for each data set takes about one week to finish the examination of all pairs of SNPs. Based on the empirical result of these real data, we show that this type of association masked by unfaithfulness widely exists in GWAS.

CONCLUSIONS

These newly identified associations enrich the discoveries of GWAS, which may provide new insights both in the analysis of tagSNPs and in the experiment design of GWAS. Since these associations may be easily missed by existing analysis tools, we can only connect some of them to publicly available findings from other association studies. As independent data sets are limited at this moment, we also have difficulty replicating these findings. More biological implications need further investigation.

AVAILABILITY

The software is freely available at http://bioinformatics.ust.hk/hidden_pattern_finder.zip.

Shi G, Boerwinkle E, Morrison AC, Gu CC, Chakravarti A, Rao DC. Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS. Genet Epidemiol 2010;35:111-8. [PMID: 21254218 DOI: 10.1002/gepi.20556] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 06/14/2010] [Revised: 10/27/2010] [Accepted: 11/17/2010] [Indexed: 12/14/2022]

An P, Mukherjee O, Chanda P, Yao L, Engelman CD, Huang CH, Zheng T, Kovac IP, Dubé MP, Liang X, Li J, de Andrade M, Culverhouse R, Malzahn D, Manning AK, Clarke GM, Jung J, Province MA. The challenge of detecting epistasis (G x G interactions): Genetic Analysis Workshop 16. Genet Epidemiol 2010;33 Suppl 1:S58-67. [PMID: 19924703 DOI: 10.1002/gepi.20474] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]

Kent JW. Analysis of multiple phenotypes. Genet Epidemiol 2010;33 Suppl 1:S33-9. [PMID: 19924720 DOI: 10.1002/gepi.20470] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]

Ziegler A. Genome-wide association studies: quality control and population-based measures. Genet Epidemiol 2010;33 Suppl 1:S45-50. [PMID: 19924716 DOI: 10.1002/gepi.20472] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]

Thomas DC. Genome-wide association studies for discrete traits. Genet Epidemiol 2010;33 Suppl 1:S8-12. [PMID: 19924710 DOI: 10.1002/gepi.20465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/12/2023]

Sarasua SM, Collins JS, Williamson DM, Satten GA, Allen AS. Effect of population stratification on the identification of significant single-nucleotide polymorphisms in genome-wide association studies. BMC Proc 2009;3 Suppl 7:S13. [PMID: 20017996 PMCID: PMC2795903 DOI: 10.1186/1753-6561-3-s7-s13] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]

Buil A, Martinez-Perez A, Perera-Lluna A, Rib L, Caminal P, Soria JM. A new gene-based association test for genome-wide association studies. BMC Proc 2009;3 Suppl 7:S130. [PMID: 20017997 PMCID: PMC2795904 DOI: 10.1186/1753-6561-3-s7-s130] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]

Cupples LA, Beyene J, Bickeböller H, Daw EW, Fallin MD, Gauderman WJ, Ghosh S, Goode EL, Hauser ER, Hinrichs A, Kent JW, Martin LJ, Martinez M, Neuman RJ, Province M, Szymczak S, Wilcox MA, Ziegler A, MacCluer JW, Almasy L. Genetic Analysis Workshop 16: Strategies for genome-wide association study analyses. BMC Proc 2009;3 Suppl 7:S1. [PMID: 20017962 PMCID: PMC2795869 DOI: 10.1186/1753-6561-3-s7-s1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]

Affiliation(s)

L Adrienne Cupples: Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Boston, MA 02130 and Framingham Heart Study, Framingham, Massachusetts, USA
Joseph Beyene: Research Institute of the Hospital for Sick Children and University of Toronto, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
Heike Bickeböller: Department of Genetic Epidemiology, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
E Warwick Daw: Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Boulevard, Campus Box 8506, St. Louis, Missouri 63108, USA
M Daniele Fallin: Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, Maryland 21205, USA
W James Gauderman: University of Southern California, Department of Preventive Medicine, Division of Biostatistics, 1540 Alcazar Street, CHP-220, Los Angeles, California 90033, USA
Saurabh Ghosh: Human Genetics Unit, Indian Statistical Institute, Kolkata 700018, India
Ellen L Goode: Department of Health Sciences Research, Mayo Clinic, 200 First Street Southwest, Rochester, Minnesota 55905, USA
Elizabeth R Hauser: Duke University, Durham, North Carolina 27710, USA
Anthony Hinrichs: Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Boulevard, Campus Box 8506, St. Louis, Missouri 63108, USA
Jack W Kent: Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245, USA
Lisa J Martin: Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Mail Code 5041, Cincinnati, Ohio 45229, USA
Maria Martinez: INSERM, U.563, University Paul-Sabatier, CPTP, Toulouse F-31300, France
Rosalind J Neuman: Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Boulevard, Campus Box 8506, St. Louis, Missouri 63108, USA
Michael Province: Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Boulevard, Campus Box 8506, St. Louis, Missouri 63108, USA
Silke Szymczak: Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Strasse 1, 23562 Lübeck, Germany
Marsha A Wilcox: Johnson & Johnson Pharmaceutical Research and Development, 1125 Trenton-Harbourton Road, Titusville, New Jersey 08560, USA
Andreas Ziegler: Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Strasse 1, 23562 Lübeck, Germany
Jean W MacCluer: Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245, USA
Laura Almasy: Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245, USA

Guo W, Liang CY, Lin S. Haplotype association analysis of North American Rheumatoid Arthritis Consortium data using a generalized linear model with regularization. BMC Proc 2009;3 Suppl 7:S32. [PMID: 20018023 PMCID: PMC2795930 DOI: 10.1186/1753-6561-3-s7-s32] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/25/2022] Open

Martin LJ, Gao G, Kang G, Fang Y, Woo JG. Improving the signal-to-noise ratio in genome-wide association studies. Genet Epidemiol 2009;33 Suppl 1:S29-32. [PMID: 19924719 DOI: 10.1002/gepi.20469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]

MacCluer JW, Amos CI, Gregersen PK, Heard-Costa N, Lee M, Kraja AT, Borecki IB, Cupples LA, Almasy L. Genetic Analysis Workshop 16: introduction to workshop summaries. Genet Epidemiol 2009;33 Suppl 1:S1-7. [PMID: 19924709 DOI: 10.1002/gepi.20464] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Szymczak S, Biernacka JM, Cordell HJ, González-Recio O, König IR, Zhang H, Sun YV. Machine learning in genome-wide association studies. Genet Epidemiol 2009;33 Suppl 1:S51-7. [PMID: 19924717 DOI: 10.1002/gepi.20473] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 01/28/2023]