1
|
[An improved association analysis pipeline for tumor susceptibility variant in haplotype amplification area]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2020; 40:1493-1499. [PMID: 33118521 PMCID: PMC7606235 DOI: 10.12122/j.issn.1673-4254.2020.10.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
Abstract
OBJECTIVE Haplotype amplification on germline variants is suggested to imply potential selective advantages and clonal expansion susceptibility, and it has become an important signature in the search for cancer susceptibility genes. Here we propose an improved association method that fully considers the haplotype amplification status. METHODS The haplotype amplification status was estimated from the variant allelic frequencies. We adopted a permutation test on variant allelic frequencies to divide the candidate variants into multiple groups. A likelihood clustering method was then applied to establish the neighborhood system of the hidden Markov random field framework. A filtering pipeline, including a Wilson's interval filter and a false discovery rate controller, was introduced into the proposed method to further refine the candidate variants. The final candidate set, along with the haplotype amplification status, was collapsed into weighted virtual sites for association tests. RESULTS Through simulated tests on a series of datasets, we compared the type Ⅰ error rates at different minor allele frequencies, which stably fell within 2%, suggesting good robustness of the algorithm. In addition, we compared the proposed method with 5 other published association approaches in terms of type Ⅰ and type Ⅱ error rates; the error rates were all within 2%, demonstrating significant advantages and good statistical ability of the proposed method. CONCLUSIONS The proposed method can accurately identify tumor susceptibility variants in haplotype amplification areas with good robustness and stability.
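The Wilson's interval filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the interval is applied, so the rule below (keep a site whose interval for the variant allelic frequency excludes an expected value of 0.5) is an assumption, and the names `wilson_interval`, `passes_filter` and `expected_vaf` are hypothetical rather than taken from the paper.

```python
import math


def wilson_interval(alt_reads, total_reads, z=1.959964):
    """Wilson score interval for a binomial proportion.

    Here the proportion is a variant allelic frequency (alt_reads / total_reads);
    z = 1.959964 gives an approximately 95% interval.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p_hat = alt_reads / total_reads
    z2 = z * z
    denom = 1.0 + z2 / total_reads
    center = (p_hat + z2 / (2.0 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / total_reads + z2 / (4.0 * total_reads**2))
        / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


def passes_filter(alt_reads, total_reads, expected_vaf=0.5):
    """Assumed filter rule: keep a site only if its interval excludes expected_vaf."""
    lower, upper = wilson_interval(alt_reads, total_reads)
    return not (lower <= expected_vaf <= upper)
```

For example, `passes_filter(90, 100)` keeps the site because the interval lies well above 0.5, whereas `passes_filter(50, 100)` rejects it.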
|
2
|
Barna B, Badaruddoza, Kaur M, Bhanwer A. A multifactor dimensionality reduction model of gene polymorphisms and an environmental interaction analysis in type 2 diabetes mellitus study among Punjabi, a North India population. Meta Gene 2018. [DOI: 10.1016/j.mgene.2018.01.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
|
3
|
Friedrichs S, Manitz J, Burger P, Amos CI, Risch A, Chang-Claude J, Wichmann HE, Kneib T, Bickeböller H, Hofner B. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies. Computational and Mathematical Methods in Medicine 2017; 2017:6742763. [PMID: 28785300 PMCID: PMC5530424 DOI: 10.1155/2017/6742763] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/10/2017] [Revised: 04/15/2017] [Accepted: 05/10/2017] [Indexed: 01/24/2023]
Abstract
The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.
Affiliation(s)
- Stefanie Friedrichs
- Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Juliane Manitz
- Department of Statistics and Econometrics, Georg-August University Göttingen, Göttingen, Germany
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
- Patricia Burger
- Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Christopher I. Amos
- Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
- Angela Risch
- Division of Molecular Biology, University of Salzburg, Salzburg, Austria
- Translational Lung Research Center Heidelberg (TLRC-H), Member of the German Center for Lung Research (DZL), Heidelberg, Germany
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Heinz-Erich Wichmann
- Institute of Medical Informatics, Biometry and Epidemiology, Chair of Epidemiology, Ludwig-Maximilians University, Munich, Germany
- Helmholtz Center Munich, Institute of Epidemiology II, Munich, Germany
- Institute of Medical Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Thomas Kneib
- Department of Statistics and Econometrics, Georg-August University Göttingen, Göttingen, Germany
- Heike Bickeböller
- Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Benjamin Hofner
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Section Biostatistics, Paul-Ehrlich-Institut, Langen, Germany
|
4
|
Coombes B, Basu S, Guha S, Schork N. Weighted Score Tests Implementing Model-Averaging Schemes in Detection of Rare Variants in Case-Control Studies. PLoS One 2015; 10:e0139355. [PMID: 26436424 PMCID: PMC4593572 DOI: 10.1371/journal.pone.0139355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/11/2015] [Accepted: 09/11/2015] [Indexed: 12/04/2022]
Abstract
Multi-locus effect modeling is a powerful approach for detection of genes influencing a complex disease. Especially for rare variants, we need to analyze multiple variants together to achieve adequate power for detection. In this paper, we propose several parsimonious branching model techniques to assess the joint effect of a group of rare variants in a case-control study. These models implement a data reduction strategy within a likelihood framework and use a weighted score test to assess the statistical significance of the effect of the group of variants on the disease. The primary advantage of the proposed approach is that it performs model-averaging over a substantially smaller set of models supported by the data and thus gains power to detect multi-locus effects. We illustrate these proposed approaches on simulated and real data and study their performance compared to several existing rare variant detection approaches. The primary goal of this paper is to assess whether there is any gain in power to detect association by averaging over a number of models instead of selecting the best model. Extensive simulations and a real data application demonstrate the advantage of the proposed approach in the presence of causal variants with opposite directional effects along with a moderate number of null variants in linkage disequilibrium.
Affiliation(s)
- Brandon Coombes
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
- Sharmistha Guha
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
- Nicholas Schork
- J. Craig Venter Institute, La Jolla, CA, United States of America
|
5
|
Ray D, Li X, Pan W, Pankow JS, Basu S. A Bayesian Partitioning Model for the Detection of Multilocus Effects in Case-Control Studies. Hum Hered 2015; 79:69-79. [PMID: 26044550 DOI: 10.1159/000369858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/28/2014] [Accepted: 11/12/2014] [Indexed: 01/21/2023]
Abstract
BACKGROUND Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the disease heritability. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interaction among these variants. Multilocus association analysis provides a powerful alternative by jointly modeling the variants within a gene or a pathway and by reducing the burden of multiple hypothesis testing in a GWAS. METHODS Here, we propose a powerful and flexible dimension reduction approach to model multilocus association. We use a Bayesian partitioning model which clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme and uses posterior marginal probabilities to detect association between the SNP set and the disease. RESULTS We illustrate our method using extensive simulation studies and by applying it to detect multilocus interaction in the Atherosclerosis Risk in Communities (ARIC) GWAS with type 2 diabetes. CONCLUSION We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset with 9,328 individuals to study gene-based associations for type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
Affiliation(s)
- Debashree Ray
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minn., USA
|
6
|
Freytag S, Bickeböller H, Amos CI, Kneib T, Schlather M. A novel kernel for correcting size bias in the logistic kernel machine test with an application to rheumatoid arthritis. Hum Hered 2013; 74:97-108. [PMID: 23466369 DOI: 10.1159/000347188] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/20/2012] [Accepted: 01/17/2013] [Indexed: 01/07/2023]
Abstract
OBJECTIVES The logistic kernel machine test (LKMT) is a testing procedure tailored towards high-dimensional genetic data. Its use in pathway analyses of case-control genome-wide association studies results from its computational efficiency and flexibility in incorporating additional information via the kernel. The kernel can be any positive definite function; unfortunately, its form strongly influences the test's power and bias. Most authors have recommended the use of a simple linear kernel. We demonstrate via a simulation that the probability of rejecting the null hypothesis of no association just by chance increases with the number of SNPs or genes in the pathway when applying a simple linear kernel. METHODS We propose a novel kernel that includes an appropriate standardization in order to protect against any inflation of false positive results. Moreover, our novel kernel contains information on gene membership of SNPs in the pathway. RESULTS When applying the novel kernel to data from the North American Rheumatoid Arthritis Consortium, we find that even this basic genomic structure can improve the ability of the LKMT to identify meaningful associations. We also demonstrate that the standardization effectively eliminates problems of size bias. CONCLUSION We recommend the use of our standardized kernel and urge caution when using non-adjusted kernels in the LKMT to conduct pathway analyses.
Affiliation(s)
- Saskia Freytag
- Department of Genetic Epidemiology, Medical School, Georg-August University Göttingen, Göttingen, Germany.
|
7
|
McEachin RC, Cavalcoli JD. Overlap of genetic influences in phenotypes classically categorized as psychiatric vs medical disorders. World J Med Genet 2011; 1:4-10. [DOI: 10.5496/wjmg.v1.i1.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Psychiatric disorders have traditionally been segregated from medical disorders in terms of drugs, treatment, insurance coverage and training of clinicians. This segregation is consistent with the long-standing observation that there are inherent differences between psychiatric disorders (diseases relating to thoughts, feelings and behavior) and medical disorders (diseases relating to physical processes). However, these differences are growing less distinct as we improve our understanding of the roles of epistasis and pleiotropy in medical genetics. Both psychiatric and medical disorders are predisposed in part by genetic variation, and psychiatric disorders tend to be comorbid with medical disorders. One hypothesis on this interaction posits that certain combinations of genetic variants (epistasis) influence psychiatric disorders due to their impact on the brain, but the associated genes are also expressed in other tissues so the same groups of variants influence medical disorders (pleiotropy). The observation that psychiatric and medical disorders may interact is not novel. Equally, both epistasis and pleiotropy are fundamental concepts in medical genetics. However, we are just beginning to understand how genetic variation can influence both psychiatric and medical disorders. In our recent work, we have discovered gene networks significantly associated with psychiatric and substance use disorders. Invariably, these networks are also significantly associated with medical disorders. Recognizing how genetic variation can influence both psychiatric and medical disorders will help us to understand the etiology of the individual and comorbid disease phenotypes, predict and minimize side effects in drug and other treatments, and help to reduce stigma associated with psychiatric disorders.
|
8
|
Pan W, Basu S, Shen X. Adaptive tests for detecting gene-gene and gene-environment interactions. Hum Hered 2011; 72:98-109. [PMID: 21934325 DOI: 10.1159/000330632] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/10/2011] [Accepted: 07/02/2011] [Indexed: 12/14/2022]
Abstract
There has been an increasing interest in detecting gene-gene and gene-environment interactions in genetic association studies. A major statistical challenge is how to deal with a large number of parameters measuring possible interaction effects, which leads to reduced power of any statistical test due to a large number of degrees of freedom or high cost of adjustment for multiple testing. Hence, a popular idea is to first apply some dimension reduction techniques before testing, while another is to apply only statistical tests that are developed for and robust to high-dimensional data. To combine both ideas, we propose applying an adaptive sum of squared score (SSU) test and several other adaptive tests. These adaptive tests are extensions of the adaptive Neyman test [Fan, 1996], which was originally proposed for high-dimensional data, providing a simple and effective way for dimension reduction. On the other hand, the original SSU test coincides with a version of a test specifically developed for high-dimensional data. We apply these adaptive tests and their original nonadaptive versions to simulated data to detect interactions between two groups of SNPs (e.g. multiple SNPs in two candidate regions). We found that for sparse models (i.e. with only few non-zero interaction parameters), the adaptive SSU test and its close variant, an adaptive version of the weighted sum of squared score (SSUw) test, improved the power over their non-adaptive versions, and performed consistently well across various scenarios. The proposed adaptive tests are built in the general framework of regression analysis, and can thus be applied to various types of traits in the presence of covariates.
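As a rough illustration of the non-adaptive sum of squared score (SSU) statistic named in the abstract, the sketch below computes the score vector U for a set of tested terms (for instance, SNP-by-SNP interaction products) under an intercept-only logistic null model and returns U'U. Two simplifications are assumptions, not the authors' procedure: no covariates or main effects are adjusted for, and significance is assessed by permutation rather than by an asymptotic approximation. The adaptive variants are not implemented, and the function names are hypothetical.

```python
import numpy as np


def ssu_statistic(X, y):
    """Sum of squared score statistic U'U.

    X: (n, k) matrix of tested terms; y: length-n vector of 0/1 outcomes.
    Under an intercept-only logistic null, the score vector is U = X'(y - mean(y)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    U = X.T @ (y - y.mean())
    return float(U @ U)


def ssu_permutation_pvalue(X, y, n_perm=999, seed=0):
    """Permutation p-value for the SSU statistic (assumed in place of asymptotics)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = ssu_statistic(X, y)
    # Count permuted outcomes whose statistic is at least as large as the observed one.
    exceed = sum(
        ssu_statistic(X, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Because the statistic sums squared scores rather than forming a quadratic form with an inverse covariance matrix, it avoids inverting a large matrix when many interaction terms are tested.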
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, USA. weip @ biostat.umn.edu
|