1
Sun NA, Wang YU, Chu J, Han Q, Shen Y. Bayesian Approaches in Exploring Gene-environment and Gene-gene Interactions: A Comprehensive Review. Cancer Genomics Proteomics 2023; 20:669-678. [PMID: 38035701 PMCID: PMC10687732 DOI: 10.21873/cgp.20414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Rapid advancements in high-throughput biological techniques have facilitated the generation of high-dimensional omics datasets, which have provided a solid foundation for precision medicine and prognosis prediction. Nonetheless, the problem of missing heritability persists. To solve this problem, it is essential to explain the genetic structure of disease incidence risk and prognosis by incorporating interactions. The development of Bayesian theory has provided new approaches for developing models for interaction identification and estimation. Several Bayesian models have been developed to improve model accuracy and to identify main effects, gene-environment (G×E) interactions, and gene-gene (G×G) interactions. Studies based on single-nucleotide polymorphisms (SNPs) are significant for the exploration of rare and common variants. Models based on the effect heredity principle and group-based models are relatively flexible and do not require strict constraints when dealing with the hierarchical structure between the main effect and interactions (M-I). These models offer good interpretability of biological mechanisms. Machine learning-based Bayesian approaches are highly competitive in improving prediction accuracy. These models provide insights into the mechanisms underlying the occurrence and progression of complex diseases, identify more reliable biomarkers, and achieve higher predictive accuracy. In this paper, we provide a comprehensive review of these Bayesian approaches.
Affiliation(s)
- N A Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Y U Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Qiang Han
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
2
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (Bethesda, Md.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis, or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics.
The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
Affiliation(s)
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Daniel Weinreich
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Biostatistics, Brown University, Providence, RI 02903, USA
- Microsoft Research New England, Cambridge, MA 02142, USA
3
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022] Open
Abstract
Background: In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding; the most popular one is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results: Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, our three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Diane Duroux
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Elena S Gusareva
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
4
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Epistasis Detection in Genome-Wide Screening for Complex Human Diseases in Structured Populations. Systems Medicine 2019. [DOI: 10.1089/sysm.2019.0003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023] Open
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey
- Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liege, Liege, Belgium
5
Xu EL, Qian X, Yu Q, Zhang H, Cui S. Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application. BMC Genomics 2018; 19:170. [PMID: 29589561 PMCID: PMC5872388 DOI: 10.1186/s12864-018-4552-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND: Genotype-phenotype association has been one of the long-standing problems in bioinformatics. Identifying both the marginal and epistatic effects among genetic markers, such as Single Nucleotide Polymorphisms (SNPs), has been extensively integrated in Genome-Wide Association Studies (GWAS) to help derive "causal" genetic risk factors and their interactions, which play critical roles in life and disease systems. Identifying "synergistic" interactions with respect to the outcome of interest can support accurate phenotypic prediction and help explain the underlying mechanism of system behavior. Many statistical measures for estimating synergistic interactions have been proposed in the literature for this purpose. However, beyond empirical performance, there is still no theoretical analysis of the power and limitations of these synergistic interaction measures. RESULTS: In this paper, it is shown that the existing information-theoretic multivariate synergy depends on a small subset of the interaction parameters in the model, sometimes on only one interaction parameter. In addition, an adjusted version of multivariate synergy is proposed as a new measure to estimate the interactive effects, with experiments conducted over both simulated data sets and a real-world GWAS data set to show its effectiveness. CONCLUSIONS: We provide rigorous theoretical analysis and empirical evidence on why the information-theoretic multivariate synergy helps with identifying genetic risk factors via synergistic interactions. We further establish a rigorous sample complexity analysis for detecting interactive effects, confirmed by both simulated and real-world data sets.
Affiliation(s)
- Easton Li Xu
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109 MI USA
- School of Science and Engineering, Chinese University of Hong Kong, Shenzhen, Guangdong, 518172 China
- Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843 TX USA
- Qilian Yu
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
- Han Zhang
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
- Shuguang Cui
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
6
Niche harmony search algorithm for detecting complex disease associated high-order SNP combinations. Sci Rep 2017; 7:11529. [PMID: 28912584 PMCID: PMC5599559 DOI: 10.1038/s41598-017-11064-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/04/2016] [Accepted: 08/17/2017] [Indexed: 02/01/2023] Open
Abstract
Genome-wide association studies face particular challenges in detecting high-order disease-causing models because of model diversity, the possibly low or even absent marginal effect of a model, and the extraordinary search and computation required. In this paper, we propose a niche harmony search algorithm in which joint entropy is used as a heuristic factor to guide the search for models with low or no marginal effect, and two computationally lightweight scores are selected to evaluate and adapt to diverse disease models. To obtain all possible suspected pathogenic models, a niche technique is merged with harmony search (HS) and serves as a taboo region that keeps HS from becoming trapped in local search. From the resulting set of candidate SNP combinations, we use the G-test statistic to test for true positives. Experiments were performed on twenty typical simulation datasets, of which 12 models have marginal effects and eight have none. Our results indicate that the proposed algorithm has very high detection power for searching suspected disease models in the first stage and that it is superior to some typical existing approaches in both detection power and CPU runtime for all these datasets. Application to age-related macular degeneration (AMD) demonstrates that our method is promising for detecting high-order disease-causing models.
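Illustrative note: the abstract above says a G-test statistic is applied to candidate SNP combinations. The sketch below shows one common form of that statistic on a genotype-combination by case/control table. The function name `g_test`, the tuple encoding of genotypes, and the 0/1 labels are assumptions made for this example and are not taken from the paper. Converting the statistic to a P-value (via a chi-square distribution with the returned degrees of freedom) is left out.

```python
import math
from collections import Counter


def g_test(genotypes, labels):
    """Return (G, df) for a genotype-combination x case/control table.

    genotypes: one tuple per sample, e.g. (0, 2) for a two-SNP combination
               (hypothetical encoding: minor-allele counts per SNP).
    labels:    one class label per sample, e.g. 0 = control, 1 = case.
    """
    if len(genotypes) != len(labels):
        raise ValueError("genotypes and labels must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("at least one sample is required")

    cells = Counter(zip(genotypes, labels))  # observed count per table cell
    rows = Counter(genotypes)                # marginal count per genotype combination
    cols = Counter(labels)                   # marginal count per class

    g = 0.0
    # Cells with zero observations contribute nothing and are simply absent.
    for (geno, lab), observed in cells.items():
        expected = rows[geno] * cols[lab] / n
        g += 2.0 * observed * math.log(observed / expected)

    # Only genotype combinations that actually occur count as rows.
    df = (len(rows) - 1) * (len(cols) - 1)
    return g, df


if __name__ == "__main__":
    # Toy data: the genotype combination fully determines the label.
    combos = [(0, 0), (0, 0), (1, 1), (1, 1)]
    status = [0, 0, 1, 1]
    print(g_test(combos, status))
```

Counting only observed genotype combinations as rows is a design choice of this sketch; a fixed 3^k-row table is an equally plausible convention and would change the degrees of freedom.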
7
Crawford L, Zeng P, Mukherjee S, Zhou X. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet 2017; 13:e1006869. [PMID: 28746338 PMCID: PMC5550000 DOI: 10.1371/journal.pgen.1006869] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 08/01/2016] [Revised: 08/09/2017] [Accepted: 06/15/2017] [Indexed: 12/13/2022] Open
Abstract
Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects, that is, the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the "MArginal ePIstasis Test", or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium.
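Illustrative note: the abstract describes a marginal epistatic effect as the combined pairwise interaction between one variant and all others, tested within a variance component model. One way this could be written down is sketched below. The notation is assumed for this note and does not come from the abstract: $\mathbf{y}$ is the phenotype vector for $n$ individuals, $\mathbf{x}_k$ the genotype vector of the tested variant $k$ among $p$ variants, $\mathbf{X}_{-k}$ the $n \times (p-1)$ genotype matrix without variant $k$, and $\circ$ the element-wise product.

```latex
\begin{align}
\mathbf{y} &= \mu + \mathbf{x}_k \beta_k
   + \underbrace{\sum_{l \neq k} \mathbf{x}_l \beta_l}_{\mathbf{m}_k}
   + \underbrace{\sum_{l \neq k} (\mathbf{x}_k \circ \mathbf{x}_l)\,\alpha_l}_{\mathbf{g}_k}
   + \boldsymbol{\varepsilon},
   \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \tau^2 \mathbf{I}), \\
\mathbf{m}_k &\sim \mathcal{N}(\mathbf{0}, \omega^2 \mathbf{K}_k),
   \qquad \mathbf{K}_k = \frac{\mathbf{X}_{-k}\mathbf{X}_{-k}^{\top}}{p-1}, \\
\mathbf{g}_k &\sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{G}_k),
   \qquad \mathbf{G}_k = \mathbf{D}_k \mathbf{K}_k \mathbf{D}_k,
   \quad \mathbf{D}_k = \operatorname{diag}(\mathbf{x}_k), \\
H_0 &: \sigma^2 = 0 .
\end{align}
```

Under this reading, variant $k$ is tested once through the single variance component $\sigma^2$ rather than once per partner $l$, which is why the exact interaction partners need not be identified.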
Affiliation(s)
- Lorin Crawford
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
- Center for Statistical Sciences, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Ping Zeng
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Mathematics, Duke University, Durham, North Carolina, United States of America
- Department of Bioinformatics & Biostatistics, Duke University, Durham, North Carolina, United States of America
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
8
Guo X, Zhang J, Cai Z, Du DZ, Pan Y. Searching Genome-Wide Multi-Locus Associations for Multiple Diseases Based on Bayesian Inference. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:600-610. [PMID: 26887006 DOI: 10.1109/tcbb.2016.2527648] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unraveling complex relationships between genotypes and phenotypes. Current multi-locus-based methods are insufficient to detect interactions with diverse genetic effects on multifarious diseases. Also, statistical tests for high-order epistasis (≥ 2 SNPs) raise huge computational and analytical challenges because the computation grows exponentially with the cardinality of SNP combinations. In this paper, we provide a simple, fast and powerful method, named DAM, using Bayesian inference to detect genome-wide multi-locus epistatic interactions in multiple diseases. Experimental results on simulated data demonstrate that our method is powerful and efficient. We also apply DAM to two GWAS datasets from WTCCC, i.e., Rheumatoid Arthritis and Type 1 Diabetes, and report some novel findings. Therefore, we believe that our method is suitable and efficient for the full-scale analysis of multi-disease-related interactions in GWASs.
9
FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm. PLoS One 2016; 11:e0150669. [PMID: 27014873 PMCID: PMC4807955 DOI: 10.1371/journal.pone.0150669] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/01/2015] [Accepted: 02/16/2016] [Indexed: 12/24/2022] Open
Abstract
Motivation: The two-locus model is a typical and significant disease model to be identified in genome-wide association studies (GWAS). Because of the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Method: In this study, two scoring functions (the Bayesian network based K2-score and the Gini-score) are used to characterize two SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, and a local search algorithm with a two-dimensional tabu table is presented to avoid repeatedly evaluating disease models that have a strong marginal effect. Finally, the G-test statistic is used to further test the candidate models. Results: We investigate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical recently developed methods (MACOED and CSE) that are based on swarm intelligence search algorithms. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method has identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset.
10
Colak R, Kim T, Kazan H, Oh Y, Cruz M, Valladares-Salgado A, Peralta J, Escobedo J, Parra EJ, Kim PM, Goldenberg A. JBASE: Joint Bayesian Analysis of Subphenotypes and Epistasis. Bioinformatics 2016; 32:203-10. [PMID: 26411870 PMCID: PMC4708100 DOI: 10.1093/bioinformatics/btv504] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/22/2014] [Revised: 08/02/2015] [Accepted: 08/24/2015] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype-phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of the theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major reasons for missing heritability: interactions between genetic variants, a phenomenon known as epistasis, and phenotypic heterogeneity, addressed via subphenotyping. RESULTS: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher power and lower type 1 error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, including two loci each, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. AVAILABILITY AND IMPLEMENTATION: JBASE is implemented in C++, supported on Linux and is available at http://www.cs.toronto.edu/~goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. CONTACT: anna.goldenberg@utoronto.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Recep Colak
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada
- TaeHyung Kim
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Department of Computer Engineering, Antalya International University, 07190, Antalya, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya International University, 07190, Antalya, Turkey
- Yoomi Oh
- Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada, Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Jesus Peralta
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Jorge Escobedo
- Unidad de Investigación en Epidemiología Clínica, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Esteban J Parra
- Department of Anthropology, University of Toronto, L5L 1C6, Mississauga, ON, Canada
- Philip M Kim
- Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada, Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada, Genetics and Genome Biology, Hospital for Sick Children, M5G 0A4, Toronto, ON, Canada and Banting and Best Department of Medical Research, University of Toronto, M5G 1L6, Toronto, ON, Canada
- Anna Goldenberg
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Genetics and Genome Biology, Hospital for Sick Children, M5G 0A4, Toronto, ON, Canada and
11
Wang J, Joshi T, Valliyodan B, Shi H, Liang Y, Nguyen HT, Zhang J, Xu D. A Bayesian model for detection of high-order interactions among genetic variants in genome-wide association studies. BMC Genomics 2015; 16:1011. [PMID: 26607428 PMCID: PMC4660815 DOI: 10.1186/s12864-015-2217-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/31/2015] [Accepted: 11/16/2015] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND: A central question for disease studies and crop improvements is how genetic variants drive phenotypes. Genome Wide Association Study (GWAS) provides a powerful tool for characterizing the genotype-phenotype relationships in complex traits and diseases. Epistasis (gene-gene interaction), including high-order interaction among more than two genes, often plays important roles in complex traits and diseases, but current GWAS analysis usually just focuses on additive effects of single nucleotide polymorphisms (SNPs). The lack of effective computational modelling of high-order functional interactions often leads to significant under-utilization of GWAS data. RESULTS: We have developed a novel Bayesian computational method with a Markov Chain Monte Carlo (MCMC) search, and implemented the method as a Bayesian High-order Interaction Toolkit (BHIT) for detecting epistatic interactions among SNPs. BHIT first builds a Bayesian model on both continuous data and discrete data, which is capable of detecting high-order interactions in SNPs related to case-control or quantitative phenotypes. We also developed a pipeline that enables users to apply BHIT on different species in different use cases. CONCLUSIONS: Using both simulation data and soybean nutritional seed composition studies on oil content and protein content, BHIT effectively detected some high-order interactions associated with phenotypes, and it outperformed a number of other available tools. BHIT is freely available for academic users at http://digbio.missouri.edu/BHIT/.
Affiliation(s)
- Juexin Wang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Trupti Joshi
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Babu Valliyodan
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA
- Haiying Shi
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA
- Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), University of Missouri, Columbia, MO, USA
- Jing Zhang
- Department of Statistics, Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, USA
- Dong Xu
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
12
Ray D, Li X, Pan W, Pankow JS, Basu S. A Bayesian Partitioning Model for the Detection of Multilocus Effects in Case-Control Studies. Hum Hered 2015; 79:69-79. [PMID: 26044550 DOI: 10.1159/000369858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/28/2014] [Accepted: 11/12/2014] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND: Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the disease heritability. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interaction among these variants. Multilocus association analysis provides a powerful alternative by jointly modeling the variants within a gene or a pathway and by reducing the burden of multiple hypothesis testing in a GWAS. METHODS: Here, we propose a powerful and flexible dimension reduction approach to model multilocus association. We use a Bayesian partitioning model which clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme and uses posterior marginal probabilities to detect association between the SNP set and the disease. RESULTS: We illustrate our method using extensive simulation studies and by applying it to detect multilocus interaction in the Atherosclerosis Risk in Communities (ARIC) GWAS with type 2 diabetes. CONCLUSION: We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset with 9,328 individuals to study gene-based associations for type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
Affiliation(s)
- Debashree Ray
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minn., USA
|
13
|
Kozyryev I, Zhang J. Bayesian analysis of complex interacting mutations in HIV drug resistance and cross-resistance. Adv Exp Med Biol 2015; 827:367-83. [PMID: 25387976 DOI: 10.1007/978-94-017-9245-5_22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
Abstract
A successful treatment of AIDS world-wide is severely hindered by the HIV virus' drug resistance capability resulting from complicated mutation patterns of viral proteins. Such a system of mutations enables the virus to survive and reproduce despite the presence of various antiretroviral drugs by disrupting their binding capability. Although these interacting mutation patterns are extremely difficult to efficiently uncover and interpret, they contribute valuable information to personalized therapeutic regimen design. The use of Bayesian statistical modeling provides an unprecedented opportunity in the field of anti-HIV therapy to understand detailed interaction structures of drug resistant mutations. Multiple Bayesian models equipped with Markov Chain Monte Carlo (MCMC) methods have been recently proposed in this field (Zhang et al. in PNAS 107:1321, 2010 [1]; Zhang et al. in J Proteome Sci Comput Biol 1:2, 2012 [2]; Svicher et al. in Antiviral Res 93(1):86-93, 2012 [3]; Svicher et al. in Antiviral Therapy 16(7):1035-1045, 2011 [4]; Svicher et al. in Antiviral Ther 16(4):A14-A14, 2011 [5]; Svicher et al. in Antiviral Ther 16(4):A85-A85, 2011 [6]; Alteri et al. in Signature mutations in V3 and bridging sheet domain of HIV-1 gp120 HIV-1 are specifically associated with dual tropism and modulate the interaction with CCR5 N-Terminus, 2011 [7]). Probabilistically modeling mutations in the HIV-1 protease or reverse transcriptase (RT) isolated from drug-treated patients provides a powerful statistical procedure that first detects mutation combinations associated with single or multiple-drug resistance, and then infers detailed dependence structures among the interacting mutations in viral proteins (Zhang et al. in PNAS 107:1321, 2010 [1]; Zhang et al. in J Proteome Sci Comput Biol 1:2, 2012 [2]). 
Combined with molecular dynamics simulations and free energy calculations, Bayesian analysis predictions help to uncover genetic and structural mechanisms in the HIV treatment resistance. Results obtained with such stochastic methods pave the way not only for optimization of the use for existing HIV drugs, but also for the development of the new more efficient antiretroviral medicines. In this chapter we survey current challenges in the bioinformatics of anti-HIV therapy, and outline how recently emerged Bayesian methods can help with the clinical management of HIV-1 infection. We will provide a rigorous review of the Bayesian variable partition model and the recursive model selection procedure based on probability theory and mathematical data analysis techniques while highlighting real applications in HIV and HBV studies including HIV drug resistance (Zhang et al. in PNAS 107:1321, 2010 [1]), cross-resistance (Zhang et al. in J Proteome Sci Comput Biol 1:2, 2012 [2]), HIV coreceptor usage (Svicher et al. in Antiviral Therapy 16(7):1035-1045, 2011 [4]; Svicher et al. in Antiviral Ther 16(4):A14-A14, 2011 [5]; Alteri et al. in Signature mutations in V3 and bridging sheet domain of HIV-1 gp120 HIV-1 are specifically associated with dual tropism and modulate the interaction with CCR5 N-Terminus, 2011 [7]), and occult HBV infection (Svicher et al. in Antiviral Res 93(1):86-93, 2012 [3]; Svicher et al. in Antiviral Ther 16(4):A85-A85, 2011 [6]).
Affiliation(s)
- Ivan Kozyryev
- Department of Physics, Harvard University, Cambridge, MA, USA
|
14
|
Serão NVL, Matika O, Kemp RA, Harding JCS, Bishop SC, Plastow GS, Dekkers JCM. Genetic analysis of reproductive traits and antibody response in a PRRS outbreak herd. J Anim Sci 2014; 92:2905-21. [PMID: 24879764 DOI: 10.2527/jas.2014-7821] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 01/24/2023]
Abstract
Porcine reproductive and respiratory syndrome (PRRS) is the most economically significant disease impacting pig production in North America, Europe, and Asia, causing reproductive losses such as increased rates of stillbirth and mummified piglets. The objective of this study was to explore the genetic basis of host response to the PRRS virus (PRRSV) in a commercial multiplier sow herd before and after a PRRS outbreak, using antibody response and reproductive traits. Reproductive data comprising number born alive (NBA), number alive at 24 h (NA24), number stillborn (NSB), number born mummified (NBM), proportion born dead (PBD), number born dead (NBD), number weaned (NW), and number of mortalities through weaning (MW) of 5,227 litters from 1,967 purebred Landrace sows were used along with a pedigree comprising 2,995 pigs. The PRRS outbreak date was estimated from rolling averages of farrowing traits and was used to split the data into a pre-PRRS phase and a PRRS phase. All 641 sows in the herd during the outbreak were blood sampled 46 d after the estimated outbreak date and were tested for anti-PRRSV IgG using ELISA (sample-to-positive [S/P] ratio). Genetic parameters of traits were estimated separately for the pre-PRRS and PRRS phase data sets. Sows were genotyped using the PorcineSNP60 BeadChip, and genome-wide association studies (GWAS) were performed using method Bayes B. Heritability estimates for reproductive traits ranged from 0.01 (NBM) to 0.12 (NSB) and from 0.01 (MW) to 0.12 (NBD) for the pre-PRRS and PRRS phases, respectively. S/P ratio had high heritability (0.45) and strong genetic correlations with most traits, ranging from -0.72 (NBM) to 0.73 (NBA). In the pre-PRRS phase, regions associated with NSB and PBD explained 1.6% and 3% of the genetic variance, respectively. In the PRRS phase, regions associated with NBD, NSB, and S/P ratio explained 0.8%, 11%, and 50.6% of the genetic variance, respectively. 
For S/P ratio, 2 regions on SSC 7 (SSC7) separated by 100 Mb explained 40% of the genetic variation, including a region encompassing the major histocompatibility complex, which explained 25% of the genetic variance. These results indicate a significant genomic component associated with PRRSV antibody response and NSB in this data set. Also, the high heritability and genetic correlation estimates for S/P ratio during the PRRS phase suggest that S/P ratio could be used as an indicator of the impact of PRRS on reproductive traits.
Affiliation(s)
- N V L Serão
- Department of Animal Science, Iowa State University, Ames 50011
- O Matika
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- R A Kemp
- Genesus, Oakville, MB R0H 0Y0, Canada
- J C S Harding
- Department of Large Animal Clinical Sciences, University of Saskatchewan, Saskatoon, SK S7N 5A1, Canada
- S C Bishop
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- G S Plastow
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- J C M Dekkers
- Department of Animal Science, Iowa State University, Ames 50011
|
15
|
Jahromi MM. Haplotype specific alteration of diabetes MHC risk by olfactory receptor gene polymorphism. Autoimmun Rev 2012; 12:270-4. [DOI: 10.1016/j.autrev.2012.05.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/05/2012] [Accepted: 04/23/2012] [Indexed: 12/12/2022]
|
16
|
Luss R, Rosset S, Shahar M. Efficient regularized isotonic regression with application to gene–gene interaction search. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas504] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]
|
17
|
Zhang Y. A novel Bayesian graphical model for genome-wide multi-SNP association mapping. Genet Epidemiol 2011; 36:36-47. [PMID: 22127647 DOI: 10.1002/gepi.20661] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 07/28/2011] [Revised: 09/20/2011] [Accepted: 10/05/2011] [Indexed: 11/10/2022]
Abstract
Most disease association mapping algorithms are based on hypothesis testing procedures that test one variant at a time. Those methods lose power when the disease mutations are jointly tagged by multiple variants, or when gene-gene interactions exist. Nearby variants are also correlated, for which procedures ignoring the dependence between variants will inevitably produce redundant results. With a large number of variants genotyped in current genome-wide disease association studies, simultaneous multivariant association mapping algorithms are strongly desired. We present a novel Bayesian method for automatic detection of multivariant joint association in genome-wide case-control studies. Our method has improved power and specificity over existing tools. We fit a joint probabilistic model to the entire data and identify disease variants simultaneously. The method dynamically accounts for the strong linkage disequilibrium (LD) between variants. As a result, only the primary disease variants will be identified, with all secondary associations due to LD effects filtered out. Our method better pinpoints the disease variants with improved resolution. The method is also computationally efficient for genome-wide studies. When applied to a real data set of inflammatory bowel disease (IBD) containing 401,473 variants in 4,720 individuals, our method detected all previously reported IBD loci in the same data, and recovered two missed loci. We further detected two novel interchromosome interactions. The first is between STAT3 and PARD6G, and the second is between DLG5 and an intergenic region at 5p14. We further validated the two interactions in an independent study.
Affiliation(s)
- Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
|
18
|
Pan W, Basu S, Shen X. Adaptive tests for detecting gene-gene and gene-environment interactions. Hum Hered 2011; 72:98-109. [PMID: 21934325 DOI: 10.1159/000330632] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/10/2011] [Accepted: 07/02/2011] [Indexed: 12/14/2022]
Abstract
There has been an increasing interest in detecting gene-gene and gene-environment interactions in genetic association studies. A major statistical challenge is how to deal with a large number of parameters measuring possible interaction effects, which leads to reduced power of any statistical test due to a large number of degrees of freedom or high cost of adjustment for multiple testing. Hence, a popular idea is to first apply some dimension reduction techniques before testing, while another is to apply only statistical tests that are developed for and robust to high-dimensional data. To combine both ideas, we propose applying an adaptive sum of squared score (SSU) test and several other adaptive tests. These adaptive tests are extensions of the adaptive Neyman test [Fan, 1996], which was originally proposed for high-dimensional data, providing a simple and effective way for dimension reduction. On the other hand, the original SSU test coincides with a version of a test specifically developed for high-dimensional data. We apply these adaptive tests and their original nonadaptive versions to simulated data to detect interactions between two groups of SNPs (e.g. multiple SNPs in two candidate regions). We found that for sparse models (i.e. with only few non-zero interaction parameters), the adaptive SSU test and its close variant, an adaptive version of the weighted sum of squared score (SSUw) test, improved the power over their non-adaptive versions, and performed consistently well across various scenarios. The proposed adaptive tests are built in the general framework of regression analysis, and can thus be applied to various types of traits in the presence of covariates.
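As an illustration of the score-based idea this abstract builds on, the sketch below computes a plain (non-adaptive) sum of squared score statistic for the pairwise interaction terms between two SNP groups, with a permutation p-value. It is a simplified sketch and not the authors' adaptive procedure: it assumes a quantitative trait, removes main effects by ordinary least squares, and permutes residuals. The function name `ssu_interaction_test` and its parameters are hypothetical.

```python
import numpy as np

def ssu_interaction_test(y, G1, G2, n_perm=999, seed=0):
    """Non-adaptive SSU-style test for interactions between two SNP groups.

    Assumptions (not from the source): quantitative trait, linear model,
    permutation of main-effect residuals to approximate the null.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # All pairwise products between SNPs of the two groups: shape (n, p1 * p2).
    Z = (G1[:, :, None] * G2[:, None, :]).reshape(n, -1)
    # Null model: intercept plus main effects of every SNP.
    X = np.column_stack([np.ones(n), G1, G2])
    # Residualize the trait and the interaction terms on the null model.
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    Zr = Z - X @ np.linalg.lstsq(X, Z, rcond=None)[0]

    def stat(res):
        U = Zr.T @ res          # score vector for the interaction parameters
        return float(U @ U)     # sum of squared scores

    observed = stat(r)
    perms = np.array([stat(rng.permutation(r)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p_value
```

The statistic sums squared scores without weighting by their covariance, so its cost grows only linearly with the number of interaction terms. An adaptive variant would additionally select or truncate the score components before summing.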
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, USA. weip@biostat.umn.edu
|
19
|
Zhang Y, Zhang J, Liu JS. Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data. Ann Appl Stat 2011; 5:2052-2077. [PMID: 22140419 PMCID: PMC3226821 DOI: 10.1214/11-aoas469] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 01/18/2023]
Abstract
Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphisms (SNPs) data collected for many thousands of SNP markers from thousands of individuals under the case-control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD) and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD-blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs, but also provide measures of partition uncertainty. When applied to case-control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. Simulation study showed that this approach is more powerful in detecting multi-locus associations than other methods we tested, including one of ours. When applied to the WTCCC type 1 diabetes data, the method identified many previously known T1D associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that are undetected by single SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling the haplotypes of the MHC class II region, we identified additional associations in both MHC class I (HLA-A, HLA-B) and class III regions (BAT1). We also observed significant interactions between genes PRSS16, ZNF184 in the extended MHC region and the MHC class II genes. 
The proposed method can be broadly applied to the classification problem with correlated discrete covariates.
Affiliation(s)
- Yu Zhang
- Department of Statistics, Pennsylvania State University, 422A Thomas, University Park, Pennsylvania 16802, USA
- Jing Zhang
- Department of Statistics, Yale University, 24 Hillhouse Ave., New Haven, Connecticut 06511, USA
- Jun S. Liu
- Department of Statistics, Harvard University, Science Center, 1 Oxford St., Cambridge, Massachusetts 02138, USA
|