1
Hajiaghabozorgi M, Fischbach M, Albrecht M, Wang W, Myers CL. BridGE: a pathway-based analysis tool for detecting genetic interactions from GWAS. Nat Protoc 2024; 19:1400-1435. [PMID: 38514837 DOI: 10.1038/s41596-024-00954-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 11/22/2023] [Indexed: 03/23/2024]
Abstract
Genetic interactions have the potential to modulate phenotypes, including human disease. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions; however, traditional methods for identifying them, which tend to focus on testing individual variant pairs, lack statistical power. In this protocol, we describe a novel computational approach, called Bridging Gene sets with Epistasis (BridGE), for discovering genetic interactions between biological pathways from GWAS data. We present a Python-based implementation of BridGE along with instructions for its application to a typical human GWAS cohort. The major stages include initial data processing and quality control, construction of a variant-level genetic interaction network, measurement of pathway-level genetic interactions, evaluation of statistical significance using sample permutations and generation of results in a standardized output format. The BridGE software pipeline includes options for running the analysis on multiple cores and multiple nodes for users who have access to computing clusters or a cloud computing environment. In a cluster computing environment with 10 nodes and 100 GB of memory per node, the method can be run in less than 24 h for typical human GWAS cohorts. Using BridGE requires knowledge of running Python programs and basic shell script programming experience.
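The abstract above mentions evaluating statistical significance with sample permutations. As a rough illustration of that general idea only (this is not BridGE's code; the function names, the toy data and the scoring function are all hypothetical), a label-permutation p-value can be sketched in plain Python:

```python
import random

def permutation_pvalue(labels, score_fn, n_perm=1000, seed=0):
    """Empirical p-value of score_fn(labels) under random label shuffles."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 1 = carries a given variant pair, 0 = does not.
carrier = [1] * 8 + [0] * 12
# Case (1) / control (0) status; carriers are mostly cases in this toy set.
status = [1] * 7 + [0] * 1 + [1] * 3 + [0] * 9

def case_carrier_excess(labels):
    """Carrier rate among cases minus carrier rate among controls."""
    cases = [c for c, y in zip(carrier, labels) if y == 1]
    controls = [c for c, y in zip(carrier, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

p_value = permutation_pvalue(status, case_carrier_excess)
```

Shuffling the case/control labels keeps the genotype structure intact while breaking any link to the phenotype, which is why the shuffled scores can serve as a null distribution.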
Affiliation(s)
- Mehrad Hajiaghabozorgi
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Mathew Fischbach
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
- Michael Albrecht
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Wen Wang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Chad L Myers
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
2
Gómez-Sánchez G, Alonso L, Pérez MÁ, Morán I, Torrents D, Berral JL. Exhaustive Variant Interaction Analysis using Multifactor Dimensionality Reduction. Research Square 2023:rs.3.rs-3401025. [PMID: 37886566 PMCID: PMC10602162 DOI: 10.21203/rs.3.rs-3401025/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
One of the main goals of human genetics is to understand the connections between genomic variation and the predisposition to develop a complex disorder. These disease-variant associations are usually studied one variant at a time, disregarding possible effects of interactions between genomic variants. In complex diseases in particular, these interactions can be directly linked to the disorder and may play an important role in disease development. Although studying them has been suggested as a way to complete our understanding of the genetic bases of complex diseases, it remains a major challenge because of the large computing demands. Here, we have taken advantage of High-Performance Computing technologies to tackle this problem using a combination of machine learning methods and statistical approaches. As a result, we have created a containerized framework that uses Multifactor Dimensionality Reduction to detect pairs of variants associated with Type 2 Diabetes (T2D). This methodology has been tested in the Northwestern University NUgene project cohort using a dataset of 1,883,192 variant pairs with a certain degree of association with T2D. Out of the pairs studied, we have identified 104 significant pairs, two of which exhibit a potential functional relationship with T2D.
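To make the core step of Multifactor Dimensionality Reduction concrete, here is a minimal sketch for a single variant pair. It is not the containerized framework described above: the function name and toy data are hypothetical, genotypes are assumed to be coded 0/1/2, both cases and controls are assumed to be present, and the cross-validation that MDR normally uses is left out for brevity.

```python
from collections import Counter

def mdr_pair(geno_a, geno_b, status):
    """Collapse the genotype grid of one SNP pair into high/low risk cells.

    geno_a, geno_b: genotypes coded 0/1/2; status: 1 = case, 0 = control.
    Returns (high_risk_cells, balanced_accuracy) measured on the same data.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    # A cell is high risk when its case:control ratio exceeds the overall
    # ratio (cross-multiplied so that empty cells never cause a division).
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls > controls[cell] * n_cases
    }
    true_pos = sum(n for cell, n in cases.items() if cell in high_risk)
    true_neg = sum(n for cell, n in controls.items() if cell not in high_risk)
    balanced_accuracy = 0.5 * (true_pos / n_cases + true_neg / n_controls)
    return high_risk, balanced_accuracy

# Hypothetical XOR-like toy data: risk only when exactly one locus is variant.
geno_a = [0, 0, 1, 1, 0, 0, 1, 1]
geno_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]
high_risk, accuracy = mdr_pair(geno_a, geno_b, status)
```

In this toy set neither variant is informative on its own, yet the pair separates cases from controls, which is the kind of signal single-variant tests miss.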
Affiliation(s)
- Gonzalo Gómez-Sánchez
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Universitat Politècnica de Catalunya - BarcelonaTECH, Barcelona, Spain
- Lorena Alonso
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Ignasi Morán
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- David Torrents
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Institut Català de Recerca i Estudis Avançats, Barcelona, Spain
- Josep Ll. Berral
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Universitat Politècnica de Catalunya - BarcelonaTECH, Barcelona, Spain
3
Chen D, Li J, Liu H, Liu X, Zhang C, Luo H, Wei Y, Xi Y, Liang H, Zhang Q. Genome-Wide Epistasis Study of Cerebrospinal Fluid Hyperphosphorylated Tau in ADNI Cohort. Genes (Basel) 2023; 14:1322. [PMID: 37510227 PMCID: PMC10379656 DOI: 10.3390/genes14071322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/19/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023]
Abstract
Alzheimer's disease (AD) is the main cause of dementia worldwide, and its genetic mechanism is not yet fully understood. Evidence accumulated over the past decade suggests that, even after the first large-scale genome-wide association studies (GWAS), the "missing heritability" of AD remains a great challenge. Epistasis, which has been largely ignored in human genetics, is considered one of the main causes of this "missing heritability". Current genome-wide epistasis studies usually focus on single nucleotide polymorphisms (SNPs) that have significant individual effects, which explain very little heritability. Moreover, AD is characterized by progressive cognitive decline and neuronal damage, and some studies have suggested that hyperphosphorylated tau (P-tau) mediates neuronal death by inducing necroptosis and inflammation in AD. Therefore, this study focused on identifying two-marker epistasis among SNPs with marginal main effects across the whole genome, using cerebrospinal fluid (CSF) P-tau as the quantitative trait (QT). We sought to detect interactions between SNPs with a multi-GPU-based linear regression method, using age, gender, and clinical diagnostic status (cds) as covariates. We then used the STRING online tool to build the protein-protein interaction (PPI) network and identify two-marker epistasis at the level of gene-gene interaction. A total of 758 SNP pairs were found to be statistically significant. In particular, highly significant SNP-SNP interactions were identified between SNP pairs with marginal main effects, and these explained a relatively high proportion of the variance in P-tau level. In addition, 331 AD-related genes were identified, and 10 gene-gene interaction pairs were replicated in the PPI network. The identified gene-gene interactions and genes showed associations with AD in terms of neuroinflammation and neurodegeneration, neuronal cell activation and brain development, thereby leading to cognitive decline in AD, which is indirectly associated with the P-tau pathological feature of AD and in turn supports the results of this study. Thus, the results of our study might help explain part of the "missing heritability" of AD.
Affiliation(s)
- Dandan Chen
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
  - School of Automation Engineering, Northeast Electric Power University, Jilin 132012, China
- Jin Li
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hongwei Liu
  - School of Computer Science, Northeast Electric Power University, Jilin 132012, China
- Xiaolong Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Chenghao Zhang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Haoran Luo
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Yiming Wei
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Yang Xi
  - School of Computer Science, Northeast Electric Power University, Jilin 132012, China
- Hong Liang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Qiushi Zhang
  - School of Computer Science, Northeast Electric Power University, Jilin 132012, China
4
Evaluating the detection ability of a range of epistasis detection methods on simulated data for pure and impure epistatic models. PLoS One 2022; 17:e0263390. [PMID: 35180244 PMCID: PMC8856572 DOI: 10.1371/journal.pone.0263390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 01/18/2022] [Indexed: 11/19/2022]
Abstract
Background: Numerous approaches have been proposed for the detection of epistatic interactions within GWAS datasets in order to better understand the drivers of disease and genetics. Methods: A selection of state-of-the-art approaches was assessed. These included the statistical tests fast-epistasis, BOOST, logistic regression and wtest; the swarm intelligence methods AntEpiSeeker, epiACO and CINOEDV; and the data mining approaches MDR, GSS, SNPRuler and MPI3SNP. Data were simulated to provide randomly generated models with no individual main effects at different heritabilities (pure epistasis) as well as models based on penetrance tables with some main effects (impure epistasis). Detection of both two- and three-locus interactions was assessed across a total of 1,560 simulated datasets. The different methods were also applied to a section of the UK Biobank cohort for Atrial Fibrillation. Results: For pure two-locus interactions, PLINK’s implementation of BOOST recovered the highest number of correct interactions (53.9%), performing significantly better than the other methods (p = 4.52e-36). For impure two-locus interactions, MDR exhibited the best performance, recovering 62.2% of the most significant impure epistatic interactions (p = 6.31e-90 for all but one test). The assessment of three-locus interaction prediction revealed that wtest recovered the highest number (17.2%) of pure epistatic interactions (p = 8.49e-14). wtest also recovered the highest number of three-locus impure epistatic interactions (p = 6.76e-48), while AntEpiSeeker ranked the highest number of such interactions (40.5%) as the most significant. Finally, when the methods were applied to a real dataset for Atrial Fibrillation, the most notable finding was an interaction between SYNE2 and DTNB.
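The distinction between models with no individual main effects and models with some main effects can be checked directly on a penetrance table. The sketch below is illustrative only and is not taken from the study: the two tables, the allele frequency of 0.5 and the function names are assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def marginal_penetrances(table, freqs_other):
    """Average each row of a penetrance table over the other locus."""
    return [sum(p * f for p, f in zip(row, freqs_other)) for row in table]

def _is_flat(values, tol=1e-12):
    """True when all values are equal up to a small tolerance."""
    return all(math.isclose(v, values[0], abs_tol=tol) for v in values)

def has_no_main_effects(table, freqs_a, freqs_b):
    """True when neither locus changes disease risk on its own."""
    by_a = marginal_penetrances(table, freqs_b)
    transposed = [list(col) for col in zip(*table)]
    by_b = marginal_penetrances(transposed, freqs_a)
    return _is_flat(by_a) and _is_flat(by_b)

# Hardy-Weinberg genotype frequencies for an assumed allele frequency of 0.5.
hwe = [0.25, 0.5, 0.25]

# Rows: genotypes at locus A; columns: genotypes at locus B (toy values).
pure_table = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
impure_table = [
    [0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1],
    [0.0, 0.1, 0.1],
]

pure_ok = has_no_main_effects(pure_table, hwe, hwe)
impure_ok = has_no_main_effects(impure_table, hwe, hwe)
```

In the first table every single-locus genotype has the same marginal risk (0.05), so only a joint test can detect the effect; in the second, carrying a variant at either locus raises risk on its own.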
5
Amorim ST, Stafuzza NB, Kluska S, Peripolli E, Pereira ASC, Muller da Silveira LF, de Albuquerque LG, Baldi F. Genome-wide interaction study reveals epistatic interactions for beef lipid-related traits in Nellore cattle. Anim Genet 2021; 53:35-48. [PMID: 34407235 DOI: 10.1111/age.13124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/02/2021] [Indexed: 11/27/2022]
Abstract
Gene-gene interactions cause hidden genetic variation in natural populations and could be responsible for the lack of replication that is typically observed in complex traits studies. This study aimed to identify gene-gene interactions using the empirical Hilbert-Schmidt Independence Criterion method to test for epistasis in beef fatty acid profile traits of Nellore cattle. The dataset contained records from 963 bulls, genotyped using a 777,962-SNP chip. Meat samples of Longissimus muscle were taken to measure fatty acid composition, which was quantified by gas chromatography. We chose to work with the sums of saturated (SFA), monounsaturated (MUFA), polyunsaturated (PUFA), omega-3 (OM3) and omega-6 (OM6) fatty acids, and with the SFA:PUFA and OM3:OM6 ratios. The SNPs in the interactions where P < 10^-8 were mapped individually and used to search for candidate genes. Totals of 602, 3, 13, 23, 13, 215 and 169 candidate genes were identified for SFAs, MUFAs, PUFAs, OM3s, OM6s, and the SFA:PUFA and OM3:OM6 ratios, respectively. The candidate genes found were associated with cholesterol, lipid regulation, low-density lipoprotein receptors, feed efficiency and inflammatory response. Enrichment analysis revealed 57 significant GO and 18 KEGG terms (P < 0.05), most of them related to meat quality and complementary terms. Our results showed substantial genetic interactions associated with lipid profile, meat quality, carcass and feed efficiency traits for the first time in Nellore cattle. The knowledge of these SNP-SNP interactions could improve understanding of the genetic and physiological mechanisms that contribute to lipid-related traits and improve human health by the selection of healthier meat products.
Affiliation(s)
- S T Amorim
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- N B Stafuzza
  - Instituto de Zootecnia - Centro de Pesquisa em Bovinos de Corte, Rodovia Carlos Tonanni, Km94, Sertãozinho, 14174-000, Brazil
- S Kluska
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- E Peripolli
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- A S C Pereira
  - Faculdade de Zootecnia e Engenharia de Alimentos, Núcleo de Apoio à Pesquisa em Melhoramento Animal, Biotecnologia e Transgenia, Universidade de São Paulo, Rua Duque de Caxias Norte, 225, Pirassununga, CEP 13635-900, Brazil
- L F Muller da Silveira
  - Faculdade de Zootecnia e Engenharia de Alimentos, Núcleo de Apoio à Pesquisa em Melhoramento Animal, Biotecnologia e Transgenia, Universidade de São Paulo, Rua Duque de Caxias Norte, 225, Pirassununga, CEP 13635-900, Brazil
- L G de Albuquerque
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- F Baldi
  - Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
6
Johnsen PV, Riemer-Sørensen S, DeWan AT, Cahill ME, Langaas M. A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values. BMC Bioinformatics 2021; 22:230. [PMID: 33947323 PMCID: PMC8097909 DOI: 10.1186/s12859-021-04041-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/03/2020] [Accepted: 02/22/2021] [Indexed: 01/08/2023]
Abstract
Background: The identification of gene–gene and gene–environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of possible combinations. Parametric regression models are suitable to look for prespecified interactions. Nonparametric models such as tree ensemble models, with the ability to detect any unspecified interaction, have previously been difficult to interpret. However, with the development of methods for model explainability, it is now possible to interpret tree ensemble models efficiently and with a strong theoretical basis. Results: We propose a tree ensemble- and SHAP-based method for identifying as well as interpreting potential gene–gene and gene–environment interactions on large-scale biobank data. A set of independent cross-validation runs are used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype. The results are in line with previous research on obesity as we identify top SNPs previously associated with obesity. We further demonstrate how to interpret and visualize interaction candidates. Conclusions: The new method identifies interaction candidates otherwise not detected with parametric regression models. However, further research is needed to evaluate the uncertainties of these candidates. The method can be applied to large-scale biobanks with high-dimensional data. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04041-7.
Affiliation(s)
- Pål V Johnsen
  - SINTEF DIGITAL, Forskningsveien 1, 0373, Oslo, Norway
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, A. Getz vei 1, 7491, Trondheim, Norway
- Andrew Thomas DeWan
  - Department of Chronic Disease Epidemiology and Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, 1 Church Street, New Haven, CT, 06510, USA
  - Gemini Center for Sepsis Research, Department of Circulation and Medical Imaging, NTNU, Norwegian University of Science and Technology, Prinsesse Kristinas gate 3, 7030, Trondheim, Norway
- Megan E Cahill
  - Department of Chronic Disease Epidemiology and Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, 1 Church Street, New Haven, CT, 06510, USA
- Mette Langaas
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, A. Getz vei 1, 7491, Trondheim, Norway
7
Akbarzadeh M, Dehkordi SR, Roudbar MA, Sargolzaei M, Guity K, Sedaghati-Khayat B, Riahi P, Azizi F, Daneshpour MS. GWAS findings improved genomic prediction accuracy of lipid profile traits: Tehran Cardiometabolic Genetic Study. Sci Rep 2021; 11:5780. [PMID: 33707626 PMCID: PMC7952573 DOI: 10.1038/s41598-021-85203-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/05/2020] [Accepted: 02/26/2021] [Indexed: 12/15/2022]
Abstract
In recent decades, ongoing GWAS findings have opened up novel therapeutic approaches, whole-genome risk prediction in particular. Here, we propose a method that integrates the traditional genomic best linear unbiased prediction (gBLUP) approach with GWAS information to boost genetic prediction accuracy and gene-based heritability estimation. This study was conducted within the framework of the Tehran Cardio-metabolic Genetic Study (TCGS), which contains 14,827 individuals and 649,932 SNP markers. Five SNP subsets were selected based on GWAS results: the top 1%, 5%, 10% and 50% most significant SNPs, and SNPs reported as associated in previous studies. Furthermore, we randomly selected SNP subsets of the same size as each of the five subsets. Prediction accuracy was investigated on lipid profile traits with the gBLUP method using 10-fold cross-validation repeated 10 times. Our results revealed that genetic prediction based on SNP subsets selected from the dataset outperformed prediction based on previously reported SNPs. The selected SNP subsets gave more precise predictions than the full SNP set and much more precise predictions than randomly selected SNPs. Also, the common SNPs that captured the most prediction accuracy in the selected sets captured the highest gene-based heritability. However, it should be kept in mind that a small number of SNPs obtained from GWAS results can capture a notable proportion of variance and prediction accuracy.
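The evaluation scheme mentioned above, tenfold cross-validation with 10 repeats, can be sketched as an index generator. This is a generic illustration rather than the study's pipeline: the function name, the seed and the sample size of 50 are assumptions, and fitting gBLUP on each split is left out.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # A fresh shuffle per repeat gives a different partition each time.
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx

# Hypothetical cohort of 50 samples: 10 repeats x 10 folds = 100 splits.
splits = list(repeated_kfold(n_samples=50))
```

Within each repeat every sample is held out exactly once, and averaging accuracy over the repeats reduces the dependence of the estimate on any one random partition.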
Affiliation(s)
- Mahdi Akbarzadeh
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Saeid Rasekhi Dehkordi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Mahmoud Amiri Roudbar
  - Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran
- Mehdi Sargolzaei
  - Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Canada
  - Select Sires Inc., Plain City, USA
- Kamran Guity
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Bahareh Sedaghati-Khayat
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Parisa Riahi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam S Daneshpour
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
8
What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis? J Pers Med 2020; 10:jpm10040247. [PMID: 33256133 PMCID: PMC7712791 DOI: 10.3390/jpm10040247] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2020] [Revised: 11/21/2020] [Accepted: 11/23/2020] [Indexed: 02/07/2023]
Abstract
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at understanding the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performance of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
9
10
Zheng X, Bai J, Ye M, Liu Y, Jin Y, He X. Bivariate genome-wide association study of the growth plasticity of Staphylococcus aureus in coculture with Escherichia coli. Appl Microbiol Biotechnol 2020; 104:5437-5447. [PMID: 32350560 DOI: 10.1007/s00253-020-10636-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/03/2020] [Revised: 04/15/2020] [Accepted: 04/20/2020] [Indexed: 12/20/2022]
Abstract
Phenotypic plasticity is the capacity to change the phenotype in response to different environments without alteration of the genotype. Despite sufficient evidence that microorganisms have a major role in the fitness and sickness of eukaryotes, there has been little research regarding microbial phenotypic plasticity. In this study, 45 strains of Staphylococcus aureus were grown for 12 days in both monoculture and in coculture with the same strain of Escherichia coli to create a competitive environment. Cell abundance was determined by quantitative PCR every 24 h, and growth curves of each S. aureus strain under the two sets of conditions were generated. Combined with whole-genome resequencing data, bivariate genome-wide association study (GWAS) was performed to analyze the growth plasticity of S. aureus in coculture. Finally, 20 significant single-nucleotide polymorphisms (eight annotated, seven unannotated, and five non-coding regions) were obtained, which may affect the competitive growth of S. aureus. This study advances genome-wide bacterial growth plasticity research and demonstrates the potential of bivariate GWAS for bacterial phenotypic plasticity research. KEY POINTS: • Growth plasticity of S. aureus was analyzed by bivariate GWAS. • Twenty significant SNPs may affect the growth plasticity of S. aureus.
Affiliation(s)
- Xuyang Zheng
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
- Jun Bai
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
- Meixia Ye
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
  - Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Yanxi Liu
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
- Yi Jin
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
- Xiaoqing He
  - College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, 100083, China
11
Wang H, Yue T, Yang J, Wu W, Xing EP. Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies. BMC Bioinformatics 2019; 20:656. [PMID: 31881907 PMCID: PMC6933893 DOI: 10.1186/s12859-019-3300-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/16/2019] [Accepted: 12/02/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Genome-wide Association Studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many follow-up methods have been developed to detect interactions between SNPs, epistasis is yet to be modeled and discovered more thoroughly. RESULTS: In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies, as an extension of the mixed models used to correct for confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. CONCLUSIONS: After validating the performance of our method using simulation experiments, we further apply it to Alzheimer's disease data sets. Our results help gain some exploratory understanding of the genetic architecture of Alzheimer's disease.
Affiliation(s)
- Haohan Wang
  - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Tianwei Yue
  - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Jingkang Yang
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX USA
- Wei Wu
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Eric P. Xing
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
12
Ansarifar J, Wang L. New algorithms for detecting multi-effect and multi-way epistatic interactions. Bioinformatics 2019; 35:5078-5085. [DOI: 10.1093/bioinformatics/btz463] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/02/2019] [Revised: 04/14/2019] [Accepted: 05/31/2019] [Indexed: 11/14/2022]
Abstract
Motivation: Epistasis, which is the phenomenon of genetic interactions, plays a central role in many scientific discoveries. However, due to the combinatorial nature of the problem, it is extremely challenging to decipher the exact combinations of genes that trigger the epistatic effects. Many existing methods only focus on two-way interactions. Some of the most effective methods use machine learning techniques, but many were designed for special case-and-control studies or suffer from overfitting. We propose three new algorithms for multi-effect and multi-way epistasis detection, with one guaranteeing global optimality and the other two being local-optimization-oriented heuristics. Results: The computational performance of the proposed heuristic algorithm was compared with several state-of-the-art methods using a yeast dataset. Results suggested that searching for the globally optimal solution could be extremely time consuming, but the proposed heuristic algorithm was much more effective and efficient than others at finding a close-to-optimal solution. Moreover, it was able to provide biological insight on the exact configurations of epistases, besides achieving a higher prediction accuracy than the state-of-the-art methods. Availability and implementation: The data source is publicly available and details are provided in the text.
13
Rong M, Zheng X, Ye M, Bai J, Xie X, Jin Y, He X. Phenotypic Plasticity of Staphylococcus aureus in Liquid Medium Containing Vancomycin. Front Microbiol 2019; 10:809. [PMID: 31057516 PMCID: PMC6477096 DOI: 10.3389/fmicb.2019.00809] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/08/2018] [Accepted: 03/29/2019] [Indexed: 12/17/2022]
Abstract
Phenotypic plasticity enables individuals to develop different phenotypes in a changing environment and promotes adaptive evolution. Genome-wide association study (GWAS) facilitates the study of the genetic basis of bacterial phenotypes, and provides a new opportunity for bacterial phenotypic plasticity research. To investigate the relationship between growth plasticity and genotype in bacteria, 41 Staphylococcus aureus strains, including 29 vancomycin-intermediate S. aureus (VISA) strains, were inoculated in the absence or presence of vancomycin for 48 h. Growth curves and maximum growth rates revealed that strains with the same minimum inhibitory concentration (MIC) showed different levels of plasticity in response to vancomycin. A bivariate GWAS was performed to map single-nucleotide polymorphisms (SNPs) associated with growth plasticity. In total, 227 SNPs were identified from 14 time points, while 15 high-frequency SNPs were mapped to different annotated genes. The P-values and growth variations between the two cultures suggest that non-coding region (SNP 738836), ebh (SNP 1394043), drug transporter (SNP 264897), and pepV (SNP 1775112) play important roles in the growth plasticity of S. aureus. Our study provides an alternative strategy for dissecting the adaptive growth of S. aureus in vancomycin and highlights the feasibility of bivariate GWAS in bacterial phenotypic plasticity research.
Affiliation(s)
- Mengdi Rong
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Xuyang Zheng
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Meixia Ye
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Jun Bai
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Xiangming Xie
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Yi Jin
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
- Xiaoqing He
- College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing, China
|
14
|
Du Q, Lu W, Quan M, Xiao L, Song F, Li P, Zhou D, Xie J, Wang L, Zhang D. Genome-Wide Association Studies to Improve Wood Properties: Challenges and Prospects. Front Plant Sci 2018; 9:1912. [PMID: 30622554 PMCID: PMC6309013 DOI: 10.3389/fpls.2018.01912] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/05/2018] [Accepted: 12/10/2018] [Indexed: 05/02/2023]
Abstract
Wood formation is an excellent model system for quantitative trait analysis due to the strong associations between the transcriptional and metabolic traits that contribute to this complex process. Investigating the genetic architecture and regulatory mechanisms underlying wood formation will enhance our understanding of the quantitative genetics and genomics of complex phenotypic variation. Genome-wide association studies (GWASs) represent an ideal statistical strategy for dissecting the genetic basis of complex quantitative traits. However, elucidating the molecular mechanisms underlying many favorable loci that contribute to wood formation and optimizing GWAS design remain challenging in this omics era. In this review, we summarize the recent progress in GWAS-based functional genomics of wood property traits in major timber species such as Eucalyptus, Populus, and various coniferous species. We discuss several appropriate experimental designs for extensive GWAS in a given undomesticated tree population, such as omics-wide association studies and high-throughput phenotyping technologies. We also explain why more attention should be paid to rare allelic and major structural variation. Finally, we explore the potential use of GWAS for the molecular breeding of trees. Such studies will help provide an integrated understanding of complex quantitative traits and should enable the molecular design of new cultivars.
Affiliation(s)
- Qingzhang Du
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Wenjie Lu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Mingyang Quan
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Liang Xiao
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Fangyuan Song
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Peng Li
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Daling Zhou
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Jianbo Xie
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Longxin Wang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Deqiang Zhang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
|
15
|
Lee KY, Leung KS, Tang NLS, Wong MH. Discovering Genetic Factors for psoriasis through exhaustively searching for significant second order SNP-SNP interactions. Sci Rep 2018; 8:15186. [PMID: 30315195 PMCID: PMC6185942 DOI: 10.1038/s41598-018-33493-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/12/2018] [Accepted: 09/28/2018] [Indexed: 12/24/2022]
Abstract
In this paper, we aim at discovering genetic factors of psoriasis through searching for statistically significant SNP-SNP interactions exhaustively from two real psoriasis genome-wide association study datasets (phs000019.v1.p1 and phs000982.v1.p1) downloaded from the database of Genotypes and Phenotypes. To deal with the enormous search space, our search algorithm is accelerated with eight biologically plausible interaction patterns and a pre-computed look-up table. After our search, we have discovered several SNPs having a stronger association to psoriasis when they are in combination with another SNP and these combinations may be non-linear interactions. Among the top 20 SNP-SNP interactions found in terms of pairwise p-value and improvement metric value, we have discovered 27 novel potential psoriasis-associated SNPs where most of them are reported to be eQTLs of a number of known psoriasis-associated genes. On the other hand, we have inferred a gene network after selecting the top 10000 SNP-SNP interactions in terms of improvement metric value and we have discovered a novel long distance interaction between XXbac-BPG154L12.4 and RNU6-283P which is not a long distance haplotype and may be a new discovery. Finally, our experiments with the synthetic datasets have shown that our pre-computed look-up table technique can significantly speed up the search process.
Affiliation(s)
- Kwan-Yeung Lee
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China.
- Kwong-Sak Leung
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China
- Nelson L S Tang
- Department of Chemical Pathology, the Chinese University of Hong Kong, Hong Kong, China.
- Man-Hon Wong
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Hong Kong, China
|
16
|
Chatelain C, Durand G, Thuillier V, Augé F. Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinformatics 2018; 19:231. [PMID: 29914375 PMCID: PMC6006572 DOI: 10.1186/s12859-018-2229-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/26/2017] [Accepted: 06/04/2018] [Indexed: 01/23/2023]
Abstract
BACKGROUND Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performances in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods, fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty four different disease scenarios are considered in extensive simulations. We report the performances of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium. RESULTS GBOOST, SHEsisEpi and DSS allow a satisfactory control of the false positive rate. fastepi and IndOR present an increase in false positive rate in presence of LD between causal SNPs, with our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU. CONCLUSION This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands cases and controls are available.
Affiliation(s)
- Guillermo Durand
- Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie, 4, place Jussieu, Paris Cedex 05, 75252 France
- Vincent Thuillier
- SANOFI R&D, Biostatistics & Programming, Chilly Mazarin, 91385 France
- Franck Augé
- SANOFI R&D, Translational Sciences, Chilly Mazarin, 91385 France
|
17
|
Niel C, Sinoquet C, Dina C, Rocheleau G. SMMB: a stochastic Markov blanket framework strategy for epistasis detection in GWAS. Bioinformatics 2018; 34:2773-2780. [DOI: 10.1093/bioinformatics/bty154] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/12/2017] [Accepted: 03/09/2018] [Indexed: 12/22/2022]
Affiliation(s)
- Clément Niel
- Laboratoire des Sciences du Numérique de Nantes (LS2N), Centre National de la recherche Scientifique UMR6004, University of Nantes, Nantes, France
- Christine Sinoquet
- Laboratoire des Sciences du Numérique de Nantes (LS2N), Centre National de la recherche Scientifique UMR6004, University of Nantes, Nantes, France
- Christian Dina
- Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France
- Ghislain Rocheleau
- European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University, Lille, France
|
18
|
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), the quest for understanding the genetic architecture of complex diseases is rapidly increasing more than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.
|
19
|
Liu J, Yu G, Jiang Y, Wang J. HiSeeker: Detecting High-Order SNP Interactions Based on Pairwise SNP Combinations. Genes (Basel) 2017; 8:153. [PMID: 28561745 PMCID: PMC5485517 DOI: 10.3390/genes8060153] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/31/2017] [Revised: 05/06/2017] [Accepted: 05/25/2017] [Indexed: 01/27/2023]
Abstract
Detecting single nucleotide polymorphisms’ (SNPs) interaction is one of the most popular approaches for explaining the missing heritability of common complex diseases in genome-wide association studies. Many methods have been proposed for SNP interaction detection, but most of them only focus on pairwise interactions and ignore high-order ones, which may also contribute to complex traits. Existing methods for high-order interaction detection can hardly handle genome-wide data and suffer from low detection power, due to the exponential growth of search space. In this paper, we proposed a flexible two-stage approach (called HiSeeker) to detect high-order interactions. In the screening stage, HiSeeker employs the chi-squared test and logistic regression model to efficiently obtain candidate pairwise combinations, which have intermediate or significant associations with the phenotype for interaction detection. In the search stage, two different strategies (exhaustive search and ant colony optimization-based search) are utilized to detect high-order interactions from candidate combinations. The experimental results on simulated datasets demonstrate that HiSeeker can more efficiently and effectively detect high-order interactions than related representative algorithms. On two real case-control datasets, HiSeeker also detects several significant high-order interactions, whose individual SNPs and pairwise interactions have no strong main effects or pairwise interaction effects, and these high-order interactions can hardly be identified by related algorithms.
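The screening stage described in this abstract can be pictured with a small sketch. The Python below is not the HiSeeker implementation: it assumes genotypes coded 0/1/2 and a binary phenotype coded 0/1, computes only a Pearson chi-squared statistic for each SNP pair (the logistic regression part is left out), and the function names and the threshold value are hypothetical.

```python
from itertools import combinations


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip genotype combinations that never occur
                stat += (observed - expected) ** 2 / expected
    return stat


def pair_table(snp_a, snp_b, phenotype):
    """Count samples per (joint genotype, phenotype) cell, giving a 9 x 2 table."""
    table = [[0, 0] for _ in range(9)]
    for ga, gb, y in zip(snp_a, snp_b, phenotype):
        table[3 * ga + gb][y] += 1
    return table


def screen_pairs(genotypes, phenotype, threshold):
    """Keep SNP index pairs whose statistic reaches the (hypothetical) threshold."""
    kept = []
    for a, b in combinations(range(len(genotypes)), 2):
        stat = chi_squared_statistic(pair_table(genotypes[a], genotypes[b], phenotype))
        if stat >= threshold:
            kept.append((a, b, stat))
    return kept


if __name__ == "__main__":
    # Toy data: three SNPs (rows) over four samples, two controls then two cases.
    genotypes = [[0, 0, 2, 2], [1, 1, 1, 1], [0, 1, 0, 1]]
    phenotype = [0, 0, 1, 1]
    for a, b, stat in screen_pairs(genotypes, phenotype, threshold=1.0):
        print(a, b, round(stat, 3))
```

In this toy data the pairs that include the first SNP pass the screen, because that SNP alone separates cases from controls; a real screen would also need a significance cut-off chosen for the number of pairs tested.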
Affiliation(s)
- Jie Liu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
- Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
- Yuan Jiang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
- Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
|
20
|
Li J, Zhang Q, Chen F, Meng X, Liu W, Chen D, Yan J, Kim S, Wang L, Feng W, Saykin AJ, Liang H, Shen L. Genome-wide association and interaction studies of CSF T-tau/Aβ42 ratio in ADNI cohort. Neurobiol Aging 2017. [PMID: 28641921 DOI: 10.1016/j.neurobiolaging.2017.05.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/23/2022]
Abstract
The pathogenic relevance in Alzheimer's disease (AD) presents a decrease of cerebrospinal fluid amyloid-β42 (Aβ42) burden and an increase in cerebrospinal fluid total tau (T-tau) levels. In this work, we performed genome-wide association study (GWAS) and genome-wide interaction study of T-tau/Aβ42 ratio as an AD imaging quantitative trait on 843 subjects and 563,980 single-nucleotide polymorphisms (SNPs) in ADNI cohort. We aim to identify not only SNPs with significant main effects but also SNPs with interaction effects to help explain "missing heritability". Linear regression method was used to detect SNP-SNP interactions among SNPs with uncorrected p-value ≤0.01 from the GWAS. Age, gender, and diagnosis were considered as covariates in both studies. The GWAS results replicated the previously reported AD-related genes APOE, APOC1, and TOMM40, as well as identified 14 novel genes, which showed genome-wide statistical significance. Genome-wide interaction study revealed 7 pairs of SNPs meeting the cell-size criteria and with Bonferroni-corrected p-value ≤0.05. As we expect, these interaction pairs all had marginal main effects but explained a relatively high-level variance of T-tau/Aβ42, demonstrating their potential association with AD pathology.
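The multiple-testing arithmetic behind a Bonferroni-corrected pairwise scan such as the one in this abstract can be sketched in a few lines of Python. This is only an illustration: the abstract does not report how many SNPs passed the p ≤ 0.01 filter, so the count of 1,000 used below is a hypothetical placeholder, and the function names are made up for the example.

```python
def n_pairs(k):
    """Number of unordered SNP pairs among k SNPs."""
    return k * (k - 1) // 2


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    k = 1000  # hypothetical number of SNPs passing the first-stage filter
    m = n_pairs(k)
    # Raw p-value a pair must reach for an adjusted p-value <= 0.05.
    print(m, 0.05 / m)
    print(bonferroni_adjust([0.01, 0.5, 0.001]))
```

Because the number of pairs grows quadratically with the number of retained SNPs, the first-stage filter directly controls how strict the per-pair threshold becomes.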
Affiliation(s)
- Jin Li
- College of Automation, Harbin Engineering University, Harbin, China
- Qiushi Zhang
- College of Automation, Harbin Engineering University, Harbin, China; College of Information Engineering, Northeast Dianli University, Jilin, China
- Feng Chen
- College of Automation, Harbin Engineering University, Harbin, China
- Xianglian Meng
- College of Automation, Harbin Engineering University, Harbin, China
- Wenjie Liu
- College of Automation, Harbin Engineering University, Harbin, China
- Dandan Chen
- College of Automation, Harbin Engineering University, Harbin, China; College of Information Engineering, Northeast Dianli University, Jilin, China
- Jingwen Yan
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA; Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
- Sungeun Kim
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Lei Wang
- College of Automation, Harbin Engineering University, Harbin, China
- Weixing Feng
- College of Automation, Harbin Engineering University, Harbin, China
- Andrew J Saykin
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Hong Liang
- College of Automation, Harbin Engineering University, Harbin, China.
- Li Shen
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA; Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA.
|
21
|
Goudey B, Abraham G, Kikianty E, Wang Q, Rawlinson D, Shi F, Haviv I, Stern L, Kowalczyk A, Inouye M. Interactions within the MHC contribute to the genetic architecture of celiac disease. PLoS One 2017; 12:e0172826. [PMID: 28282431 PMCID: PMC5345796 DOI: 10.1371/journal.pone.0172826] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/20/2016] [Accepted: 02/10/2017] [Indexed: 01/04/2023]
Abstract
Interaction analysis of GWAS can detect signal that would be ignored by single variant analysis, yet few robust interactions in humans have been detected. Recent work has highlighted interactions in the MHC region between known HLA risk haplotypes for various autoimmune diseases. To better understand the genetic interactions underlying celiac disease (CD), we have conducted exhaustive genome-wide scans for pairwise interactions in five independent CD case-control studies, using a rapid model-free approach to examine over 500 billion SNP pairs in total. We found 14 independent interaction signals within the MHC region that achieved stringent replication criteria across multiple studies and were independent of known CD risk HLA haplotypes. The strongest independent CD interaction signal corresponded to genes in the HLA class III region, in particular PRRC2A and GPANK1/C6orf47, which are known to contain variants for non-Hodgkin's lymphoma and early menopause, co-morbidities of celiac disease. Replicable evidence for statistical interaction outside the MHC was not observed. Both within and between European populations, we observed striking consistency of two-locus models and model distribution. Within the UK population, models of CD based on both interactions and additive single-SNP effects increased explained CD variance by approximately 1% over those of single SNPs. The interactions signal detected across the five cohorts indicates the presence of novel associations in the MHC region that cannot be detected using additive models. Our findings have implications for the determination of genetic architecture and, by extension, the use of human genetics for validation of therapeutic targets.
Affiliation(s)
- Benjamin Goudey
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
- IBM Research, Australia, Level 5, Carlton, Victoria, Australia
- Gad Abraham
- Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
- Eder Kikianty
- Department of Mathematics, University of Johannesburg, Auckland Park, South Africa
- Qiao Wang
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Dave Rawlinson
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Fan Shi
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Izhak Haviv
- Faculty of Medicine, Bar Ilan University, Safed, Israel
- Linda Stern
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Adam Kowalczyk
- NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Center for Neural Engineering, The University of Melbourne, Parkville, Victoria, Australia
- Michael Inouye
- Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
|
22
|
Bao F, Deng Y, Zhao Y, Suo J, Dai Q. Bosco: Boosting Corrections for Genome-Wide Association Studies With Imbalanced Samples. IEEE Trans Nanobioscience 2017; 16:69-77. [PMID: 28141527 DOI: 10.1109/tnb.2017.2660498] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/05/2022]
Abstract
In genome-wide association studies (GWAS), the acquired sequential data may exhibit an imbalanced structure: abundant control vs. limited case samples. This sample imbalance is particularly serious when investigating rare diseases or common diseases in rare populations. Conventional GWAS methods may suffer from severe statistical bias toward the major group, leading to power losses in uncovering true suspicious loci. We introduce a boosting correction method termed Bosco to deal with this imbalance problem. Bosco is motivated by the boost learning theory in machine learning and is implemented in a coarse-to-fine learning framework: the coarse step assigns importance scores to all samples in the major group and the fine step calculates P-values by a weighted logistic regression. On simulated data sets, we demonstrate that the proposed method can dramatically improve the discovery power even on extremely imbalanced datasets, while controlling false positives well. Bosco is also applied to a genome-scale gastric cancer data set to conduct genome-wide analysis. Our method replicates existing reported findings (from the likelihood ratio test) with high statistical significance and shows the ability to identify new suspicious SNPs.
|
23
|
Use of Information Measures and Their Approximations to Detect Predictive Gene-Gene Interaction. Entropy 2017. [DOI: 10.3390/e19010023] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
|
24
|
Pirih N, Kunej T. Toward a Taxonomy for Multi-Omics Science? Terminology Development for Whole Genome Study Approaches by Omics Technology and Hierarchy. OMICS 2017; 21:1-16. [DOI: 10.1089/omi.2016.0144] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 01/30/2023]
Affiliation(s)
- Nina Pirih
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
- Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
|
25
|
Kodama K, Saigo H. KDSNP: A kernel-based approach to detecting high-order SNP interactions. J Bioinform Comput Biol 2016; 14:1644003. [DOI: 10.1142/s0219720016440030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Despite the accumulation of quantitative trait loci (QTL) data in many complex human diseases, most current approaches that have attempted to relate genotype to phenotype have achieved limited success, and the genetic factors of many common diseases remain to be elucidated. One of the reasons that makes this problem complex is the existence of single nucleotide polymorphism (SNP) interaction, or epistasis. Due to the excessive amount of computation needed to search the combinatorial space, existing approaches cannot fully incorporate high-order SNP interactions into their models, but limit themselves to detecting only lower-order SNP interactions. We present an empirical approach based on ridge regression with polynomial kernels and a model selection technique for determining the true degree of epistasis among SNPs. Computer experiments in simulated data show the ability of the proposed method to correctly predict the number of interacting SNPs provided that the number of samples is large enough relative to the number of SNPs. For cases in which the number of available samples is limited, we propose a sliding-window approach to ensure a sufficiently large sample/SNP ratio in each window. In computational experiments using heterogeneous stock mice data, our approach has successfully detected subregions that harbor known causal SNPs. Our analysis further suggests the existence of additional candidate causal SNPs interacting with each other in the neighborhood of the known causal gene. Software is available from https://github.com/HirotoSaigo/KDSNP.
Affiliation(s)
- Kento Kodama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Fukuoka, Japan
- Hiroto Saigo
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Fukuoka, Japan

26
Suzuki S, Kakuta M, Ishida T, Akiyama Y. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering. PLoS One 2016; 11:e0157338. [PMID: 27482905 PMCID: PMC4970815 DOI: 10.1371/journal.pone.0157338] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/09/2015] [Accepted: 05/27/2016] [Indexed: 11/30/2022] Open
Abstract
Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.
Affiliation(s)
- Shuji Suzuki
- Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Masanori Kakuta
- Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Takashi Ishida
- Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Yutaka Akiyama
- Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
- Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan

27
Colak R, Kim T, Kazan H, Oh Y, Cruz M, Valladares-Salgado A, Peralta J, Escobedo J, Parra EJ, Kim PM, Goldenberg A. JBASE: Joint Bayesian Analysis of Subphenotypes and Epistasis. Bioinformatics 2016; 32:203-10. [PMID: 26411870 PMCID: PMC4708100 DOI: 10.1093/bioinformatics/btv504] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/22/2014] [Revised: 08/02/2015] [Accepted: 08/24/2015] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype-phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of the theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major reasons for missing heritability: interactions between genetic variants, a phenomenon known as epistasis, and phenotypic heterogeneity, addressed via subphenotyping. RESULTS: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher power and lower type 1 error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, including two loci each, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. AVAILABILITY AND IMPLEMENTATION: JBASE is implemented in C++, supported on Linux and is available at http://www.cs.toronto.edu/~goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. CONTACT: anna.goldenberg@utoronto.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Recep Colak
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada
- TaeHyung Kim
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Department of Computer Engineering, Antalya International University, 07190, Antalya, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya International University, 07190, Antalya, Turkey
- Yoomi Oh
- Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada, Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Jesus Peralta
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, IMSS, 06720, Mexico City, Mexico
- Jorge Escobedo
- Unidad de Investigación en Epidemiología Clínica, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Esteban J Parra
- Department of Anthropology, University of Toronto, L5L 1C6, Mississauga, ON, Canada
- Philip M Kim
- Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, M5S 3E1, Toronto, ON, Canada, Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, ON, Canada, Genetics and Genome Biology, Hospital for Sick Children, M5G 0A4, Toronto, ON, Canada and Banting and Best Department of Medical Research, University of Toronto, M5G 1L6, Toronto, ON, Canada
- Anna Goldenberg
- Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada, Genetics and Genome Biology, Hospital for Sick Children, M5G 0A4, Toronto, ON, Canada and

28
Abstract
In the single-locus strategy, a number of genetic variants are analyzed in order to find variants whose distribution differs significantly between controls and patients. A supplementary strategy is to analyze combinations of genetic variants. A combination that is the genetic basis for a polygenic disorder will not occur in control persons genetically unrelated to patients, so the strategy is to analyze combinations of genetic variants present exclusively in patients. In a previous study of oral cancer and leukoplakia, 325 SNPs were analyzed. This study has been supplemented with an analysis of combinations of two SNP genotypes from among the 325 SNPs. Two clusters of combinations containing 95 patient-specific combinations were significantly associated with oral cancer or leukoplakia. Of 373 patients with oral cancer, 205 had a number of these 95 combinations in their genome, whereas none of 535 control persons had any of these combinations in their genome.
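The core filtering step described above, keeping only genotype combinations seen in patients and never in controls, can be sketched as follows. This is an illustrative sketch only: the data layout (one dict of SNP to genotype per person) and the function names are hypothetical, and the clustering and significance testing reported in the abstract are not reproduced.

```python
from itertools import combinations

def genotype_combinations(person, size=2):
    # person maps SNP id -> genotype; sorting fixes the SNP order so the
    # same combination always produces the same tuple.
    return set(combinations(sorted(person.items()), size))

def patient_specific_combinations(patients, controls, size=2):
    # Combinations carried by at least one patient and by no control.
    seen_in_controls = set()
    for person in controls:
        seen_in_controls |= genotype_combinations(person, size)
    seen_in_patients = set()
    for person in patients:
        seen_in_patients |= genotype_combinations(person, size)
    return seen_in_patients - seen_in_controls

# Toy data with two SNPs (identifiers are made up).
patients = [{"rs1": "AA", "rs2": "CT"}, {"rs1": "AG", "rs2": "CT"}]
controls = [{"rs1": "AG", "rs2": "CT"}, {"rs1": "AA", "rs2": "CC"}]
specific = patient_specific_combinations(patients, controls)
print(specific)
```

Setting `size` to 3 or 4 gives the higher-order variant of the same filter, at a combinatorial cost that grows quickly with the number of SNPs.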
Affiliation(s)
- Erling Mellerup
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, Faculty of Health, University of Copenhagen, Denmark
- Gert Lykke Moeller
- Genokey ApS, ScionDTU, Technical University of Denmark, Hoersholm, Denmark
- Susanta Roychoudhury
- Cancer Biology and Inflammatory Disorder Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India

29
Mellerup E, Andreassen OA, Bennike B, Dam H, Djurovic S, Hansen T, Jorgensen MB, Kessing LV, Koefoed P, Melle I, Mors O, Werge T, Moeller GL. Combinations of Genetic Data Present in Bipolar Patients, but Absent in Control Persons. PLoS One 2015; 10:e0143432. [PMID: 26587987 PMCID: PMC4654514 DOI: 10.1371/journal.pone.0143432] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/02/2015] [Accepted: 11/04/2015] [Indexed: 11/19/2022] Open
Abstract
The main objective of the study was to find combinations of genetic variants significantly associated with bipolar disorder. In a previous study of bipolar disorder, combinations of three single nucleotide polymorphism (SNP) genotypes taken from 803 SNPs were analyzed, and four clusters of combinations were found to be significantly associated with bipolar disorder. In the present study, combinations of four SNP genotypes taken from the same 803 SNPs were analyzed, and one cluster of combinations was found to be significantly associated with bipolar disorder. Combinations from the new cluster and from the four previous clusters were identified in the genomes of 209 of the 607 patients in the study whereas none of the 1355 control participants had any of these combinations in their genome.
Affiliation(s)
- Erling Mellerup
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Ole A. Andreassen
- Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Bente Bennike
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Henrik Dam
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Thomas Hansen
- Department of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University Hospital, Boserupvej 2, DK-4000 Roskilde, Denmark
- Martin Balslev Jorgensen
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Lars Vedel Kessing
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Pernille Koefoed
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Ingrid Melle
- Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Ole Mors
- Centre for Psychiatric Research, Aarhus University Hospital, Skovagervej 2, DK-8240 Risskov, Denmark
- Thomas Werge
- Department of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University Hospital, Boserupvej 2, DK-4000 Roskilde, Denmark
- Gert Lykke Moeller
- Genokey ApS, ScionDTU, Technical University Denmark, Agern Allé 3, DK-2970 Hoersholm, Denmark

30
Sapin E, Keedwell E, Frayling T. An Ant Colony Optimization and Tabu List Approach to the Detection of Gene-Gene Interactions in Genome-Wide Association Studies [Research Frontier]. IEEE COMPUT INTELL M 2015. [DOI: 10.1109/mci.2015.2471236] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]

31
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet 2015; 6:285. [PMID: 26442103 PMCID: PMC4564769 DOI: 10.3389/fgene.2015.00285] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/13/2015] [Accepted: 08/27/2015] [Indexed: 12/25/2022] Open
Abstract
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
Affiliation(s)
- Clément Niel
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, Ecole Polytechnique de l'Université de Nantes, Nantes, France
- Christine Sinoquet
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, University of Nantes, Nantes, France
- Christian Dina
- Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France
- Ghislain Rocheleau
- European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University, Lille, France

32
Upton A, Trelles O, Cornejo-García JA, Perkins JR. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 03/31/2015] [Indexed: 11/14/2022] Open
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.

33
Lipka AE, Kandianis CB, Hudson ME, Yu J, Drnevich J, Bradbury PJ, Gore MA. From association to prediction: statistical methods for the dissection and selection of complex traits in plants. Current Opinion in Plant Biology 2015; 24:110-8. [PMID: 25795170 DOI: 10.1016/j.pbi.2015.02.010] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 11/15/2014] [Revised: 02/24/2015] [Accepted: 02/27/2015] [Indexed: 05/02/2023]
Abstract
Quantification of genotype-to-phenotype associations is central to many scientific investigations, yet the ability to obtain consistent results may be thwarted without appropriate statistical analyses. Models for association can consider confounding effects in the materials and complex genetic interactions. Selecting optimal models enables accurate evaluation of associations between marker loci and numerous phenotypes including gene expression. Significant improvements in QTL discovery via association mapping and acceleration of breeding cycles through genomic selection are two successful applications of models using genome-wide markers. Given recent advances in genotyping and phenotyping technologies, further refinement of these approaches is needed to model genetic architecture more accurately and run analyses in a computationally efficient manner, all while accounting for false positives and maximizing statistical power.
Affiliation(s)
- Alexander E Lipka
- University of Illinois, Department of Crop Sciences, Urbana, IL 61801, USA
- Catherine B Kandianis
- Michigan State University, Department of Biochemistry and Molecular Biology, East Lansing, MI 48824, USA; Cornell University, Plant Breeding and Genetics Section, School of Integrative Plant Science, Ithaca, NY 14853, USA
- Matthew E Hudson
- University of Illinois, Department of Crop Sciences, Urbana, IL 61801, USA
- Jianming Yu
- Iowa State University, Department of Agronomy, Ames, IA 50011, USA
- Jenny Drnevich
- University of Illinois, High Performance Biological Computing Group and the Carver Biotechnology Center, Urbana, IL 61801, USA
- Peter J Bradbury
- United States Department of Agriculture (USDA) - Agricultural Research Service (ARS), Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Michael A Gore
- Cornell University, Plant Breeding and Genetics Section, School of Integrative Plant Science, Ithaca, NY 14853, USA

34
High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies. Health Inf Sci Syst 2015; 3:S3. [PMID: 25870758 PMCID: PMC4383059 DOI: 10.1186/2047-2501-3-s1-s3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 12/04/2022] Open
Abstract
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis on 1.1 million SNP GWAS data would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer "Sequoia" at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS.
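The scale behind these runtime figures can be checked with a short calculation. The sketch below uses only numbers quoted in the abstract (1.1 million SNPs, 5.8 years, 3 months); the variable names are illustrative, and treating "under 3 months" as exactly 3 months gives only a lower bound on the implied speed-up.

```python
import math

n_snps = 1_100_000

# Number of distinct SNP triples an exhaustive three-way scan must test.
n_triples = math.comb(n_snps, 3)

# Speed-up needed to bring 5.8 years down to 3 months under linear scaling.
years_on_avoca = 5.8
months_on_sequoia = 3
min_speedup = years_on_avoca * 12 / months_on_sequoia

print(f"{n_triples:.3e} triples, at least {min_speedup:.1f}x speed-up")
```

The triple count is on the order of 10^17, which is why exhaustive higher-order scans are framed as a supercomputing problem.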

35
Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015; 16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022] Open
Abstract
Background: Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. IOR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming. Results: FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility. Conclusions: Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases. Electronic supplementary material: The online version of this article (doi:10.1186/s12863-015-0174-3) contains supplementary material, which is available to authorized users.
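The IOR definition above can be made concrete with a small sketch. This is not the FORCE implementation: the count layout, the function names, and in particular the choice of "carries neither variant" as the reference group for every odds ratio are assumptions, since the abstract does not specify them.

```python
def odds_ratio(case_exposed, control_exposed, case_ref, control_ref):
    # Odds of exposure among cases divided by odds among controls.
    return (case_exposed * control_ref) / (control_exposed * case_ref)

def interactive_odds_ratio(cases, controls):
    # cases / controls: counts keyed by carrier status:
    # "none" (neither variant), "a" (A only), "b" (B only), "ab" (both).
    or_a = odds_ratio(cases["a"], controls["a"], cases["none"], controls["none"])
    or_b = odds_ratio(cases["b"], controls["b"], cases["none"], controls["none"])
    or_ab = odds_ratio(cases["ab"], controls["ab"], cases["none"], controls["none"])
    # Product of the single-variant ORs over the OR of carrying both.
    return (or_a * or_b) / or_ab

controls = {"none": 100, "a": 100, "b": 100, "ab": 100}
no_interaction = {"none": 100, "a": 200, "b": 300, "ab": 600}
with_interaction = {"none": 100, "a": 200, "b": 300, "ab": 1200}

print(interactive_odds_ratio(no_interaction, controls))
print(interactive_odds_ratio(with_interaction, controls))
```

In the first toy table the joint odds ratio equals the product of the two single-variant odds ratios, so the IOR is 1; in the second the joint effect is twice the multiplicative expectation, so the IOR moves away from 1.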
Affiliation(s)
- Laura Grange
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France; Université Paris Diderot, Paris, 75013, France.
- Jean-François Bureau
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Iryna Nikolayeva
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France.
- Richard Paul
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore institute, University of Liège, Liège, Belgium; Bioinformatics and Modeling, GiGA-R, University of Liège, Liège, Belgium.
- Benno Schwikowski
- Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France.
- Anavaj Sakuntabhai
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.

36
Abstract
Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, Epi2Loc, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. Epi2Loc facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.
Affiliation(s)
- Raymond K. Walters
- Department of Psychology, University of Notre Dame, Notre Dame, Indiana, USA
- Charles Laurin
- Department of Psychology, University of Notre Dame, Notre Dame, Indiana, USA
- Gitta H. Lubke
- Department of Psychology, University of Notre Dame, Notre Dame, Indiana, USA
- Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands

37
Bedő J, Rawlinson D, Goudey B, Ong CS. Stability of bivariate GWAS biomarker detection. PLoS One 2014; 9:e93319. [PMID: 24787002 PMCID: PMC4005767 DOI: 10.1371/journal.pone.0093319] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/01/2013] [Accepted: 03/03/2014] [Indexed: 12/24/2022] Open
Abstract
Given the difficulty and effort required to confirm candidate causal SNPs detected in genome-wide association studies (GWAS), there is no practical way to definitively filter false positives. Recent advances in algorithmics and statistics have enabled repeated exhaustive search for bivariate features in a practical amount of time using standard computational resources, allowing us to use cross-validation to evaluate the stability. We performed 10 trials of 2-fold cross-validation of exhaustive bivariate analysis on seven Wellcome–Trust Case–Control Consortium GWAS datasets, comparing the traditional test for association, the high-performance GBOOST method and the recently proposed GSS statistic (Available at http://bioinformatics.research.nicta.com.au/software/gwis/). We use Spearman's correlation to measure the similarity between the folds of cross validation. To compare incomplete lists of ranks we propose an extension to Spearman's correlation. The extension allows us to consider a natural threshold for feature selection where the correlation is zero. This is the first reported cross-validation study of exhaustive bivariate GWAS feature selection. We found that stability between ranked lists from different cross-validation folds was higher for GSS in the majority of diseases. A thorough analysis of the correlation between SNP-frequency and univariate score demonstrated that the test for association is highly confounded by main effects: SNPs with high univariate significance replicably dominate the ranked results. We show that removal of the univariately significant SNPs improves replicability but risks filtering pairs involving SNPs with univariate effects. We empirically confirm that the stability of GSS and GBOOST were not affected by removal of univariately significant SNPs. 
These results suggest that the GSS and GBOOST tests are successfully targeting bivariate association with phenotype and that GSS is able to reliably detect a larger set of SNP-pairs than GBOOST in the majority of the data we analysed. However, the test for association was confounded by main effects.
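The stability measure described above, Spearman's correlation between the rankings produced on two cross-validation folds, can be sketched as follows. This is a plain Spearman correlation for complete, tie-free score lists; the authors' extension to incomplete rank lists is not reproduced, and the function names and example scores are hypothetical.

```python
import numpy as np

def ranks(scores):
    # Rank 1 goes to the smallest score; assumes there are no ties.
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    r = np.empty(len(scores), dtype=float)
    r[order] = np.arange(1, len(scores) + 1)
    return r

def spearman(scores_fold_a, scores_fold_b):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    ra, rb = ranks(scores_fold_a), ranks(scores_fold_b)
    return float(np.corrcoef(ra, rb)[0, 1])

# Scores for the same four SNP pairs on two folds (made-up values).
fold_a = [0.1, 0.5, 0.9, 0.3]
fold_b = [1.0, 5.0, 9.0, 3.0]
print(spearman(fold_a, fold_b))
```

A value near 1 means the two folds order the SNP pairs almost identically, and a value near 0 means the ranking does not replicate.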
Affiliation(s)
- Justin Bedő
- NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia
- Department of Computing and Information Systems, University of Melbourne, Victoria, Australia
- David Rawlinson
- NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia
- Department of Electrical & Electronic Engineering, University of Melbourne, Victoria, Australia
- Benjamin Goudey
- NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia
- Department of Computing and Information Systems, University of Melbourne, Victoria, Australia
- Cheng Soon Ong
- NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia
- Department of Electrical & Electronic Engineering, University of Melbourne, Victoria, Australia

38
Abstract
Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

39
Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013; 4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022] Open
Abstract
Background: Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. Results: We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease associations in genome sequences. Conclusions: With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
Affiliation(s)
- Kevin M Livingston
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Michael Bada
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lawrence E Hunter
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karin Verspoor
- National ICT Australia, Victoria Research Laboratory, Melbourne, VIC, 3010, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne 3010 VIC, Australia
|
40
|
Bromberg Y, Capriotti E. Thoughts from SNP-SIG 2012: future challenges in the annotation of genetic variations. BMC Genomics 2013; 14 Suppl 3:S1. [PMID: 23819751 PMCID: PMC3665538 DOI: 10.1186/1471-2164-14-s3-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
|