1
Veyssiere M, Rodriguez Ordonez MDP, Chalabi S, Michou L, Cornelis F, Boland A, Olaso R, Deleuze JF, Petit-Teixeira E, Chaudru V. MYLK*FLNB and DOCK1*LAMA2 gene-gene interactions associated with rheumatoid arthritis in the focal adhesion pathway. Front Genet 2024; 15:1375036. [PMID: 38803542 PMCID: PMC11128622 DOI: 10.3389/fgene.2024.1375036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/18/2024] [Indexed: 05/29/2024] Open
Abstract
Rheumatoid arthritis (RA) is a chronic, systemic autoimmune disease caused by a combination of genetic and environmental factors. Rare variants with low predicted effects in genes participating in the same biological function might be involved in developing complex diseases such as RA. From whole-exome sequencing (WES) data, we identified genes containing rare non-neutral variants with complete penetrance and no phenocopy in at least one of nine French multiplex families. Further enrichment analysis highlighted focal adhesion as the most significant pathway. We then tested whether interactions between the genes participating in this function would increase or decrease the risk of developing RA. The model-based multifactor dimensionality reduction (MB-MDR) approach was used to detect epistasis in a discovery sample (19 RA cases and 11 healthy individuals from 9 families and 98 unrelated CEU controls from the International Genome Sample Resource). We identified 9 significant interactions involving 11 genes (MYLK, FLNB, DOCK1, LAMA2, RELN, PIP5K1C, TNC, PRKCA, VEGFB, ITGB5, and FLT1). One interaction increasing RA risk (MYLK*FLNB) and one interaction decreasing RA risk (DOCK1*LAMA2) were confirmed in a replication sample (200 unrelated RA cases and 91 unrelated GBR controls). Functional and genomic data in RA samples or relevant cell types support a key role of these genes in RA.
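The dimensionality-reduction idea behind MDR-type methods can be illustrated with a minimal sketch. It implements only the classic MDR pooling step (each two-locus genotype cell is labelled high or low risk by comparing its case:control ratio with the overall ratio). It is not the MB-MDR procedure used in the study, which, as usually described, additionally tests each cell statistically and allows a "no evidence" category. The function name `label_cells` and the toy genotypes are hypothetical.

```python
from collections import Counter

def label_cells(geno_a, geno_b, is_case):
    """Label each two-locus genotype cell 'H' (high risk) or 'L' (low risk).

    Simplified MDR-style pooling, not MB-MDR: a cell is 'H' when its
    case:control ratio exceeds the overall case:control ratio.
    Assumes at least one control in the sample.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, is_case):
        (cases if y else controls)[(a, b)] += 1
    overall = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # Cells with cases but no controls are treated as infinitely enriched.
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = "H" if ratio > overall else "L"
    return labels

# Hypothetical toy data: genotypes coded 0/1/2 at two loci, 1 = case.
geno_a = [0, 0, 1, 1, 2, 2, 0, 1]
geno_b = [0, 1, 0, 1, 0, 1, 0, 1]
is_case = [1, 0, 1, 1, 0, 0, 0, 1]
labels = label_cells(geno_a, geno_b, is_case)
```

The resulting one-dimensional H/L variable can then be tested against the trait in place of the full genotype table.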
Affiliation(s)
- Maëva Veyssiere
  - Institut National de la Santé et de la Recherche Médicale, Université de Paris, Paris, France
- Smahane Chalabi
  - GenHotel—Univ Evry, University of Paris Saclay, Evry, France
- Laetitia Michou
  - Division of Rheumatology, Department of Medicine, CHU de Québec-Université Laval, Québec City, QC, Canada
- François Cornelis
  - Génétique-Oncogénétique Adulte-Prévention, Institut National de la Santé et de la Recherche Médicale, Clermont-Auvergne University and CHU, Clermont-Ferrand, France
- Anne Boland
  - Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Robert Olaso
  - Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Jean-François Deleuze
  - Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Valérie Chaudru
  - Institut National de la Santé et de la Recherche Médicale, Université de Paris, Paris, France
  - GenHotel—Univ Evry, University of Paris Saclay, Evry, France
2
Balunathan N, Rani G U, Perumal V, Kumarasamy P. Single nucleotide polymorphisms of Interleukin - 4, Interleukin-18, FCRL3 and sPLA2IIa genes and their association in pathogenesis of endometriosis. Mol Biol Rep 2023; 50:4239-4252. [PMID: 36905404 DOI: 10.1007/s11033-023-08316-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Accepted: 01/31/2023] [Indexed: 03/12/2023]
Abstract
BACKGROUND Endometriosis is a complex gynaecological disorder that contributes to infertility, dysmenorrhea, dyspareunia, and other chronic issues. It is a multifactorial disease involving genetic, hormonal, immunological and environmental components. The pathogenesis of endometriosis remains unclear. AIM OF THE STUDY To analyse polymorphisms in the Interleukin-4, Interleukin-18, FCRL3 and sPLA2IIa genes and identify any significant association with the risk of endometriosis. MATERIAL AND METHODS This study evaluated the -590 C/T polymorphism in the interleukin-4 (IL-4) gene, C607A in the interleukin-18 (IL-18) gene, -169T > C in the FCRL3 gene and 763 C > G in the sPLA2IIa gene in women with endometriosis. The case-control study included 150 women with endometriosis and 150 apparently healthy women as control subjects. DNA was extracted from peripheral blood leukocytes and endometriotic tissue of cases and from blood samples of controls, amplified by PCR, and sequenced to determine the alleles and genotypes of the subjects and to analyse the relationship between the gene polymorphisms and endometriosis. To evaluate the association of the different genotypes, 95% confidence intervals (CI) were calculated. RESULTS Interleukin-18 and FCRL3 gene polymorphisms in endometriotic tissue and blood samples of cases were significantly associated with endometriosis (OR = 4.88 [95% CI = 2.31-10.30], P < 0.0001, and OR = 4.00 [95% CI = 2.2-7.33], P < 0.0001, respectively) when compared with normal blood samples. However, there was no significant difference in Interleukin-4 and sPLA2IIa gene polymorphisms between control women and patients with endometriosis. CONCLUSIONS The present study suggests that the IL-18 and FCRL3 gene polymorphisms are associated with a higher risk of endometriosis, which adds to the knowledge of its pathogenesis. However, a larger sample size of patients from various ethnic backgrounds is necessary to evaluate whether these alleles have a direct effect on disease susceptibility.
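Odds ratios with 95% confidence intervals of the kind reported above are commonly obtained from a 2x2 table with the Wald (log-odds) method. The sketch below assumes that method; the abstract does not state which one the authors used. The counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: cases with / without the risk genotype
    c, d: controls with / without the risk genotype
    All four counts must be non-zero. z=1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for 150 cases and 150 controls.
or_, lo, hi = odds_ratio_ci(40, 110, 12, 138)
print(f"OR = {or_:.2f} [95% CI = {lo:.2f}-{hi:.2f}]")
```

An interval that excludes 1 corresponds to a significant association at the 5% level, which is why a significant result goes with a small P value.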
Affiliation(s)
- Nandhini Balunathan
  - Department of Human Genetics, Faculty of Biomedical sciences & technology, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), Porur, Chennai, India
- Usha Rani G
  - Department of Obstetrics & Gynaecology, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), Porur, Chennai, India
- Venkatachalam Perumal
  - Department of Human Genetics, Faculty of Biomedical sciences & technology, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), Porur, Chennai, India
- P Kumarasamy
  - Controller of examinations, Tamilnadu Veterinary and Animal sciences university, Chennai, India
3
Sha Z, Chen Y, Hu T. NSPA: characterizing the disease association of multiple genetic interactions at single-subject resolution. BIOINFORMATICS ADVANCES 2023; 3:vbad010. [PMID: 36818729 PMCID: PMC9927570 DOI: 10.1093/bioadv/vbad010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/02/2023] [Accepted: 02/02/2023] [Indexed: 02/10/2023]
Abstract
Motivation The interaction between genetic variables is one of the major barriers to characterizing the genetic architecture of complex traits. To account for epistasis, network science approaches are increasingly used to elucidate the genetic architecture of complex diseases. These approaches relate the disease susceptibility of genetic variables to their topological importance in the network. However, such a network only represents genetic interactions and does not describe how these interactions contribute to disease association at the subject scale. We propose the Network-based Subject Portrait Approach (NSPA) and an accompanying feature transformation method to determine the collective risk impact of multiple genetic interactions for each subject. Results The feature transformation method converts genetic variants of subjects into new values that capture how genetic variables interact with others to contribute to a subject's disease association. We apply this approach to synthetic and genetic datasets and learn that (1) the disease association can be captured using multiple disjoint sets of genetic interactions and (2) the feature transformation method based on NSPA improves predictive performance compared with using the original genetic variables. Our findings confirm the role of genetic interaction in complex disease and provide a novel approach for gene-disease association studies to identify genetic architecture in the context of epistasis. Availability and implementation The code of NSPA is available at https://github.com/MIB-Lab/Network-based-Subject-Portrait-Approach. Contact ting.hu@queensu.ca. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Zhendong Sha
  - School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Yuanzhu Chen
  - School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Ting Hu
  - To whom correspondence should be addressed.
4
Wang H, Wu X. IPP: An Intelligent Privacy-Preserving Scheme for Detecting Interactions in Genome Association Studies. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:455-464. [PMID: 35239492 DOI: 10.1109/tcbb.2022.3155774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Analyzing K-order Single Nucleotide Polymorphism (SNP) interactions through the statistics of Genome-Wide Association Studies (GWAS) is crucial for discovering pathogenic causes of human complex diseases and controlling risk genetic variants of diverse disorders. We propose an Intelligent Privacy-Preserving scheme (IPP), a method based on the Ant Colony Optimization (ACO) algorithm, to detect gene interactions for GWAS. Initially, we design a multi-objective search algorithm to discover the candidate SNP sets related to disease phenotype, which utilizes the Differential Privacy (DP) method by disturbing the multi-objective function to construct a rational epistatic privacy protection strategy. Furthermore, a global path selection strategy composed of two probabilistic methods is proposed to reduce the probability of falling into a local optimum. We use simulated models and a real dataset of Rheumatoid Arthritis (RA) to compare IPP with four popular methods for detecting K-order SNP interactions. The experimental results show that IPP can guarantee the search accuracy effectively and enhance the detection ability across various models. Further, the privacy budget experiments indicate that the range of the privacy budget in IPP is reasonable and makes the framework more stable.
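The role of the privacy budget can be illustrated with the Laplace mechanism, a standard way to perturb a numeric score under differential privacy. The abstract does not say which mechanism IPP applies to its objective function, so the mechanism choice here is an assumption, and `laplace_perturb`, the sensitivity value and the score are hypothetical.

```python
import numpy as np

def laplace_perturb(score, sensitivity, epsilon, rng):
    """Return `score` plus Laplace noise with scale = sensitivity / epsilon.

    Assumption: the Laplace mechanism stands in for whatever perturbation
    IPP actually uses. A smaller privacy budget (epsilon) means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return score + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
true_score = 5.0  # hypothetical association score of one candidate SNP set
noisy_strict = laplace_perturb(true_score, sensitivity=1.0, epsilon=0.1, rng=rng)
noisy_loose = laplace_perturb(true_score, sensitivity=1.0, epsilon=10.0, rng=rng)
```

This is the trade-off a privacy-budget experiment probes: a tight budget protects individuals but blurs the ranking of candidate SNP sets, while a loose budget preserves search accuracy at weaker privacy.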
5
Saha S, Perrin L, Röder L, Brun C, Spinelli L. Epi-MEIF: detecting higher order epistatic interactions for complex traits using mixed effect conditional inference forests. Nucleic Acids Res 2022; 50:e114. [PMID: 36107776 PMCID: PMC9639209 DOI: 10.1093/nar/gkac715] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/06/2022] [Revised: 07/29/2022] [Accepted: 09/12/2022] [Indexed: 12/04/2022] Open
Abstract
Understanding the relationship between genetic variations and variations in complex and quantitative phenotypes remains an ongoing challenge. While Genome-wide association studies (GWAS) have become a vital tool for identifying single-locus associations, we lack methods for identifying epistatic interactions. In this article, we propose a novel method for higher-order epistasis detection using mixed effect conditional inference forest (epiMEIF). The proposed method is fitted on a group of single nucleotide polymorphisms (SNPs) potentially associated with the phenotype and the tree structure in the forest facilitates the identification of n-way interactions between the SNPs. Additional testing strategies further improve the robustness of the method. We demonstrate its ability to detect true n-way interactions via extensive simulations in both cross-sectional and longitudinal synthetic datasets. This is further illustrated in an application to reveal epistatic interactions from natural variations of cardiac traits in flies (Drosophila). Overall, the method provides a generalized way to identify higher-order interactions from any GWAS data, thereby greatly improving the detection of the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Saswati Saha
  - Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- Laurent Perrin
  - Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
  - CNRS, Marseille, France
- Laurence Röder
  - Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- Christine Brun
  - Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
  - CNRS, Marseille, France
- Lionel Spinelli
  - Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
6
Abstract
BACKGROUND Autoimmune hepatitis has an unknown cause and genetic associations that are not disease-specific or always present. Clarification of its missing causality and heritability could improve prevention and management strategies. AIMS Describe the key epigenetic and genetic mechanisms that could account for missing causality and heritability in autoimmune hepatitis; indicate the prospects of these mechanisms as pivotal factors; and encourage investigations of their pathogenic role and therapeutic potential. METHODS English abstracts were identified in PubMed using multiple key search phrases. Several hundred abstracts and 210 full-length articles were reviewed. RESULTS Environmental induction of epigenetic changes is the prime candidate for explaining the missing causality of autoimmune hepatitis. Environmental factors (diet, toxic exposures) can alter chromatin structure and the production of micro-ribonucleic acids that affect gene expression. Epistatic interaction between unsuspected genes is the prime candidate for explaining the missing heritability. The non-additive, interactive effects of multiple genes could enhance their impact on the propensity and phenotype of autoimmune hepatitis. Transgenerational inheritance of acquired epigenetic marks constitutes another mechanism of transmitting parental adaptations that could affect susceptibility. Management strategies could range from lifestyle adjustments and nutritional supplements to precision editing of the epigenetic landscape. CONCLUSIONS Autoimmune hepatitis has a missing causality that might be explained by epigenetic changes induced by environmental factors and a missing heritability that might reflect epistatic gene interactions or transgenerational transmission of acquired epigenetic marks. These unassessed or under-evaluated areas warrant investigation.
7
Walakira A, Ocira J, Duroux D, Fouladi R, Moškon M, Rozman D, Van Steen K. Detecting gene-gene interactions from GWAS using diffusion kernel principal components. BMC Bioinformatics 2022; 23:57. [PMID: 35105309 PMCID: PMC8805268 DOI: 10.1186/s12859-022-04580-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2021] [Accepted: 01/18/2022] [Indexed: 11/10/2022] Open
Abstract
Genes and gene products do not function in isolation but as components of complex networks of macromolecules through physical or biochemical interactions. Dependencies of gene mutations on genetic background (i.e., epistasis) are believed to play a role in understanding molecular underpinnings of complex diseases such as inflammatory bowel disease (IBD). However, the process of identifying such interactions is complex due to, for instance, the curse of high dimensionality, dependencies in the data and non-linearity. Here, we propose a novel approach for robust and computationally efficient epistasis detection. We do so by first reducing dimensionality per gene via diffusion kernel principal components (kpc). Subsequently, kpc gene summaries are used for downstream analysis including the construction of a gene-based epistasis network. We show that our approach is not only able to recover known IBD associated genes but also additional genes of interest linked to this difficult gastrointestinal disease.
Affiliation(s)
- Andrew Walakira
  - Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Junior Ocira
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Diane Duroux
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Ramouna Fouladi
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Miha Moškon
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Damjana Rozman
  - Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Kristel Van Steen
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
8
Association between gene expression levels of GDF9 and BMP15 and clinicopathological factors in the prognosis of female infertility in northeast Indian populations. Meta Gene 2021. [DOI: 10.1016/j.mgene.2021.100964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
9
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022] Open
Abstract
Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding; the most popular is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods To identify causal variants in synergy and to improve interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Affiliation(s)
- Fentaw Abegaz
  - GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Archana Bhardwaj
  - GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Diane Duroux
  - GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Elena S Gusareva
  - GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
  - GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
  - WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
10
Abegaz F, Chaichoompu K, Génin E, Fardo DW, König IR, Mahachie John JM, Van Steen K. Principals about principal components in statistical genetics. Brief Bioinform 2020; 20:2200-2216. [PMID: 30219892 DOI: 10.1093/bib/bby081] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/18/2018] [Revised: 07/21/2018] [Accepted: 08/12/2018] [Indexed: 12/13/2022] Open
Abstract
Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA, and of their impact on study results compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
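The two defining properties mentioned above (uncorrelated components, ordered by explained variance) can be checked with a minimal sketch of classic PCA via singular value decomposition. The genotype matrix is randomly generated and hypothetical; real analyses typically add genotype standardisation and LD pruning, which are omitted here.

```python
import numpy as np

def pca(X, k):
    """Classic PCA by SVD of the column-centred matrix.

    Returns the first k principal component scores (one row per sample)
    and the fraction of total variance each of them explains.
    """
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * S[:k]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:k]

# Hypothetical data: 50 individuals x 20 SNPs, genotypes coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(50, 20)).astype(float)
scores, explained = pca(X, k=3)
```

In association studies the leading score columns are often included as covariates to adjust for ancestry.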
11
Riahi P, Kazemnejad A, Mostafaei S, Meguro A, Mizuki N, Ashraf-Ganjouei A, Javinani A, Faezi ST, Shahram F, Mahmoudi M. ERAP1 polymorphisms interactions and their association with Behçet's disease susceptibly: Application of Model-Based Multifactor Dimension Reduction Algorithm (MB-MDR). PLoS One 2020; 15:e0227997. [PMID: 32023277 PMCID: PMC7001967 DOI: 10.1371/journal.pone.0227997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/14/2019] [Accepted: 01/03/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Behçet's disease (BD) is a chronic multi-systemic vasculitis with a considerable prevalence in Asian countries. There are many genes associated with a higher risk of developing BD, one of which is endoplasmic reticulum aminopeptidase-1 (ERAP1). In this study, we aimed to investigate the interactions of ERAP1 single nucleotide polymorphisms (SNPs) using a novel data mining method called Model-based multifactor dimensionality reduction (MB-MDR). METHODS We have included 748 BD patients and 776 healthy controls. A peripheral blood sample was collected, and eleven SNPs were assessed. Furthermore, we have applied the MB-MDR method to evaluate the interactions of ERAP1 gene polymorphisms. RESULTS The TT genotype of rs1065407 had a synergistic effect on BD susceptibility, considering the significant main effect. In the second order of interactions, CC genotype of rs2287987 and GG genotype of rs1065407 had the most prominent synergistic effect (β = 12.74). The mentioned genotypes also had significant interactions with CC genotype of rs26653 and TT genotype of rs30187 in the third-order (β = 12.74 and β = 12.73, respectively). CONCLUSION To the best of our knowledge, this is the first study investigating the interaction of a particular gene's SNPs in BD patients by applying a novel data mining method. However, future studies investigating the interactions of various genes could clarify this issue.
Affiliation(s)
- Parisa Riahi
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Anoshirvan Kazemnejad
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
  - * E-mail: (MM); (AK)
- Shayan Mostafaei
  - Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Akira Meguro
  - Department of Ophthalmology and Visual Science, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Nobuhisa Mizuki
  - Department of Ophthalmology and Visual Science, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Amir Ashraf-Ganjouei
  - Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Ali Javinani
  - Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Farhad Shahram
  - Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Mahdi Mahmoudi
  - Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
  - Inflammation Research Center, Tehran University of Medical Sciences, Tehran, Iran
  - * E-mail: (MM); (AK)
12
Chattopadhyay A, Lu TP. Gene-gene interaction: the curse of dimensionality. ANNALS OF TRANSLATIONAL MEDICINE 2019; 7:813. [PMID: 32042829 DOI: 10.21037/atm.2019.12.87] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/11/2022]
Abstract
Identified genetic variants from genome wide association studies frequently show only modest effects on the disease risk, leading to the "missing heritability" problem. One avenue to account for a part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases. This can potentially help with identifying gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, otherwise known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such exponentially growing SNPs diminishes the usefulness of traditional, parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that classifies multi-dimensional genotypes into one-dimensional binary approaches, led to the emergence of a fast-growing collection of methods based on the MDR approach. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied profusely in recent years to tackle this dimensionality issue associated with whole genome gene-gene interaction studies. However, exhaustive searching in MDR based approaches and variable selection in ML methods still pose the risk of missing out on relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud, to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource that stands to fight this "curse". PySpark supports all standard Python libraries and C extensions, thus making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large sets of data.
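The scale of the exhaustive search described above follows directly from the binomial coefficient. As a small illustration (the panel size of one million SNPs is a hypothetical round number, and `n_interaction_tests` is an illustrative helper name):

```python
from math import comb

def n_interaction_tests(n_snps, order):
    """Number of distinct SNP combinations of a given interaction order."""
    return comb(n_snps, order)

n_snps = 1_000_000  # hypothetical genome-wide panel size
pairs = n_interaction_tests(n_snps, 2)    # 499,999,500,000 pairwise tests
triples = n_interaction_tests(n_snps, 3)  # grows by roughly another factor of n/3
print(f"pairs: {pairs:.3e}, triples: {triples:.3e}")
```

Each additional interaction order multiplies the search space by a factor on the order of the number of SNPs, which is why exhaustive higher-order screening quickly becomes infeasible without filtering or distributed computation.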
Affiliation(s)
- Amrita Chattopadhyay
  - Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei
- Tzu-Pin Lu
  - Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei
13
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Epistasis Detection in Genome-Wide Screening for Complex Human Diseases in Structured Populations. SYSTEMS MEDICINE 2019. [DOI: 10.1089/sysm.2019.0003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023] Open
Affiliation(s)
- Fentaw Abegaz
  - GIGA-R, Medical Genomics—BIO3, University of Liege, Liege, Belgium
- Archana Bhardwaj
  - GIGA-R, Medical Genomics—BIO3, University of Liege, Liege, Belgium
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey
- Hakon Hakonarson
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
  - Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Kristel Van Steen
  - GIGA-R, Medical Genomics—BIO3, University of Liege, Liege, Belgium
  - WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liege, Liege, Belgium
14
Joiret M, Mahachie John JM, Gusareva ES, Van Steen K. Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min 2019; 12:11. [PMID: 31198442 PMCID: PMC6558841 DOI: 10.1186/s13040-019-0199-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2019] [Accepted: 05/09/2019] [Indexed: 01/07/2023] Open
Abstract
Background In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important as it allows identifying genetic markers that tag the actual causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may also interfere with the detection of genuine epistasis signals in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. GPD may involve unlinked genetic markers, even residing on different chromosomes. Often GPD is eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD-pruning that gives a balance between false positive control and sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs. Results Using real-life human LD patterns from a homogeneous subpopulation of British ancestry, we investigated the impact of LD-pruning on the statistical sensitivity of MB-MDR. We considered three different non-fully penetrant epistasis models with varying effect sizes. There is a clear advantage in pre-analysis pruning using sliding windows at r² of 0.75 or lower, but using a threshold of 0.20 has a detrimental effect on the power to detect a functional interactive SNP pair (power < 25%). Signal sensitivity, directly using LD-block information to determine whether an epistasis signal is present or not, benefits from LD-pruning as well (average power across scenarios: 87%), but is largely hampered by functional loci residing at the boundaries of an LD-block. Conclusions Our results confirm that LD patterns and the position of causal variants in LD blocks do have an impact on epistasis detection, and that pruning strategies and LD-block definitions combined need careful attention, if we wish to maximize the power of large-scale epistasis screenings.
Affiliation(s)
- Marc Joiret
  - BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium; Biomechanics Research Unit, GIGA-R in-silico medicine, Liège, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
- Elena S Gusareva
  - BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
- Kristel Van Steen
  - BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium; WELBIO researcher, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
|
15
|
Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
Affiliation(s)
- K Van Steen
  - WELBIO, GIGA-R Medical Genomics-BIO3, University of Liège, Liège, Belgium
  - Department of Human Genetics, University of Leuven, Leuven, Belgium
- J H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
|
16
|
Statistical methods for genome-wide association studies. Semin Cancer Biol 2019; 55:53-60. [DOI: 10.1016/j.semcancer.2018.04.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 12/02/2017] [Revised: 04/27/2018] [Accepted: 04/28/2018] [Indexed: 12/12/2022]
|
17
|
Statistical Modeling of Trivariate Static Systems: Isotonic Models. DATA 2019. [DOI: 10.3390/data4010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
This paper presents an improved version of a statistical trivariate modeling algorithm introduced in a short Letter by the first author. This paper recalls the fundamental concepts behind the proposed algorithm, evidences its criticalities and illustrates a number of improvements which lead to a functioning modeling algorithm. The present paper also illustrates the features of the improved statistical modeling algorithm through a comprehensive set of numerical experiments performed on four synthetic and five natural datasets. The obtained results confirm that the proposed algorithm is able to model the considered synthetic and the natural datasets faithfully.
|
18
|
Lee S, Son D, Kim Y, Yu W, Park T. Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype. BioData Min 2018; 11:27. [PMID: 30564286 PMCID: PMC6295107 DOI: 10.1186/s13040-018-0189-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/05/2018] [Accepted: 11/26/2018] [Indexed: 12/04/2022] Open
Abstract
Background One strategy for addressing missing heritability in genome-wide association studies is gene-gene interaction analysis, which, unlike a single gene approach, involves high-dimensionality. The multifactor dimensionality reduction method (MDR) has been widely applied to reduce multi-level genotypes into high or low risk groups. The Cox-MDR method has been proposed to detect gene-gene interactions associated with the survival phenotype by using the martingale residuals from a Cox model. However, this method requires a cross-validation procedure to find the best SNP pair among all possible pairs, and a permutation procedure must follow to assess the significance of gene-gene interactions. Recently, the unified model based multifactor dimensionality reduction method (UM-MDR) has been proposed to unify the significance testing with the MDR algorithm within the regression model framework, in which neither cross-validation nor permutation testing is needed. In this paper, we propose a simple approach, called Cox UM-MDR, which combines Cox-MDR with the key procedure of UM-MDR to identify gene-gene interactions associated with the survival phenotype. Results A simulation study was performed to compare Cox UM-MDR with Cox-MDR with and without the marginal effects of SNPs. We found that Cox UM-MDR has similar power to Cox-MDR without marginal effects, whereas it outperforms Cox-MDR with marginal effects and is more robust to heavy censoring. We also applied Cox UM-MDR to a dataset of leukemia patients and detected gene-gene interactions with regard to the survival time. Conclusion Cox UM-MDR is easily implemented by combining Cox-MDR with UM-MDR to detect the significant gene-gene interactions associated with the survival time without cross-validation and permutation testing.
The simulation results are shown to demonstrate the utility of the proposed method, which achieves at least the same power as Cox-MDR in most scenarios, and outperforms Cox-MDR when some SNPs having only marginal effects might mask the detection of the causal epistasis.
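The reduction of multi-level genotypes into high and low risk groups, which the abstract above attributes to MDR, can be sketched for a single SNP pair and a binary trait. This shows only the classic pooling step, not Cox-MDR or UM-MDR; the function names are hypothetical, and using the overall case:control ratio as the threshold is one common convention assumed here.

```python
from collections import Counter


def mdr_risk_cells(snp_a, snp_b, status):
    """Label each two-locus genotype cell 'H' (high risk) or 'L' (low risk).

    `snp_a`, `snp_b`: genotype codes per individual; `status`: 1 = case, 0 = control.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # High risk when the cell's case:control ratio exceeds the overall ratio.
        # Cross-multiplied so that empty cells never cause a division by zero.
        high = cases[cell] * n_controls > controls[cell] * n_cases
        labels[cell] = "H" if high else "L"
    return labels


def balanced_accuracy(snp_a, snp_b, status, labels):
    """Mean of sensitivity and specificity when 'H' cells predict case status.

    Assumes at least one case and one control are present.
    """
    tp = fn = tn = fp = 0
    for a, b, y in zip(snp_a, snp_b, status):
        predicted_case = labels.get((a, b), "L") == "H"
        if y == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

In classic MDR this labelling is repeated inside cross-validation for every candidate SNP pair, which is the computational cost the unified regression framework is meant to avoid.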
Affiliation(s)
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006 South Korea
- Donghee Son
  - Department of Mathematics and Statistics, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006 South Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, Shilim-dong, Kwanak-gu, Seoul, 151-742 South Korea
- Wenbao Yu
  - Division of Oncology and Centre for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Taesung Park
  - Department of Statistics, Seoul National University, Shilim-dong, Kwanak-gu, Seoul, 151-742 South Korea
|
19
|
Male-specific epistasis between WWC1 and TLN2 genes is associated with Alzheimer's disease. Neurobiol Aging 2018; 72:188.e3-188.e12. [PMID: 30201328 PMCID: PMC6769421 DOI: 10.1016/j.neurobiolaging.2018.08.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/14/2018] [Revised: 07/05/2018] [Accepted: 08/01/2018] [Indexed: 12/19/2022]
Abstract
Systematic epistasis analyses in multifactorial disorders are an important step to better characterize complex genetic risk structures. We conducted a hypothesis-free sex-stratified genome-wide screening for epistasis contributing to Alzheimer's disease (AD) susceptibility. We identified a statistical epistasis signal between the single nucleotide polymorphisms rs3733980 and rs7175766 that was associated with AD in males (genome-wide significant, Bonferroni-corrected p = 0.0165). This signal pointed toward the genes WWC1 (WW and C2 domain containing 1, aka KIBRA; 5q34) and TLN2 (talin 2; 15q22.2). Gene-based meta-analysis in 3 independent consortium data sets confirmed the identified interaction: the most significant (Bonferroni-corrected meta-analysis p = 9.02 × 10⁻³) was for the single nucleotide polymorphism pair rs1477307 and rs4077746. In functional studies, WWC1 and TLN2 coexpressed in the temporal cortex brain tissue of AD subjects (β=0.17, 95% CI 0.04 to 0.30, p=0.01); modulated Tau toxicity in Drosophila eye experiments; colocalized in brain tissue cells, N2a neuroblastoma, and HeLa cell lines; and coimmunoprecipitated both in brain tissue and HEK293 cells. Our finding points toward new AD-related pathways and provides clues toward novel medical targets for the cure of AD.
|
20
|
Yadav RP, Ghatak S, Chakraborty P, Lalrohlui F, Kannan R, Kumar R, Pautu JL, Zomingthanga J, Chenkual S, Muthukumaran R, Senthil Kumar N. Lifestyle chemical carcinogens associated with mutations in cell cycle regulatory genes increases the susceptibility to gastric cancer risk. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2018; 25:31691-31704. [PMID: 30209766 DOI: 10.1007/s11356-018-3080-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/16/2018] [Accepted: 08/27/2018] [Indexed: 06/08/2023]
Abstract
In the present study, we correlated the various lifestyle habits and their associated mutations in cell cycle (P21 and MDM2) and DNA damage repair (MLH1) genes to investigate their role in gastric cancer (GC). Multifactor dimensionality reduction (MDR) analysis revealed the two-factor model of oral snuff and smoked meat as the significant model for GC risk. The interaction analysis between identified mutations and the significant demographic factors predicted that oral snuff is significantly associated with P21 3'UTR mutations. A total of five mutations in the P21 gene, including three novel mutations in intron 2 (36651738G > A, 36651804A > T, 36651825G > T), were identified. In the MLH1 gene, two variants were identified, viz. one in exon 8 (37053568A > G; 219I > V) and a novel 37088831C > G in intron 16. Flow cytometric analysis predicted DNA aneuploidy in 7 (17.5%) and diploidy in 33 (82.5%) tumor samples. The G2/M phase was significantly arrested in aneuploid gastric tumor samples whereas high S-phase fraction was observed in all the gastric tumor samples. This study demonstrated that environmental chemical carcinogens along with alteration in cell cycle regulatory (P21) and mismatch repair (MLH1) genes may be stimulating the susceptibility of GC by altering the DNA content level abnormally in tumors in the Mizo ethnic population.
Affiliation(s)
- Ravi Prakash Yadav
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram, 796004, India
- Souvik Ghatak
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram, 796004, India
- Payel Chakraborty
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram, 796004, India
- Freda Lalrohlui
  - Department of Biotechnology, Mizoram University, Aizawl, Mizoram, 796004, India
- Ravi Kannan
  - Cachar Cancer Hospital and Research Centre, Silchar, Assam 788015, India
- Rajeev Kumar
  - Cachar Cancer Hospital and Research Centre, Silchar, Assam 788015, India
- Jeremy L Pautu
  - Mizoram State Cancer Institute, Zemabawk, Aizawl, Mizoram, 796017, India
- John Zomingthanga
  - Department of Pathology, Civil Hospital, Aizawl, Mizoram, 796001, India
- Saia Chenkual
  - Department of Surgery, Civil Hospital, Aizawl, Mizoram, 796001, India
|
21
|
Kundu S, Ramshankar V, Verma AK, Thangaraj SV, Krishnamurthy A, Kumar R, Kannan R, Ghosh SK. Association of DFNA5, SYK, and NELL1 variants along with HPV infection in oral cancer among the prolonged tobacco-chewers. Tumour Biol 2018; 40:1010428318793023. [PMID: 30091681 DOI: 10.1177/1010428318793023] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022] Open
Abstract
Southeast Asia, especially India, is well known for the highest use of smokeless tobacco. These products are known to induce oral squamous cell carcinoma. However, not all long-term tobacco-chewers develop oral squamous cell carcinoma. In addition, germline variants play a crucial role in susceptibility, prognosis, development, and progression of the disease. These prompted us to study the genetic susceptibility to oral squamous cell carcinoma among the long-term tobacco-chewers. Here, we presented a retrospective study on prolonged tobacco-chewers of Northeast India to identify the potential protective or risk-associated germline variants in tobacco-related oral squamous cell carcinoma along with HPV infection. Targeted re-sequencing (n = 60) of 170 genetic regions from 75 genes was carried out in Ion-PGM™ and validation (n = 116) of the observed variants was done using Sequenom iPLEX MassARRAY™ platform followed by polymerase chain reaction-based HPV genotyping and p16-immunohistochemistry study. Subsequently, estimation of population structure, different statistical and in silico approaches were undertaken. We identified one nonsense-mediated mRNA decay transcript variant in the DFNA5 region (rs2237306), associated with Benzo(a)pyrene, as a protective factor (odds ratio = 0.33; p = 0.009) and four harmful (odds ratio > 2.5; p < 0.05) intronic variants, rs182361, rs290974, and rs169724 in SYK and rs1670661 in NELL1 region, involved in genetic susceptibility to tobacco- and HPV-mediated oral oncogenesis. Among the oral squamous cell carcinoma patients, 12.6% (11/87) were HPV positive, out of which 45.5% (5/11) were HPV16-infected, 27.3% (3/11) were HPV18-infected, and 27.3% (3/11) had an infection of both subtypes. Multifactor dimensionality reduction analysis showed that the interactions among HPV and NELL1 variant rs1670661 with age and gender augmented the risk of both non-tobacco- and tobacco-related oral squamous cell carcinoma, respectively. 
These suggest that HPV infection may be one of the important risk factors for oral squamous cell carcinoma in this population. Finally, we newly report a DFNA5 variant probably conferring protection via nonsense-mediated mRNA decay pathway against tobacco-related oral squamous cell carcinoma. Thus, the analytical approach used here can be useful in predicting the population-specific significant variants associated with oral squamous cell carcinoma in any heterogeneous population.
Affiliation(s)
- Sharbadeb Kundu
  - Department of Biotechnology, Assam University, Silchar, India
- Rajeev Kumar
  - Department of Molecular Oncology, Cachar Cancer Hospital & Research Centre, Silchar, India
- Ravi Kannan
  - Department of Molecular Oncology, Cachar Cancer Hospital & Research Centre, Silchar, India
- Sankar Kumar Ghosh
  - Department of Biotechnology, Assam University, Silchar, India; University of Kalyani, Nadia, India
|
22
|
García-González I, López-Díaz RI, Canché-Pech JR, Solís-Cárdenas ADJ, Flores-Ocampo JA, Mendoza-Alcocer R, Herrera-Sánchez LF, Jiménez-Rico MA, Ceballos-López AA, López-Novelo ME. Epistasis analysis of metabolic genes polymorphisms associated with ischemic heart disease in Yucatan. CLINICA E INVESTIGACION EN ARTERIOSCLEROSIS : PUBLICACION OFICIAL DE LA SOCIEDAD ESPANOLA DE ARTERIOSCLEROSIS 2018; 30:102-111. [PMID: 29395491 DOI: 10.1016/j.arteri.2017.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2017] [Revised: 11/27/2017] [Accepted: 11/29/2017] [Indexed: 06/07/2023]
Abstract
OBJECTIVE Epistasis is a type of genetic interaction that could explain much of the phenotypic variability of complex diseases. In this work, the effect of epistasis of metabolic genes and cardiovascular risk on the susceptibility to the development of ischemic heart disease in Yucatan was determined. METHODS Case-control study in 79 Yucatecan patients with ischemic heart disease and 101 healthy controls matched by age and origin with cases. The polymorphisms -108CT, Q192R, L55M (paraoxonase 1; PON1), C677T, A1298C (methylenetetrahydrofolate reductase; MTHFR), and the presence/absence of the glutathione S-transferase T1 (GSTT1) gene were genotyped. Epistasis analysis was performed using the multifactorial dimensional reduction method. The best risk prediction model was selected based on precision (%), statistical significance (P<0.05), and cross-validation consistency. RESULTS We found an independent association of the null genotype GSTT1*0/0 (OR=3.39, CI: 1.29-8.87, P=0.017) and the null allele (OR=1.86, CI: 1.19-2.91, P=0.007) with ischemic heart disease. The GSTT1*0 deletion and the 677TT genotype (MTHFR) were identified as being at a high cardiovascular risk, whereas the GSTT1*1 wild type genotype and the CC677 variant were at low risk. The gene-environment interaction identified the GSTT1 gene, C677T polymorphism (MTHFR), and hypertension as the factors that best explain ischemic heart disease in the study population. CONCLUSIONS The interaction of the MTHFR, GSTT1 and hypertension may constitute a predictive model of risk for early onset ischemic heart disease in the population of Yucatan.
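Odds ratios with confidence intervals, as reported in the abstract above, are typically derived from a 2×2 table of genotype by disease status. The sketch below uses the standard Wald interval on the log odds ratio. The abstract does not state which interval method the authors used and does not give the underlying counts, so the method choice is an assumption and any counts passed to the hypothetical `odds_ratio_ci` function are purely illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers among cases,    b: carriers among controls,
    c: non-carriers among cases, d: non-carriers among controls.
    All four counts must be positive. z = 1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only (not taken from the study).
or_, lo, hi = odds_ratio_ci(20, 10, 30, 45)
print(f"OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as for the GSTT1 null genotype in the abstract, is what supports calling the association significant at the 5% level.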
Affiliation(s)
- Igrid García-González
  - Departamento de Biología Molecular, Laboratorios Biomédicos de Mérida, Mérida, Yucatán, México
- Roger Iván López-Díaz
  - Departamento de Biología Molecular, Laboratorios Biomédicos de Mérida, Mérida, Yucatán, México
- José Reyes Canché-Pech
  - Departamento de Biología Molecular, Laboratorios Biomédicos de Mérida, Mérida, Yucatán, México
- María E López-Novelo
  - Departamento de Biología Molecular, Laboratorios Biomédicos de Mérida, Mérida, Yucatán, México
|
23
|
Jung HY, Leem S, Park T. Fuzzy set-based generalized multifactor dimensionality reduction analysis of gene-gene interactions. BMC Med Genomics 2018; 11:32. [PMID: 29697366 PMCID: PMC5918459 DOI: 10.1186/s12920-018-0343-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Gene-gene interactions (GGIs) are a known cause of missing heritability. Multifactor dimensionality reduction (MDR) is one of the most commonly used methods for GGI detection. The generalized multifactor dimensionality reduction (GMDR) method is an extension of the MDR method that is applicable to various types of traits and allows covariate adjustments. Our previous Fuzzy MDR (FMDR) is another extension for overcoming simple binary classification. FMDR uses continuous membership values instead of the binary membership values 0 and 1, improving power for detecting causal SNPs and giving more intuitive interpretations in real data analysis. Here, we propose the fuzzy generalized multifactor dimensionality reduction (FGMDR) method, which combines fuzzy set-based analysis with the GMDR method, to detect GGIs associated with diseases using fuzzy set theory. RESULTS Through simulation studies for different types of traits, the proposed FGMDR showed a higher detection ratio of causal SNPs, compared to GMDR. We then applied FGMDR to two real datasets: Crohn's disease (CD) data from the Wellcome Trust Case Control Consortium (WTCCC) with a binary phenotype and the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) data from a Korean population with a continuous phenotype. The interactions derived by our method include the previously reported interactions associated with phenotypes. CONCLUSIONS The proposed FGMDR performs well for GGI detection with covariate adjustments. The program written in R for FGMDR is available at http://statgen.snu.ac.kr/software/FGMDR .
Affiliation(s)
- Hye-Young Jung
  - Faculty of Liberal Education, Seoul National University, Seoul, 08826 South Korea
- Sangseob Leem
  - Department of Statistics, Seoul National University, Seoul, 08826 South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, 08826 South Korea
|
24
|
Ritchie MD, Van Steen K. The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation. ANNALS OF TRANSLATIONAL MEDICINE 2018; 6:157. [PMID: 29862246 DOI: 10.21037/atm.2018.04.05] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
One of the primary goals in this era of precision medicine is to understand the biology of human diseases and their treatment, such that each individual patient receives the best possible treatment for their disease based on their genetic and environmental exposures. One way to work towards achieving this goal is to identify the environmental exposures and genetic variants that are relevant to each disease in question, as well as the complex interplay between genes and environment. Genome-wide association studies (GWAS) have allowed for a greater understanding of the genetic component of many complex traits. However, these genetic effects are largely small and thus, our ability to use these GWAS findings for precision medicine is limited. As more and more GWAS have been performed, rather than focusing only on common single nucleotide polymorphisms (SNPs) and additive genetic models, many researchers have begun to explore alternative heritable components of complex traits including rare variants, structural variants, epigenetics, and genetic interactions. While genetic interactions are a plausible reality that could explain some of the heritability that has not yet been identified, especially when one considers the identification of genetic interactions in model organisms as well as our understanding of biological complexity, there are still significant challenges and considerations in identifying these genetic interactions. Broadly, these can be summarized in three categories: abundance of methods, practical considerations, and biological interpretation. In this review, we will discuss these important elements in the search for genetic interactions along with some potential solutions. While genetic interactions are theoretically understood to be important for complex human disease, the body of evidence is still building to support this component of the underlying genetic architecture of complex human traits.
Our hope is that more sophisticated modeling approaches and more robust computational techniques will enable the community to identify these important genetic interactions and improve our ability to implement precision medicine in the future.
Affiliation(s)
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
  - WELBIO, GIGA-R Medical Genomics Unit - BIO3, University of Liège, Liège, Belgium; Department of Human Genetics, University of Leuven, Leuven, Belgium
|
25
|
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), the quest for understanding the genetic architecture of complex diseases is rapidly increasing more than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.
|
26
|
Yu W, Lee S, Park T. A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions. Bioinformatics 2017; 32:i605-i610. [PMID: 27587680 DOI: 10.1093/bioinformatics/btw424] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Gene-gene interaction (GGI) is one of the most popular approaches for finding and explaining the missing heritability of common complex traits in genome-wide association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGI effects. However, there are several disadvantages of the existing MDR-based approaches, such as the lack of an efficient way of evaluating the significance of multi-locus models and the high computational burden due to intensive permutation. Furthermore, the MDR method does not distinguish marginal effects from pure interaction effects. METHODS We propose a two-step unified model based MDR approach (UM-MDR), in which, the significance of a multi-locus model, even a high-order model, can be easily obtained through a regression framework with a semi-parametric correction procedure for controlling Type I error rates. In comparison to the conventional permutation approach, the proposed semi-parametric correction procedure avoids heavy computation in order to achieve the significance of a multi-locus model. The proposed UM-MDR approach is flexible in the sense that it is able to incorporate different types of traits and evaluate significances of the existing MDR extensions. RESULTS The simulation studies and the analysis of a real example are provided to demonstrate the utility of the proposed method. UM-MDR can achieve at least the same power as MDR for most scenarios, and it outperforms MDR especially when there are some single nucleotide polymorphisms that only have marginal effects, which masks the detection of causal epistasis for the existing MDR approaches. CONCLUSIONS UM-MDR provides a very good supplement of existing MDR method due to its efficiency in achieving significance for every multi-locus model, its power and its flexibility of handling different types of traits. 
AVAILABILITY AND IMPLEMENTATION An R package "umMDR" and other source codes are freely available at http://statgen.snu.ac.kr/software/umMDR/ CONTACT tspark@stats.snu.ac.kr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenbao Yu
  - Department of Statistics, Seoul National University, Shilim-Dong, Kwanak-Gu, Seoul 151-742, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul 143-747, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Shilim-Dong, Kwanak-Gu, Seoul 151-742, Korea
|
27
|
Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017; 18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022] Open
Abstract
Background Finding epistatic interactions in large association studies like genome-wide association studies (GWAS), given the large volume of genomic data now available, is a challenging and largely unsolved issue. Few previous studies could handle genome-wide data due to the intractable difficulties met in searching a combinatorially explosive search space and statistically evaluating epistatic interactions given a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining K-Nearest Neighbors (KNN) and Multi Dimensional Reduction (MDR) methods for detecting gene-gene interactions as a possible alternative to existing algorithms, especially in situations where the number of involved determinants is high. After describing the approach, a comparison of our method (KNN-MDR) to a set of the other best-performing methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) is carried out to detect interactions using simulated data as well as real genome-wide data. Results Experimental results on both simulated data and real genome-wide data show that KNN-MDR has interesting properties in terms of accuracy and power, and that, in many cases, it significantly outperforms its recent competitors. Conclusions The presented methodology (KNN-MDR) is valuable in the context of loci and interactions mapping and can be seen as an interesting addition to the arsenal used in complex traits analyses. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1599-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sinan Abo Alchamlat
  - Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liège, Belgium
- Frédéric Farnir
  - Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liège, Belgium
|
28
|
Gupta U, Mir SS, Garg N, Agarwal SK, Pande S, Mittal B. Association study of inflammatory genes with rheumatic heart disease in North Indian population: A multi-analytical approach. Immunol Lett 2016; 174:53-62. [PMID: 27118427 DOI: 10.1016/j.imlet.2016.04.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/22/2016] [Revised: 04/13/2016] [Accepted: 04/13/2016] [Indexed: 10/21/2022]
Abstract
Rheumatic heart disease (RHD) is an inflammatory, autoimmune disease, occurring as a consequence of group A streptococcal infection complicated by rheumatic fever (RF). An inappropriate immune response is the central signature tune to the complex pathogenesis of RHD. However, only some of those infected develop RHD, and genetic host susceptibility factors are thought to play a key role in disease development. Therefore, the present study was designed to explore the role of genetic variants in inflammatory genes in conferring risk of RHD. The study recruited a total of 700 subjects, including 400 RHD patients and 300 healthy controls. We examined the associations of 8 selected polymorphisms in seven inflammatory genes: IL-6 [rs1800795G/C], IL-10 [rs1800896G/A], TNF-A [rs1800629G/A], IL-1β [rs2853550C/T], IL-1VNTR [rs2234663], TGF-β1 [rs1800469C/T]; [rs1982073T/C], and CTLA-4 [rs5742909C/T] with RHD risk. Genotyping for all the polymorphisms was done using PCR-ARMS/PCR/RFLP methods. Multifactor dimensionality reduction and classification and regression tree approaches were combined with logistic regression to discover high-order gene-gene interactions in studied genes involved in RHD susceptibility. In univariate logistic regression analysis, we found significant association of variant-containing genotypes (CT&TT) of TGF-β1 869T/C [rs1982073]; [p=0.004 & 0.001, OR (95% CI)=1.65 (1.2-2.3) & 2.25 (1.4-3.6) respectively], variant genotype (CC) of IL-1β -511C/T [rs2853550]; [p=0.001, OR (95% CI)=2.33 (1.4-3.8)] and IL-1 VNTR [rs2234663]; [p=0.03, OR (95% CI)=5.25 (1.2-23.4)] SNPs with RHD risk. CART analysis revealed that individuals with the combined genotypes of TGF-β1T/C_ rs1982073 (CT/TT) and IL-1 β_ rs2853550 (CC) had significantly higher susceptibility for RHD [p=0.0005, OR (95% CI)=5.91 (2.9-12.5)]. In MDR analysis, TGF-β1 869T>C yielded the highest testing accuracy of 0.562.
In conclusion, using multi-analytical approaches, our study revealed an important role of TGF-β1 869T/C [rs1982073] in RHD susceptibility.
Affiliation(s)
- Usha Gupta
- Department of Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India
- Snober S Mir
- Department of Bioengineering, Integral University, Lucknow, India
- Naveen Garg
- Department of Cardiology, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India
- Surendra K Agarwal
- Department of Cardiovascular and Thoracic Surgery, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India
- Shantanu Pande
- Department of Cardiovascular and Thoracic Surgery, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India
- Balraj Mittal
- Department of Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India.
|
29
|
Lishout FV, Gadaleta F, Moore JH, Wehenkel L, Steen KV. gammaMAXT: a fast multiple-testing correction algorithm. BioData Min 2015; 8:36. [PMID: 26594243 PMCID: PMC4654922 DOI: 10.1186/s13040-015-0069-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/07/2015] [Accepted: 11/08/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The purpose of the MaxT algorithm is to provide a significance test algorithm that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, the requirements in terms of computing time and memory of this procedure are proportional to the number of investigated hypotheses. The memory issue was solved in 2013 by Van Lishout's implementation of MaxT, which makes the memory usage independent of the size of the dataset. This algorithm is implemented in MBMDR-3.0.3, a software package that can effectively identify genetic interactions for a variety of SNP-SNP based epistasis models. On the other hand, that implementation turned out to be less suitable for genome-wide interaction analysis studies, due to the prohibitive computational burden. RESULTS In this work we introduce gammaMAXT, a novel implementation of the maxT algorithm for multiple testing correction. The algorithm was implemented in the software MBMDR-4.2.2, as part of the MB-MDR framework to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values. We show that the gammaMAXT algorithm has a power comparable to MaxT and maintains FWER, but requires less computational resources and time. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3. CONCLUSIONS These results are promising for future GWAIs. However, the proposed gammaMAXT algorithm offers a general significance assessment and multiple testing approach, applicable to any context that requires performing hundreds of thousands of tests. It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
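As a rough illustration of the permutation-based maxT idea that this abstract builds on, the sketch below computes single-step maxT adjusted p-values in the textbook way. It is not the gammaMAXT algorithm and not the MB-MDR code: the test statistic (absolute difference of group means), the function names `abs_mean_diff` and `maxt_adjusted_pvalues`, and the toy data are all assumptions made for the example.

```python
import random


def abs_mean_diff(values, labels):
    """Toy statistic: absolute difference between case and control means."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def maxt_adjusted_pvalues(features, labels, n_perm=999, seed=0):
    """Single-step maxT: compare each observed statistic with the permutation
    distribution of the maximum statistic taken over all tests."""
    rng = random.Random(seed)
    observed = [abs_mean_diff(f, labels) for f in features]
    exceed = [0] * len(features)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotype labels; features stay fixed
        max_stat = max(abs_mean_diff(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # add-one correction so that no adjusted p-value is exactly zero
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Hypothetical toy data: 10 cases followed by 10 controls, two features.
labels = [1] * 10 + [0] * 10
features = [
    [5.0 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)],  # strongly associated
    [(7 * i) % 5 for i in range(20)],  # identical case and control means
]
adjusted = maxt_adjusted_pvalues(features, labels)
print(adjusted)
```

Every permutation recomputes all statistics, so the cost of this naive version grows with both the number of tests and the number of permutations, which is the burden the abstract describes for genome-wide use.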
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Francesco Gadaleta
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Jason H Moore
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104-6021 PA USA
- Louis Wehenkel
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
|
30
|
Kim Y, Park T. Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies. PLoS One 2015; 10:e0135016. [PMID: 26267341 PMCID: PMC4534386 DOI: 10.1371/journal.pone.0135016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/13/2014] [Accepted: 07/17/2015] [Indexed: 11/19/2022] Open
Abstract
Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on the identification of single variants. It has been shown that most of the variants that were discovered by GWAS could only partially explain disease heritability. The explanation for this missing heritability is generally believed to be gene-gene (GG) or gene-environment (GE) interactions and other structural variants. Generalized multifactor dimensionality reduction (GMDR) has been proven to be reasonably powerful in detecting GG and GE interactions; however, its performance has been found to decline when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method (based on the L-estimator and M-estimator estimation methods) in an attempt to reduce the effects caused by outlying traits. A comparison of robust GMDR with the original MDR based on simulation studies showed the former method to outperform the latter. The performance of robust GMDR is illustrated through a real GWA example consisting of 8,577 samples from the Korean population using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as a phenotype. Robust GMDR identified the KCNH1 gene to have strong interaction effects with other genes on the function of insulin secretion.
Affiliation(s)
- Yongkang Kim
- Department of Statistics, Seoul National University, Seoul, South Korea
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151–741, South Korea
|
31
|
Choudhury JH, Singh SA, Kundu S, Choudhury B, Talukdar FR, Srivasta S, Laskar RS, Dhar B, Das R, Laskar S, Kumar M, Kapfo W, Mondal R, Ghosh SK. Tobacco carcinogen-metabolizing genes CYP1A1, GSTM1, and GSTT1 polymorphisms and their interaction with tobacco exposure influence the risk of head and neck cancer in Northeast Indian population. Tumour Biol 2015; 36:5773-83. [PMID: 25724184 DOI: 10.1007/s13277-015-3246-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/05/2014] [Accepted: 02/10/2015] [Indexed: 11/29/2022] Open
Abstract
Genetic polymorphisms in tobacco-metabolizing genes may modulate the risk of head and neck cancer (HNC). In Northeast India, head and neck cancers and tobacco consumption remain highly prevalent. The aim of the study was to investigate the combined effect of cytochrome P450 1A1 (CYP1A1) T3801C and glutathione S-transferase (GST) gene polymorphisms, smoking, and tobacco-betel quid chewing on the risk of HNC. The study included 420 subjects (180 cases and 240 controls) from the Northeast Indian population. Polymorphisms of CYP1A1 T3801C and GST (M1 & T1) were studied by polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) and multiplex PCR, respectively. Logistic regression (LR) and multifactor dimensionality reduction (MDR) approaches were applied for statistical analysis. LR analysis revealed that subjects carrying CYP1A1 TC/CC + GSTM1 null genotypes had a 3.52-fold (P < 0.001) increased risk of head and neck squamous cell carcinoma (HNSCC). Smokers carrying CYP1A1 TC/CC + GSTM1 null and CYP1A1 TC/CC + GSTT1 null genotypes showed significant association with HNC risk (odds ratio [OR] = 6.42; P < 0.001 and 3.86; P = 0.005, respectively). Similarly, tobacco-betel quid chewers carrying CYP1A1 TC/CC + GSTM1 null genotypes also had a several-fold increased risk of HNC (P < 0.001). In MDR analysis, the best model for HNSCC risk was the four-factor model of tobacco-betel quid chewing, smoking, CYP1A1 TC/CC, and GSTM1 null genotypes (testing balanced accuracy [TBA] = 0.6292; cross-validation consistency [CVC] = 9/10 and P < 0.0001). These findings suggest that interaction of combined genotypes of carcinogen-metabolizing genes with environmental factors might modulate susceptibility to HNC in the Northeast Indian population.
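For readers unfamiliar with the MDR quantities reported above, the following sketch shows the core pooling step of classic MDR on hypothetical data: each multi-factor cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the resulting two-group classifier is scored by balanced accuracy. The function names and counts are assumptions for illustration, and the sketch scores the full toy sample rather than held-out cross-validation folds, so the value it prints is not a testing balanced accuracy in the sense of the abstract.

```python
from collections import Counter


def mdr_high_risk_cells(cells, labels):
    """Return the set of factor combinations labelled high-risk by MDR."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio
    cases = Counter(c for c, y in zip(cells, labels) if y == 1)
    controls = Counter(c for c, y in zip(cells, labels) if y == 0)
    # cases > threshold * controls avoids dividing by zero for cells without controls
    return {c for c in set(cases) | set(controls)
            if cases[c] > threshold * controls[c]}


def balanced_accuracy(cells, labels, high_risk):
    """Mean of sensitivity and specificity of the high/low-risk rule."""
    tp = sum(1 for c, y in zip(cells, labels) if y == 1 and c in high_risk)
    tn = sum(1 for c, y in zip(cells, labels) if y == 0 and c not in high_risk)
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    return (tp / n_cases + tn / n_controls) / 2


# Hypothetical two-factor data: (risk genotype present, exposure present).
case_cells = [(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 1)] + [(0, 0)]
control_cells = [(1, 1)] + [(1, 0)] * 3 + [(0, 1)] * 3 + [(0, 0)] * 3
cells = case_cells + control_cells
labels = [1] * len(case_cells) + [0] * len(control_cells)

high_risk = mdr_high_risk_cells(cells, labels)
print(high_risk, balanced_accuracy(cells, labels, high_risk))
```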
Affiliation(s)
- Javed Hussain Choudhury
- Molecular Medicine Laboratory, Department of Biotechnology, Assam University, Silchar, Assam, 788011, India
|
32
|
Yu W, Kwon MS, Park T. Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions. Hum Hered 2015. [PMID: 26201702 DOI: 10.1159/000377723] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES To determine gene-gene interactions and missing heritability of complex diseases is a challenging topic in genome-wide association studies. The multifactor dimensionality reduction (MDR) method is one of the most commonly used methods for identifying gene-gene interactions with dichotomous phenotypes. For quantitative phenotypes, the generalized MDR or quantitative MDR (QMDR) methods have been proposed. These methods are known as univariate methods because they consider only one phenotype. To date, there are few methods for analyzing multiple phenotypes. METHODS To address this problem, we propose a multivariate QMDR method (Multi-QMDR) for multivariate correlated phenotypes. We summarize the multivariate phenotypes into a univariate score by dimensional reduction analysis, and then classify the samples accordingly into high-risk and low-risk groups. We use different ways of summarizing, mainly based on the principal components. Multi-QMDR is model-free and easy to implement. RESULTS Multi-QMDR is applied to lipid-related traits. The properties of Multi-QMDR were investigated through simulation studies. Empirical studies show that Multi-QMDR outperforms existing univariate and multivariate methods at identifying causal interactions. CONCLUSIONS The Multi-QMDR approach improves the performance of QMDR when multiple quantitative phenotypes are available.
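The two steps described in the METHODS part (summarize several correlated phenotypes into one score, then split genotype cells into high- and low-risk groups) can be sketched as follows. This is one plausible reading rather than the authors' implementation: the use of the first principal component only, the comparison of each cell mean with the overall mean, the function names, and the toy data are all assumptions.

```python
from statistics import mean


def first_pc_scores(phenotypes, n_iter=200):
    """Project centred phenotype vectors onto their first principal component,
    found by power iteration on the sample covariance matrix."""
    n, p = len(phenotypes), len(phenotypes[0])
    means = [mean(row[j] for row in phenotypes) for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in phenotypes]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # start vector; assumes positively correlated phenotypes
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centred]


def high_risk_cells(genotypes, scores):
    """Label a genotype cell high-risk when its mean score exceeds the overall mean."""
    overall = mean(scores)
    by_cell = {}
    for g, s in zip(genotypes, scores):
        by_cell.setdefault(g, []).append(s)
    return {g for g, vals in by_cell.items() if mean(vals) > overall}


# Hypothetical data: two-SNP genotype cells and two correlated quantitative traits.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
phenotypes = [(1.0, 2.0), (1.2, 2.1), (1.1, 1.9), (0.9, 2.0),
              (1.0, 2.2), (1.1, 2.0), (3.0, 4.1), (3.2, 3.9)]

scores = first_pc_scores(phenotypes)
print(high_risk_cells(genotypes, scores))
```

The sign of a principal component is arbitrary in general; the all-ones start vector fixes it here only because the toy traits are positively correlated.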
Affiliation(s)
- Wenbao Yu
- Department of Statistics, Seoul National University, Seoul, South Korea
|
33
|
Fouladi R, Bessonov K, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 2015. [PMID: 26201701 DOI: 10.1159/000381286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022] Open
Abstract
Genome-wide association studies have revealed a vast number of common loci associated with human complex diseases. Still, a large proportion of heritability remains unexplained. The extent to which rare genetic variants (RVs) are able to explain a relevant portion of the genetic heritability for complex traits leaves room for debate and paves the way to the collection of RV databases and the development of novel analytic tools to analyze these. To date, several statistical methods have been proposed to uncover the association of RVs with complex diseases, but none of them is the clear winner in all possible scenarios of study design and assumed underlying disease model. The latter may involve differences in the distributions of effect sizes, proportions of causal variants, and ratios of protective to deleterious variants at distinct regions throughout the genome. Therefore, there is a need for robust scalable methods with acceptable overall performance in terms of power and type I error under various realistic scenarios. In this paper, we propose a novel RV association analysis strategy, which satisfies several of the desired properties that an RV analysis tool should exhibit.
Affiliation(s)
- Ramouna Fouladi
- Systems and Modeling Unit, Montefiore Institute, and Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium
|
34
|
Gola D, Mahachie John JM, van Steen K, König IR. A roadmap to multifactor dimensionality reduction methods. Brief Bioinform 2015; 17:293-308. [PMID: 26108231 PMCID: PMC4793893 DOI: 10.1093/bib/bbv038] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/12/2015] [Indexed: 02/02/2023] Open
Abstract
Complex diseases are defined to be determined by multiple genetic and environmental factors alone as well as in interactions. To analyze interactions in genetic data, many statistical methods have been suggested, with most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive. From this latter family, a fast-growing collection of methods emerged that are based on the Multifactor Dimensionality Reduction (MDR) approach. Since its first introduction, MDR has enjoyed great popularity in applications and has been extended and modified multiple times. Based on a literature search, we here provide a systematic and comprehensive overview of these suggested methods. The methods are described in detail, and the availability of implementations is listed. Most recent approaches offer to deal with large-scale data sets and rare variants, which is why we expect these methods to even gain in popularity.
|
35
|
Rule-based analysis for detecting epistasis using associative classification mining. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/s13721-015-0084-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
36
|
Bridging the gap between statistical and biological epistasis in Alzheimer's disease. BIOMED RESEARCH INTERNATIONAL 2015; 2015:870123. [PMID: 26075270 PMCID: PMC4449899 DOI: 10.1155/2015/870123] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/06/2015] [Accepted: 05/05/2015] [Indexed: 12/17/2022]
Abstract
Alzheimer's disease affects millions of people worldwide and incidence is expected to rise as the population ages, but no effective therapies exist despite decades of research and more than 20 known disease markers. Research has shown that Alzheimer's disease's missing heritability remains extensive with an estimated 25% of phenotypic variance unexplained by known variants. The missing heritability may be explained by missing variants or by epistasis. Researchers often focus on individual loci rather than epistatic interactions, which is likely an oversimplification of the underlying biology since most phenotypes are affected by multiple genes. Focusing research efforts on epistasis will be critical to resolving Alzheimer's disease etiology, and a major key to identifying and properly interpreting key epistatic interactions will be bridging the gap between statistical and biological epistasis. This review covers the current state of epistasis research in Alzheimer's disease and how researchers can bridge the gap between statistical and biological epistasis to help resolve Alzheimer's disease etiology.
|
37
|
Bessonov K, Gusareva ES, Van Steen K. A cautionary note on the impact of protocol changes for genome-wide association SNP × SNP interaction studies: an example on ankylosing spondylitis. Hum Genet 2015; 134:761-73. [DOI: 10.1007/s00439-015-1560-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/19/2015] [Accepted: 04/26/2015] [Indexed: 12/11/2022]
|
38
|
Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015; 16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022] Open
Abstract
Background Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. IOR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming. Results FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility. Conclusions Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases.
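The IOR definition in the abstract can be made concrete with a small worked example. The sketch below takes one reading of that sentence, IOR = (OR_A × OR_B) / OR_AB, with each OR computed against carriers of neither variant. The function names, the choice of reference group, and the counts are assumptions for illustration and are not taken from FORCE.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group relative to a reference group."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)


def interactive_odds_ratio(counts):
    """IOR = (OR_A * OR_B) / OR_AB, with carriers of neither variant as reference.

    `counts` maps each carrier group to a (cases, controls) tuple.
    """
    ref = counts["neither"]
    or_a = odds_ratio(*counts["A_only"], *ref)
    or_b = odds_ratio(*counts["B_only"], *ref)
    or_ab = odds_ratio(*counts["both"], *ref)
    return (or_a * or_b) / or_ab


# Hypothetical counts where the joint effect is exactly multiplicative:
# OR_A = 1.2, OR_B = 1.5, OR_AB = 1.8, so the IOR is 1 (no interaction).
no_interaction = {
    "neither": (100, 200),
    "A_only": (60, 100),
    "B_only": (75, 100),
    "both": (90, 100),
}
# Same marginal effects, but carriers of both variants have OR_AB = 3.6,
# so the IOR deviates from 1.
with_interaction = dict(no_interaction, both=(180, 100))

print(interactive_odds_ratio(no_interaction))
print(interactive_odds_ratio(with_interaction))
```

Because only a handful of multiplications and divisions per SNP pair are needed, a quantity of this kind is cheap enough to rank all pairs before the slower P value estimation, as the abstract notes.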
Affiliation(s)
- Laura Grange
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France; Université Paris Diderot, Paris, 75013, France.
- Jean-François Bureau
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Iryna Nikolayeva
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France.
- Richard Paul
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Liège, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium.
- Benno Schwikowski
- Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France.
- Anavaj Sakuntabhai
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
|
39
|
Gusareva ES, Carrasquillo MM, Bellenguez C, Cuyvers E, Colon S, Graff-Radford NR, Petersen RC, Dickson DW, Mahachie John JM, Bessonov K, Van Broeckhoven C, Harold D, Williams J, Amouyel P, Sleegers K, Ertekin-Taner N, Lambert JC, Van Steen K. Genome-wide association interaction analysis for Alzheimer's disease. Neurobiol Aging 2014; 35:2436-2443. [PMID: 24958192 PMCID: PMC4370231 DOI: 10.1016/j.neurobiolaging.2014.05.014] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 10/04/2013] [Revised: 05/19/2014] [Accepted: 05/21/2014] [Indexed: 12/23/2022]
Abstract
We propose a minimal protocol for exhaustive genome-wide association interaction analysis that involves screening for epistasis over large-scale genomic data, combining the strengths of different methods and statistical tools. The different steps of this protocol are illustrated on a real-life data application for Alzheimer's disease (AD) (2259 patients and 6017 controls from France). In particular, in the exhaustive genome-wide epistasis screening we identified an AD-associated interacting SNP pair from chromosomes 6q11.1 (rs6455128, the KHDRBS2 gene) and 13q12.11 (rs7989332, the CRYL1 gene) (p = 0.006, corrected for multiple testing). A replication analysis in the independent AD cohort from Germany (555 patients and 824 controls) confirmed the discovered epistasis signal (p = 0.036). This signal was also supported by a meta-analysis approach in 5 independent AD cohorts that was applied in the context of epistasis for the first time. Transcriptome analysis revealed negative correlation between expression levels of KHDRBS2 and CRYL1 in both the temporal cortex (β = -0.19, p = 0.0006) and cerebellum (β = -0.23, p < 0.0001) brain regions. This is the first time a replicable epistasis associated with AD was identified using a hypothesis-free screening approach.
Affiliation(s)
- Elena S Gusareva
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liege, Belgium.
- Céline Bellenguez
- INSERM U744, Lille, France; Department of Public Health and Molecular Epidemiology of Aging Related Diseases, Institut Pasteur de Lille, Lille, France; Universite de Lille Nord de France, Lille, France
- Elise Cuyvers
- Department of Molecular Genetics, VIB, Antwerp, Belgium; Department of Neurology, Institute Born-Bunge, University of Antwerp, Antwerp, Belgium
- Samuel Colon
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, USA
- Dennis W Dickson
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, USA
- Jestinah M Mahachie John
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liege, Belgium
- Kyrylo Bessonov
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liege, Belgium
- Christine Van Broeckhoven
- Department of Molecular Genetics, VIB, Antwerp, Belgium; Department of Neurology, Institute Born-Bunge, University of Antwerp, Antwerp, Belgium
- Denise Harold
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Institute of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
- Julie Williams
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Institute of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
- Philippe Amouyel
- INSERM U744, Lille, France; Department of Public Health and Molecular Epidemiology of Aging Related Diseases, Institut Pasteur de Lille, Lille, France; Universite de Lille Nord de France, Lille, France
- Kristel Sleegers
- Department of Molecular Genetics, VIB, Antwerp, Belgium; Department of Neurology, Institute Born-Bunge, University of Antwerp, Antwerp, Belgium
- Nilüfer Ertekin-Taner
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, USA; Department of Neurology, Mayo Clinic Florida, Jacksonville, FL, USA
- Jean-Charles Lambert
- INSERM U744, Lille, France; Department of Public Health and Molecular Epidemiology of Aging Related Diseases, Institut Pasteur de Lille, Lille, France; Universite de Lille Nord de France, Lille, France
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liege, Belgium
|
40
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|
41
|
Guo X, Meng Y, Yu N, Pan Y. Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering. BMC Bioinformatics 2014; 15:102. [PMID: 24717145 PMCID: PMC4021249 DOI: 10.1186/1471-2105-15-102] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 10/28/2013] [Accepted: 03/17/2014] [Indexed: 11/25/2022] Open
Abstract
Background Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect interactions involving multiple loci, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions with more than 2 SNPs pose computational and analytical challenges because the computation increases exponentially as the cardinality of SNP combinations gets larger. Results In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare power performance against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, the age-related macular degeneration (AMD) and rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors that do not show up in detections of 2-locus epistatic interactions. Conclusions Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD dataset and the RA dataset are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines for detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS.
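To give a sense of the growth in search space mentioned in the Background, the number of distinct k-SNP combinations among n SNPs is the binomial coefficient C(n, k). The short sketch below prints it for a few interaction orders; the helper name and the panel size of 1000 SNPs are illustrative assumptions, far smaller than a real GWAS.

```python
from math import comb


def n_snp_combinations(n_snps, order):
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


# Hypothetical panel of 1000 SNPs: each extra order multiplies the number
# of candidate combinations by roughly n / k.
for order in (1, 2, 3, 4):
    print(order, n_snp_combinations(1000, order))
```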
Affiliation(s)
- Yi Pan
- Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, USA.
|
42
|
Liu J, Calhoun VD. A review of multivariate analyses in imaging genetics. Front Neuroinform 2014; 8:29. [PMID: 24723883 PMCID: PMC3972473 DOI: 10.3389/fninf.2014.00029] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 10/31/2013] [Accepted: 03/04/2014] [Indexed: 12/13/2022] Open
Abstract
Recent advances in neuroimaging technology and molecular genetics provide the unique opportunity to investigate genetic influence on the variation of brain attributes. Since the year 2000, when the initial publication on brain imaging and genetics was released, imaging genetics has been a rapidly growing research approach with increasing publications every year. Several reviews have been offered to the research community focusing on various study designs. In addition to study design, analytic tools and their proper implementation are also critical to the success of a study. In this review, we survey recent publications using data from neuroimaging and genetics, focusing on methods capturing multivariate effects accommodating the large number of variables from both imaging data and genetic data. We group the analyses of genetic or genomic data into either a priori driven or data driven approach, including gene-set enrichment analysis, multifactor dimensionality reduction, principal component analysis, independent component analysis (ICA), and clustering. For the analyses of imaging data, ICA and extensions of ICA are the most widely used multivariate methods. Given detailed reviews of multivariate analyses of imaging data available elsewhere, we provide a brief summary here that includes a recently proposed method known as independent vector analysis. Finally, we review methods focused on bridging the imaging and genetic data by establishing multivariate and multiple genotype-phenotype-associations, including sparse partial least squares, sparse canonical correlation analysis, sparse reduced rank regression and parallel ICA. These methods are designed to extract latent variables from both genetic and imaging data, which become new genotypes and phenotypes, and the links between the new genotype-phenotype pairs are maximized using different cost functions. The relationship between these methods along with their assumptions, advantages, and limitations are discussed.
Affiliation(s)
- Jingyu Liu
- The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque, NM, USA
- Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA
- Vince D. Calhoun
- The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque, NM, USA
- Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA
|
43
|
Hoefkens E, Nys K, John JM, Van Steen K, Arijs I, Van der Goten J, Van Assche G, Agostinis P, Rutgeerts P, Vermeire S, Cleynen I. Genetic association and functional role of Crohn disease risk alleles involved in microbial sensing, autophagy, and endoplasmic reticulum (ER) stress. Autophagy 2013; 9:2046-55. [PMID: 24247223 DOI: 10.4161/auto.26337] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/13/2022] Open
Abstract
Genome-wide association studies have identified several genes implicated in autophagy (ATG16L1, IRGM, ULK1, LRRK2, and MTMR3), intracellular bacterial sensing (NOD2), and endoplasmic reticulum (ER) stress (XBP1 and ORMDL3) to be associated with Crohn disease (CD). We studied the known CD-associated variants in these genes in a large cohort of 3451 individuals (1744 CD patients, 793 ulcerative colitis (UC) patients and 914 healthy controls). We also investigated the functional phenotype linked to these genetic variants. Association with CD was confirmed for NOD2, ATG16L1, IRGM, MTMR3, and ORMDL3. The risk for developing CD increased with an increasing number of risk alleles for these genes (P<0.001, OR 1.26 [1.20 to 1.32]). Three times as many (34.8%) CD patients carried a risk allele in all three pathways, in contrast to 13.3% of the controls (P<0.0001, OR = 3.46 [2.77 to 4.32]). For UC, no significant association for one single nucleotide polymorphism (SNP) was found, but the risk for development of UC increased with an increasing total number of risk alleles (P = 0.001, OR = 1.10 [1.04 to 1.17]). We found a genetic interaction between reference SNP (rs)2241880 (ATG16L1) and rs10065172 (IRGM) in CD. Functional experiments hinted toward an association between an increased genetic risk and an augmented inflammatory status, highlighting the relevance of the genetic findings.
Affiliation(s)
- Eveline Hoefkens
- Department of Clinical and Experimental Medicine; Translational Research Center for Gastrointestinal Disorders (TARGID); KU Leuven; Leuven, Belgium
|
44
|
Mahachie John JM, Van Lishout F, Gusareva ES, Van Steen K. A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection. BioData Min 2013; 6:9. [PMID: 23618370 PMCID: PMC3668290 DOI: 10.1186/1756-0381-6-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/11/2012] [Accepted: 04/20/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect 2-locus epistasis signals in the absence of main effects.
Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.
Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations.
Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend to first rank-transform traits to normality and then to apply MB-MDR modeling with Student’s t-tests as internal tests for association.
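The recommendation in this abstract (rank-transform the trait to normality, then test with Student's t) can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the MB-MDR software itself. The function names, the toy data, the `(rank - 0.5) / n` offset, and the "one genotype cell versus the rest" comparison are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rank_to_normal(values):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based rank shared by the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    # The (rank - 0.5) / n offset is an illustrative choice, not taken from the paper.
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def student_t(a, b):
    """Two-sample t statistic with a pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def welch_t(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical skewed trait and a flag marking one multilocus genotype cell.
trait = [0.1, 0.4, 0.2, 3.5, 9.0, 0.3, 1.2, 15.0, 0.8, 2.1]
in_cell = [False, False, False, True, True, False, False, True, False, True]

z = rank_to_normal(trait)
cell = [v for v, flag in zip(z, in_cell) if flag]
rest = [v for v, flag in zip(z, in_cell) if not flag]
t_student = student_t(cell, rest)
t_welch = welch_t(cell, rest)
print(f"Student t = {t_student:.3f}, Welch t = {t_welch:.3f}")
```

The two statistics coincide when both groups have the same size and diverge as group sizes and variances become unequal, which is the situation the simulation study examines.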
|
45
|
Van Lishout F, Mahachie John JM, Gusareva ES, Urrea V, Cleynen I, Théâtre E, Charloteaux B, Calle ML, Wehenkel L, Van Steen K. An efficient algorithm to perform multiple testing in epistasis screening. BMC Bioinformatics 2013; 14:138. [PMID: 23617239 PMCID: PMC3648350 DOI: 10.1186/1471-2105-14-138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 05/10/2012] [Accepted: 04/12/2013] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease.
RESULTS: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data.
CONCLUSIONS: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
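The memory argument in this abstract can be illustrated with a small permutation sketch. The code below is a simplified single-step maxT procedure, not the step-down algorithm implemented in MBMDR-3.0.3. It assumes the tests can be streamed, keeps only the `top_k` observed statistics plus one maximum per permutation, and so uses memory that depends on `top_k` and the number of permutations rather than on the number of tests. The statistic `mean_diff`, the function `maxt_top_hits`, and the toy data are hypothetical.

```python
import heapq
import random


def mean_diff(genotype, trait):
    """Absolute difference in mean trait between carriers (1) and non-carriers (0)."""
    carriers = [y for g, y in zip(genotype, trait) if g]
    others = [y for g, y in zip(genotype, trait) if not g]
    if not carriers or not others:
        return 0.0
    return abs(sum(carriers) / len(carriers) - sum(others) / len(others))


def maxt_top_hits(features, trait, n_perm=199, top_k=3, seed=1):
    """Single-step maxT adjusted p-values for the top_k strongest tests.

    `features` is a callable returning a fresh iterator of (name, genotype)
    pairs, so the full list of statistics is never held in memory.
    """
    top = heapq.nlargest(
        top_k, ((mean_diff(g, trait), name) for name, g in features())
    )
    rng = random.Random(seed)
    shuffled = list(trait)
    perm_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Only the largest statistic of each permutation is stored.
        perm_max.append(max(mean_diff(g, shuffled) for _, g in features()))
    return [
        (name, stat, (1 + sum(m >= stat for m in perm_max)) / (n_perm + 1))
        for stat, name in top
    ]


# Toy data: snp0 tracks the binary trait exactly, the other 19 SNPs are noise.
data_rng = random.Random(0)
trait = [0] * 20 + [1] * 20
table = {"snp0": list(trait)}
for i in range(1, 20):
    table[f"snp{i}"] = [data_rng.randint(0, 1) for _ in trait]

hits = maxt_top_hits(lambda: iter(table.items()), trait)
for name, stat, p_adj in hits:
    print(name, round(stat, 3), round(p_adj, 3))
```

Comparing every observed statistic against the same distribution of permutation maxima is what gives the adjusted p-values their family-wise interpretation. A step-down variant would refine the p-values of lower-ranked tests at the cost of extra bookkeeping.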
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, 4000 Liège, Belgium.
|
46
|
Abstract
Genome-wide association studies (GWASs) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology. Conversion of this wealth of new information about genomic variation to knowledge about public health and human biology will depend critically on the complexity of the genotype to phenotype mapping relationship. We review here computational approaches to genetic analysis that embrace, rather than ignore, the complexity of human health. We focus on multifactor dimensionality reduction (MDR) as an approach for modeling one of these complexities: epistasis or gene-gene interaction.
Affiliation(s)
- Qinxin Pan
- Computational Genetics Laboratory, Dartmouth Medical School, Dartmouth College, Lebanon, NH, USA
|
47
|
Applications of multifactor dimensionality reduction to genome-wide data using the R package 'MDR'. Methods Mol Biol 2013; 1019:479-98. [PMID: 23756907 DOI: 10.1007/978-1-62703-447-0_23] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
This chapter describes how to use the R package 'MDR' to search and identify gene-gene interactions in high-dimensional data and illustrates applications for exploratory analysis of multi-locus models by providing specific examples.
|
48
|
Upstill-Goddard R, Eccles D, Fliege J, Collins A. Machine learning approaches for the discovery of gene-gene interactions in disease data. Brief Bioinform 2012; 14:251-60. [DOI: 10.1093/bib/bbs024] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Indexed: 01/17/2023] Open
|
49
|
Kazma R, Bailey JN. Population-based and family-based designs to analyze rare variants in complex diseases. Genet Epidemiol 2012; 35 Suppl 1:S41-7. [PMID: 22128057 DOI: 10.1002/gepi.20648] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Abstract
Genotyping of rare variants on a large scale is now possible using next-generation sequencing. Sample selection is a crucial step in designing the genetic study of a complex disease, and knowledge of the efficiency and limitations of population-based and family-based designs can help researchers make the appropriate choice. The nine contributions to Group 5 of Genetic Analysis Workshop 17 evaluate population-based and family-based designs by comparing the results obtained with various methods applied to the mini-exome simulations. These simulations consisted of 200 replicates composed of unrelated individuals and eight extended pedigrees with genotypes and various phenotypes. The methods tested for association with a population-based and/or a family-based design, tested for linkage with a family-based design, or estimated heritability. We summarize the strengths and weaknesses of both designs. Although population-based designs seem more suitable for detecting the effect of multiple rare variants, family-based designs can potentially enrich the sample in rare variants, for which the effect would be concealed at the population level. However, as of today, the main limitation is still the high cost of next-generation sequencing.
Affiliation(s)
- Rémi Kazma
- Department of Epidemiology and Biostatistics and Institute for Human Genetics, University of California, San Francisco, CA 94143-3110, USA.
|
50
|
Rodin AS, Gogoshin G, Boerwinkle E. Systems biology data analysis methodology in pharmacogenomics. Pharmacogenomics 2012; 12:1349-60. [PMID: 21919609 DOI: 10.2217/pgs.11.76] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023] Open
Abstract
Pharmacogenetics aims to elucidate the genetic factors underlying the individual's response to pharmacotherapy. Coupled with the recent (and ongoing) progress in high-throughput genotyping, sequencing and other genomic technologies, pharmacogenetics is rapidly transforming into pharmacogenomics, while pursuing the primary goals of identifying and studying the genetic contribution to drug therapy response and adverse effects, and existing drug characterization and new drug discovery. Accomplishment of both of these goals hinges on gaining a better understanding of the underlying biological systems; however, reverse-engineering biological system models from the massive datasets generated by the large-scale genetic epidemiology studies presents a formidable data analysis challenge. In this article, we review the recent progress made in developing such data analysis methodology within the paradigm of systems biology research that broadly aims to gain a 'holistic', or 'mechanistic' understanding of biological systems by attempting to capture the entirety of interactions between the components (genetic and otherwise) of the system.
Affiliation(s)
- Andrei S Rodin
- Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA.
|