1
Veyssiere M, Rodriguez Ordonez MDP, Chalabi S, Michou L, Cornelis F, Boland A, Olaso R, Deleuze JF, Petit-Teixeira E, Chaudru V. MYLK*FLNB and DOCK1*LAMA2 gene-gene interactions associated with rheumatoid arthritis in the focal adhesion pathway. Front Genet 2024; 15:1375036. [PMID: 38803542 PMCID: PMC11128622 DOI: 10.3389/fgene.2024.1375036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/18/2024] [Indexed: 05/29/2024]
Abstract
Rheumatoid arthritis (RA) is a chronic, systemic autoimmune disease caused by a combination of genetic and environmental factors. Rare variants with low predicted effects in genes participating in the same biological function might be involved in developing complex diseases such as RA. From whole-exome sequencing (WES) data, we identified genes containing rare non-neutral variants with complete penetrance and no phenocopy in at least one of nine French multiplex families. Further enrichment analysis highlighted focal adhesion as the most significant pathway. We then tested whether interactions between the genes participating in this function would increase or decrease the risk of developing RA. The model-based multifactor dimensionality reduction (MB-MDR) approach was used to detect epistasis in a discovery sample (19 RA cases and 11 healthy individuals from 9 families and 98 unrelated CEU controls from the International Genome Sample Resource). We identified 9 significant interactions involving 11 genes (MYLK, FLNB, DOCK1, LAMA2, RELN, PIP5K1C, TNC, PRKCA, VEGFB, ITGB5, and FLT1). One interaction increasing RA risk (MYLK*FLNB) and one interaction decreasing RA risk (DOCK1*LAMA2) were confirmed in a replication sample (200 unrelated RA cases and 91 unrelated GBR controls). Functional and genomic data in RA samples or relevant cell types argue for a key role of these genes in RA.
Affiliation(s)
- Maëva Veyssiere
- Institut National de la Santé et de la Recherche Médicale, Université de Paris, Paris, France
- Smahane Chalabi
- GenHotel - Univ Evry, University of Paris Saclay, Evry, France
- Laetitia Michou
- Division of Rheumatology, Department of Medicine, CHU de Québec-Université Laval, Québec City, QC, Canada
- François Cornelis
- Génétique-Oncogénétique Adulte-Prévention, Institut National de la Santé et de la Recherche Médicale, Clermont-Auvergne University and CHU, Clermont-Ferrand, France
- Anne Boland
- Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Robert Olaso
- Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Jean-François Deleuze
- Commissariat à l'Energie Atomique, Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, Evry, France
- Valérie Chaudru
- Institut National de la Santé et de la Recherche Médicale, Université de Paris, Paris, France
- GenHotel - Univ Evry, University of Paris Saclay, Evry, France
2
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022]
Abstract
Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication, and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction association epistasis studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods To identify causal variants in synergy, to improve interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.
- Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Diane Duroux
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Elena S Gusareva
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
- WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
3
Gola D, König IR. Empowering individual trait prediction using interactions for precision medicine. BMC Bioinformatics 2021; 22:74. [PMID: 33602124 PMCID: PMC7890638 DOI: 10.1186/s12859-021-04011-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2019] [Accepted: 02/08/2021] [Indexed: 11/11/2022]
Abstract
Background One component of precision medicine is to construct prediction models with predictive ability as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases like coronary artery disease, rheumatoid arthritis, and type 2 diabetes have a polygenic basis, and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal, and machine learning methods are appealing for making individual predictions. However, most of these algorithms focus on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in the realm of genetic epidemiology. One big class of algorithms to detect interacting features is based on the multifactor dimensionality reduction (MDR). Here, we further develop the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction empowered individual prediction. Results Using a comprehensive simulation study we show that our new algorithm (median AUC: 0.66) can use information hidden in interactions and outperforms two other state-of-the-art algorithms, namely the Random Forest (median AUC: 0.54) and Elastic Net (median AUC: 0.50), if interactions are present in a scenario of two pairs of two features having small effects. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls.
As our new algorithm is not only applicable to biological/genetic data but to all datasets with discrete features, it may have practical implications in other research fields where interactions between features have to be considered as well, and we made our method available as an R package (https://github.com/imbs-hl/MBMDRClassifieR). Conclusions The explicit use of interactions between features can improve the prediction performance and thus should be included in further attempts to move precision medicine forward.
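The cell-pooling idea that MDR-type methods build on can be illustrated with a minimal Python sketch. It shows only the classic MDR labelling step for one pair of loci: each two-locus genotype combination is labelled high risk ("H") when its case:control ratio exceeds the overall ratio, and low risk ("L") otherwise. It is not the prediction algorithm of this paper and does not use the MBMDRClassifieR package; the function name `mdr_risk_labels` and the 0/1/2 genotype coding are assumptions made for illustration. MB-MDR, as generally described, replaces the fixed ratio threshold with a per-cell association test and allows a "no evidence" category, which this sketch does not implement.

```python
from collections import Counter


def mdr_risk_labels(geno_a, geno_b, status):
    """Label each observed two-locus genotype cell as high ("H") or low ("L") risk.

    geno_a, geno_b: genotype codes (e.g. 0/1/2) per individual at two loci.
    status: 1 for cases, 0 for controls.
    Hypothetical helper for illustration; classic MDR rule, not MB-MDR.
    """
    n_cases = sum(status)
    n_ctrls = len(status) - n_cases

    # Count cases and controls in every observed genotype combination.
    cases, ctrls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, status):
        (cases if y == 1 else ctrls)[(a, b)] += 1

    labels = {}
    for cell in set(cases) | set(ctrls):
        # cases/ctrls in the cell > n_cases/n_ctrls overall, written
        # as a cross-product to avoid dividing by zero.
        high = cases[cell] * n_ctrls > ctrls[cell] * n_cases
        labels[cell] = "H" if high else "L"
    return labels
```

The resulting labels collapse the two-locus genotype table into a single binary feature, which is the "dimensionality reduction" in the method's name.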
Affiliation(s)
- Damian Gola
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
4
Riahi P, Kazemnejad A, Mostafaei S, Meguro A, Mizuki N, Ashraf-Ganjouei A, Javinani A, Faezi ST, Shahram F, Mahmoudi M. ERAP1 polymorphisms interactions and their association with Behçet's disease susceptibly: Application of Model-Based Multifactor Dimension Reduction Algorithm (MB-MDR). PLoS One 2020; 15:e0227997. [PMID: 32023277 PMCID: PMC7001967 DOI: 10.1371/journal.pone.0227997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/14/2019] [Accepted: 01/03/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND Behçet's disease (BD) is a chronic multi-systemic vasculitis with a considerable prevalence in Asian countries. There are many genes associated with a higher risk of developing BD, one of which is endoplasmic reticulum aminopeptidase-1 (ERAP1). In this study, we aimed to investigate the interactions of ERAP1 single nucleotide polymorphisms (SNPs) using a novel data mining method called Model-based multifactor dimensionality reduction (MB-MDR). METHODS We have included 748 BD patients and 776 healthy controls. A peripheral blood sample was collected, and eleven SNPs were assessed. Furthermore, we have applied the MB-MDR method to evaluate the interactions of ERAP1 gene polymorphisms. RESULTS The TT genotype of rs1065407 had a synergistic effect on BD susceptibility, considering the significant main effect. In the second order of interactions, CC genotype of rs2287987 and GG genotype of rs1065407 had the most prominent synergistic effect (β = 12.74). The mentioned genotypes also had significant interactions with CC genotype of rs26653 and TT genotype of rs30187 in the third-order (β = 12.74 and β = 12.73, respectively). CONCLUSION To the best of our knowledge, this is the first study investigating the interaction of a particular gene's SNPs in BD patients by applying a novel data mining method. However, future studies investigating the interactions of various genes could clarify this issue.
Affiliation(s)
- Parisa Riahi
- Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Anoshirvan Kazemnejad
- Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- * E-mail: (MM); (AK)
- Shayan Mostafaei
- Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Akira Meguro
- Department of Ophthalmology and Visual Science, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Nobuhisa Mizuki
- Department of Ophthalmology and Visual Science, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Amir Ashraf-Ganjouei
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Ali Javinani
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Farhad Shahram
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Mahdi Mahmoudi
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Inflammation Research Center, Tehran University of Medical Sciences, Tehran, Iran
- * E-mail: (MM); (AK)
5
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Epistasis Detection in Genome-Wide Screening for Complex Human Diseases in Structured Populations. Systems Medicine 2019. [DOI: 10.1089/sysm.2019.0003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey
- Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liege, Liege, Belgium
- WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liege, Liege, Belgium
6
Sandri TL, Andrade FA, Lidani KCF, Einig E, Boldt ABW, Mordmüller B, Esen M, Messias-Reason IJ. Human collectin-11 (COLEC11) and its synergic genetic interaction with MASP2 are associated with the pathophysiology of Chagas Disease. PLoS Negl Trop Dis 2019; 13:e0007324. [PMID: 30995222 PMCID: PMC6488100 DOI: 10.1371/journal.pntd.0007324] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/09/2018] [Revised: 04/29/2019] [Accepted: 03/22/2019] [Indexed: 12/27/2022]
Abstract
Chagas Disease (CD) is an anthropozoonosis caused by Trypanosoma cruzi. With complex pathophysiology and variable clinical presentation, CD outcome can be influenced by parasite persistence and the host immune response. Complement activation is one of the primary defense mechanisms against pathogens, which can be initiated via pathogen recognition by pattern recognition molecules (PRMs). Collectin-11 is a multifunctional soluble PRM lectin, widely distributed throughout the body, with important participation in host defense, homeostasis, and embryogenesis. In complex with mannose-binding lectin-associated serine proteases (MASPs), collectin-11 may initiate the activation of complement, playing a role against pathogens, including T. cruzi. In this study, collectin-11 plasma levels and COLEC11 variants in exon 7 were assessed in a Brazilian cohort of 251 patients with chronic CD and 108 healthy controls. Gene-gene interactions between COLEC11 and MASP2 variants were analyzed. Collectin-11 levels were significantly decreased in CD patients compared to controls (p<0.0001). The allele rs7567833G, the genotypes rs7567833AG and rs7567833GG, and the COLEC11*GGC haplotype were related to T. cruzi infection and clinical progression towards symptomatic CD. COLEC11 and MASP2*CD risk genotypes were associated with cardiomyopathy (p = 0.014; OR 9.3, 95% CI 1.2–74) and with the cardiodigestive form of CD (p = 0.005; OR 15.2, 95% CI 1.7–137), suggesting that both loci act synergistically in immune modulation of the disease. The decreased levels of collectin-11 in CD patients may be associated with the disease process. The COLEC11 variant rs7567833G and also the COLEC11 and MASP2*CD risk genotype interaction were associated with the pathophysiology of CD. The heterogeneity of clinical progression during chronic Trypanosoma cruzi infection and the mechanisms determining why some individuals develop symptoms whereas others remain asymptomatic are still poorly understood. 
The pathogenesis of chronic Chagas Disease (CD) has been attributed mainly to the persistence of the causing parasite and the character of individual host immune responses. Collectin-11 is a host immune response molecule with affinity for sugars found on the T. cruzi’s surface. Together with mannose-binding lectin-associated serine proteases (MASPs), it triggers the host defense response against pathogens. Genetic variants and protein levels of MASP-2 and the mannose-binding lectin (MBL), a molecule structurally similar to collectin-11, have been found to be associated with susceptibility to T. cruzi infection and clinical progression to cardiomyopathy. This prompted us to investigate collectin-11 genetic variants and protein levels in 251 patients with chronic CD and 108 healthy individuals, and to examine the effect of gene interaction between COLEC11 and MASP2 risk mutations. We found an association to CD infection with COLEC11 gene variants and reduced collectin-11 levels. The concomitant presence of these genetic variants and MASP2 risk mutations greatly increased the odds for cardiomyopathy. This is the first study to reveal a role for collectin-11 and COLEC11-MASP2 gene interaction in the pathogenesis of CD.
Affiliation(s)
- Thaisa Lucas Sandri
- Institute of Tropical Medicine, University of Tübingen, Tübingen, Germany
- Laboratory of Molecular Immunopathology, Department of Clinical Pathology, Federal University of Paraná, Curitiba, Brazil
- Fabiana Antunes Andrade
- Laboratory of Molecular Immunopathology, Department of Clinical Pathology, Federal University of Paraná, Curitiba, Brazil
- Kárita Cláudia Freitas Lidani
- Laboratory of Molecular Immunopathology, Department of Clinical Pathology, Federal University of Paraná, Curitiba, Brazil
- Elias Einig
- Institute of Tropical Medicine, University of Tübingen, Tübingen, Germany
- Angelica Beate Winter Boldt
- Laboratory of Molecular Immunopathology, Department of Clinical Pathology, Federal University of Paraná, Curitiba, Brazil
- Laboratory of Human Molecular Genetics, Department of Genetics, Federal University of Paraná, Curitiba, Brazil
- Meral Esen
- Institute of Tropical Medicine, University of Tübingen, Tübingen, Germany
- Iara J. Messias-Reason
- Laboratory of Molecular Immunopathology, Department of Clinical Pathology, Federal University of Paraná, Curitiba, Brazil
7
Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
Affiliation(s)
- K Van Steen
- WELBIO, GIGA-R Medical Genomics-BIO3, University of Liège, Liege, Belgium.
- Department of Human Genetics, University of Leuven, Leuven, Belgium.
- J H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
8
Sharma V, Nandan A, Sharma AK, Singh H, Bharadwaj M, Sinha DN, Mehrotra R. Signature of genetic associations in oral cancer. Tumour Biol 2017; 39:1010428317725923. [DOI: 10.1177/1010428317725923] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Affiliation(s)
- Vishwas Sharma
- Department of Health Research, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Amrita Nandan
- Society for Life Science and Human Health, Allahabad, India
- Amitesh Kumar Sharma
- Data Management Laboratory, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Department of Bioinformatics, Indian Council of Medical Research, New Delhi, India
- Harpreet Singh
- Data Management Laboratory, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Department of Bioinformatics, Indian Council of Medical Research, New Delhi, India
- Mausumi Bharadwaj
- Department of Health Research, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Division of Molecular Genetics & Biochemistry
- Dhirendra Narain Sinha
- WHO FCTC Global Knowledge Hub on Smokeless Tobacco, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Ravi Mehrotra
- Department of Health Research, National Institute of Cancer Prevention and Research (NICPR), Noida, India
- Data Management Laboratory, National Institute of Cancer Prevention and Research (NICPR), Noida, India
9
Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017; 18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022]
Abstract
Background Finding epistatic interactions in large association studies like genome-wide association studies (GWAS) with the large volume of genomic data now available is a challenging and largely unsolved issue. Few previous studies could handle genome-wide data due to the intractable difficulties met in searching a combinatorially explosive search space and statistically evaluating epistatic interactions given a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining K-Nearest Neighbors (KNN) and Multifactor Dimensionality Reduction (MDR) methods for detecting gene-gene interactions as a possible alternative to existing algorithms, especially in situations where the number of involved determinants is high. After describing the approach, a comparison of our method (KNN-MDR) to a set of the best-performing alternative methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) is carried out to detect interactions using simulated data as well as real genome-wide data. Results Experimental results on both simulated data and real genome-wide data show that KNN-MDR has interesting properties in terms of accuracy and power, and that, in many cases, it significantly outperforms its recent competitors. Conclusions The presented methodology (KNN-MDR) is valuable in the context of loci and interactions mapping and can be seen as an interesting addition to the arsenal used in complex traits analyses. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1599-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sinan Abo Alchamlat
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium
- Frédéric Farnir
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium
10
Zhang F, Xie D, Liang M, Xiong M. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLoS Genet 2016; 12:e1005965. [PMID: 27104857 PMCID: PMC4841563 DOI: 10.1371/journal.pgen.1005965] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/28/2015] [Accepted: 03/08/2016] [Indexed: 12/02/2022]
Abstract
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. The widely used statistical methods test interaction for a single phenotype. However, we often observe pleiotropic genetic interaction effects. The simultaneous gene-gene (GxG) interaction analysis of multiple complementary traits will increase statistical power to detect GxG interactions.
Although GxG interactions play an important role in uncovering the genetic structure of complex traits, the statistical methods for detecting GxG interactions in multiple phenotypes remain less developed owing to their potential complexity. Therefore, we extend the functional regression model from single variate to multivariate for simultaneous GxG interaction analysis of multiple correlated phenotypes. Large-scale simulations are conducted to evaluate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare power with traditional multivariate pair-wise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for interaction analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic GxG interactions. 267 pairs of genes that formed a genetic interaction network showed significant evidence of interactions influencing five traits.
Affiliation(s)
- Futao Zhang
- Department of Computer Science, College of Internet of Things, Hohai University, Changzhou, China
- Dan Xie
- College of Information Engineering, Hubei University of Chinese Medicine, Hubei, China
- Meimei Liang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Momiao Xiong
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas, United States of America
11
Lishout FV, Gadaleta F, Moore JH, Wehenkel L, Steen KV. gammaMAXT: a fast multiple-testing correction algorithm. BioData Min 2015; 8:36. [PMID: 26594243 PMCID: PMC4654922 DOI: 10.1186/s13040-015-0069-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/07/2015] [Accepted: 11/08/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND The purpose of the MaxT algorithm is to provide a significance test algorithm that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, the requirements in terms of computing time and memory of this procedure are proportional to the number of investigated hypotheses. The memory issue was solved in 2013 by Van Lishout's implementation of MaxT, which makes the memory usage independent of the size of the dataset. This algorithm is implemented in MBMDR-3.0.3, software that can effectively identify genetic interactions for a variety of SNP-SNP-based epistasis models. On the other hand, that implementation turned out to be less suitable for genome-wide interaction analysis studies, due to the prohibitive computational burden. RESULTS In this work we introduce gammaMAXT, a novel implementation of the maxT algorithm for multiple testing correction. The algorithm was implemented in software MBMDR-4.2.2, as part of the MB-MDR framework to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test-statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values. We show that the gammaMAXT algorithm has a power comparable to MaxT and maintains FWER, but requires less computational resources and time. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3. CONCLUSIONS These results are promising for future GWAIs. However, the proposed gammaMAXT algorithm offers a general significance assessment and multiple testing approach, applicable to any context that requires performing hundreds of thousands of tests.
It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
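The permutation logic behind maxT-style FWER control can be sketched in a few lines of Python. This is a minimal single-step version under stated assumptions, not the gammaMAXT algorithm and not the MBMDR software: the test statistic (absolute case/control mean difference per feature) and the function names `mean_diff_stats` and `maxt_adjusted_pvalues` are hypothetical choices made for illustration. Each adjusted p-value is the fraction of label permutations whose maximum statistic over all features reaches the observed statistic of that feature.

```python
import random


def mean_diff_stats(data, labels):
    """Absolute case/control mean difference for each feature (column)."""
    stats = []
    for j in range(len(data[0])):
        cases = [row[j] for row, y in zip(data, labels) if y == 1]
        ctrls = [row[j] for row, y in zip(data, labels) if y == 0]
        stats.append(abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls)))
    return stats


def maxt_adjusted_pvalues(data, labels, n_perm=999, seed=0):
    """Single-step maxT adjusted p-values from label permutations.

    Illustrative sketch: every permutation requires recomputing all
    statistics, which is the cost that grows with the number of tests.
    """
    rng = random.Random(seed)
    observed = mean_diff_stats(data, labels)
    # Start at 1 so the observed labelling counts as one permutation.
    exceed = [1] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(mean_diff_stats(data, shuffled))
        for j, t_obs in enumerate(observed):
            if max_stat >= t_obs:
                exceed[j] += 1
    return [count / (n_perm + 1) for count in exceed]
```

Comparing every observed statistic against the permutation maximum is what gives family-wise control; it also makes clear why the run time scales with both the number of permutations and the number of hypotheses.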
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Francesco Gadaleta
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Jason H Moore
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104-6021 PA USA
- Louis Wehenkel
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
|
12
|
Fouladi R, Bessonov K, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 2015. [PMID: 26201701 DOI: 10.1159/000381286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022] Open
Abstract
Genome-wide association studies have revealed a vast number of common loci associated with human complex diseases. Still, a large proportion of heritability remains unexplained. The extent to which rare genetic variants (RVs) are able to explain a relevant portion of the genetic heritability for complex traits leaves room for several debates and paves the way to the collection of RV databases and the development of novel analytic tools to analyze these. To date, several statistical methods have been proposed to uncover the association of RVs with complex diseases, but none of them is the clear winner in all possible scenarios of study design and assumed underlying disease model. The latter may involve differences in the distributions of effect sizes, proportions of causal variants, and ratios of protective to deleterious variants at distinct regions throughout the genome. Therefore, there is a need for robust scalable methods with acceptable overall performance in terms of power and type I error under various realistic scenarios. In this paper, we propose a novel RV association analysis strategy, which satisfies several of the desired properties that an RV analysis tool should exhibit.
Affiliation(s)
- Ramouna Fouladi
- Systems and Modeling Unit, Montefiore Institute, and Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium
|
13
|
Gola D, Mahachie John JM, van Steen K, König IR. A roadmap to multifactor dimensionality reduction methods. Brief Bioinform 2015; 17:293-308. [PMID: 26108231 PMCID: PMC4793893 DOI: 10.1093/bib/bbv038] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/12/2015] [Indexed: 02/02/2023] Open
Abstract
Complex diseases are defined to be determined by multiple genetic and environmental factors alone as well as in interactions. To analyze interactions in genetic data, many statistical methods have been suggested, with most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive. From this latter family, a fast-growing collection of methods emerged that are based on the Multifactor Dimensionality Reduction (MDR) approach. Since its first introduction, MDR has enjoyed great popularity in applications and has been extended and modified multiple times. Based on a literature search, we here provide a systematic and comprehensive overview of these suggested methods. The methods are described in detail, and the availability of implementations is listed. Most recent approaches offer to deal with large-scale data sets and rare variants, which is why we expect these methods to even gain in popularity.
|
14
|
Bessonov K, Gusareva ES, Van Steen K. A cautionary note on the impact of protocol changes for genome-wide association SNP × SNP interaction studies: an example on ankylosing spondylitis. Hum Genet 2015; 134:761-73. [DOI: 10.1007/s00439-015-1560-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/19/2015] [Accepted: 04/26/2015] [Indexed: 12/11/2022]
|
15
|
Talluri R, Shete S. Evaluating methods for modeling epistasis networks with application to head and neck cancer. Cancer Inform 2015; 14:17-23. [PMID: 25733798 PMCID: PMC4332043 DOI: 10.4137/cin.s17289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 11/23/2022] Open
Abstract
Epistasis helps to explain how multiple single-nucleotide polymorphisms (SNPs) interact to cause disease. A variety of tools have been developed to detect epistasis. In this article, we explore the strengths and weaknesses of an information theory approach for detecting epistasis and compare it to the logistic regression approach through simulations. We consider several scenarios to simulate the involvement of SNPs in an epistasis network with respect to linkage disequilibrium patterns among them and the presence or absence of main and interaction effects. We conclude that the information theory approach more efficiently detects interaction effects when main effects are absent, whereas, in general, the logistic regression approach is appropriate in all scenarios but results in higher false positives. We compute epistasis networks for SNPs in the FSD1L gene using a two-phase head and neck cancer genome-wide association study involving 2,185 cases and 4,507 controls to demonstrate the practical application of the methods.
Affiliation(s)
- Rajesh Talluri
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sanjay Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
16
|
Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015; 16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022] Open
Abstract
Background: Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. IOR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming. Results: FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility. Conclusions: Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases.
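The IOR definition in the abstract above can be illustrated with a small sketch. It assumes that each odds ratio is taken against individuals carrying neither variant, and that counts are available as (cases, controls) per carrier group; the function names, the data layout, and the counts are hypothetical and are not taken from the FORCE software.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group against a reference group."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)


def interactive_odds_ratio(counts):
    """IOR = (OR_A * OR_B) / OR_AB, each OR taken against non-carriers.

    `counts` maps a carrier group ("none", "A", "B", "AB") to a
    (cases, controls) tuple. This layout is an assumption for illustration.
    """
    ref_cases, ref_controls = counts["none"]
    or_a = odds_ratio(*counts["A"], ref_cases, ref_controls)
    or_b = odds_ratio(*counts["B"], ref_cases, ref_controls)
    or_ab = odds_ratio(*counts["AB"], ref_cases, ref_controls)
    return (or_a * or_b) / or_ab


# Hypothetical counts, not data from the psoriasis study.
counts = {
    "none": (100, 200),  # carriers of neither variant
    "A": (60, 60),       # carriers of variant A only
    "B": (50, 50),       # carriers of variant B only
    "AB": (80, 20),      # carriers of both variants
}
ior = interactive_odds_ratio(counts)
print(ior)  # 0.5: the joint OR (8) exceeds the product of the single ORs (2 * 2)
```

An IOR of 1 corresponds to purely multiplicative effects of the two variants on the odds scale, so the deviation from 1 is what flags a candidate pair for the slower P value estimation step.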
Affiliation(s)
- Laura Grange
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France; Université Paris Diderot, Paris, 75013, France.
- Jean-François Bureau
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Iryna Nikolayeva
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France.
- Richard Paul
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Liège, Belgium; Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium.
- Benno Schwikowski
- Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France.
- Anavaj Sakuntabhai
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
|
17
|
Wang X, Zhang D, Tzeng JY. Pathway-guided identification of gene-gene interactions. Ann Hum Genet 2014; 78:478-91. [PMID: 25227508 PMCID: PMC4363308 DOI: 10.1111/ahg.12080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/18/2014] [Accepted: 07/03/2014] [Indexed: 12/26/2022]
Abstract
Assessing gene-gene interactions (GxG) at the gene level can permit examination of epistasis at biologically functional units with amplified interaction signals from marker-marker pairs. While current gene-based GxG methods tend to be designed for two or a few genes, for complex traits, it is often common to have a list of many candidate genes to explore GxG. We propose a regression model with pathway-guided regularization for detecting interactions among genes. Specifically, we use the principal components to summarize the SNP-SNP interactions between a gene pair, and use an L1 penalty that incorporates adaptive weights based on biological guidance and trait supervision to identify important main and interaction effects. Our approach aims to combine biological guidance and data adaptiveness, and yields credible findings that may be likely to shed insights in order to formulate biological hypotheses for further molecular studies. The proposed approach can be used to explore the GxG with a list of many candidate genes and is applicable even when sample size is smaller than the number of predictors studied. We evaluate the utility of the proposed method using simulation and real data analysis. The results suggest improved performance over methods not utilizing pathway and trait guidance.
Affiliation(s)
- Xin Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
|
18
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
|
19
|
Maciukiewicz M, Dmitrzak-Weglarz M, Pawlak J, Leszczynska-Rodziewicz A, Zaremba D, Skibinska M, Hauser J. Analysis of genetic association and epistasis interactions between circadian clock genes and symptom dimensions of bipolar affective disorder. Chronobiol Int 2014; 31:770-8. [PMID: 24673294 DOI: 10.3109/07420528.2014.899244] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/31/2022]
Abstract
Bipolar affective disorder (BD) is a severe psychiatric disorder characterized by periodic changes in mood from depression to mania. Disruptions of biological rhythms increase the risk of mood disorders. Because the clinical presentation of the disease is heterogeneous, homogeneous sets of patients are suggested for use in association analyses. In our study, we aimed to apply a previously computed structure of bipolar disorder symptom dimensions to analyses of genetic association. We based the quantitative traits on the main depression, sleep disturbances, appetite disturbances, excitement and psychotic dimensions, which consisted of OPCRIT checklist items. We genotyped 42 polymorphisms from the circadian clock genes PER3, ARNTL, CLOCK and TIMELESS in 511 patients with BD (n = 292 women and n = 219 men). As quantitative traits we used the clinical dimensions described above. Genetic associations between alleles and quantitative traits were tested using regression models applied in PLINK. In addition, we used the Kruskal-Wallis test to look for associations between genotypes and quantitative traits. During the second stage of our analyses, we used multidimensional scaling (multifactor dimensionality reduction) for quantitative traits to compute pairwise epistatic interactions between circadian gene variants. We found an association between ARNTL variant rs11022778 and main depression (p = 0.00047) and appetite disturbances (p = 0.004). In epistatic interaction analyses, we observed two-locus interactions for sleep disturbances (p = 0.007; rs11824092 of ARNTL and rs11932595 of CLOCK), as well as interactions for a subdimension of main depression and ARNTL variants (p = 0.0011; rs3789327, rs10766075) and for appetite disturbances in depression and ARNTL polymorphisms (p = 7 × 10^-4; rs11022778, rs156243).
Affiliation(s)
- Malgorzata Maciukiewicz
- Laboratory of Psychiatric Genetics, Department of Psychiatry, Poznan University of Medical Sciences , Poznan , Poland
|
20
|
Li X, Price MA, He D, Kamali A, Karita E, Lakhi S, Sanders EJ, Anzala O, Amornkul PN, Allen S, Hunter E, Kaslow RA, Gilmour J, Tang J. Host genetics and viral load in primary HIV-1 infection: clear evidence for gene by sex interactions. Hum Genet 2014; 133:1187-97. [PMID: 24969460 PMCID: PMC4127002 DOI: 10.1007/s00439-014-1465-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/03/2014] [Accepted: 06/16/2014] [Indexed: 01/09/2023]
Abstract
Research in the past two decades has generated unequivocal evidence that host genetic variations substantially account for the heterogeneous outcomes following human immunodeficiency virus type 1 (HIV-1) infection. In particular, genes encoding human leukocyte antigens (HLA) have various alleles, haplotypes, or specific motifs that can dictate the set-point (a relatively steady state) of plasma viral load (VL), although rapid viral evolution driven by innate and acquired immune responses can obscure the long-term relationships between HLA genotypes and HIV-1-related outcomes. In our analyses of VL data from 521 recent HIV-1 seroconverters enrolled from eastern and southern Africa, HLA-A*03:01 was strongly and persistently associated with low VL in women (frequency = 11.3 %, P < 0.0001) but not in men (frequency = 7.7 %, P = 0.66). This novel sex by HLA interaction (P = 0.003, q = 0.090) did not extend to other frequent HLA class I alleles (n = 34), although HLA-C*18:01 also showed a weak association with low VL in women only (frequency = 9.3 %, P = 0.042, q > 0.50). In a reduced multivariable model, age, sex, geography (clinical sites), previously identified HLA factors (HLA-B*18, B*45, B*53, and B*57), and the interaction term for female sex and HLA-A*03:01 collectively explained 17.0 % of the overall variance in geometric mean VL over a 3-year follow-up period (P < 0.0001). Multiple sensitivity analyses of longitudinal and cross-sectional VL data yielded consistent results. These findings can serve as a proof of principle that the gap of "missing heritability" in quantitative genetics can be partially bridged by a systematic evaluation of sex-specific associations.
Affiliation(s)
- Xuelin Li
- Department of Medicine, University of Alabama at Birmingham, 1665 University Boulevard, Birmingham, AL 35294 USA
- Matthew A. Price
- International AIDS Vaccine Initiative, New York City, NY USA
- Department of Epidemiology and Biostatistics, UCSF, San Francisco, CA USA
- Dongning He
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL USA
- Anatoli Kamali
- MRC/UVRI Uganda Virus Research Unit on AIDS, Masaka Site, Masaka, Uganda
- Shabir Lakhi
- Zambia-Emory HIV-1 Research Project, Lusaka, Zambia
- Eduard J. Sanders
- Centre for Geographic Medicine Research, Kenya Medical Research Institute (KEMRI), Kilifi, Kenya
- Centre for Clinical Vaccinology and Tropical Medicine, University of Oxford, Headington, UK
- Omu Anzala
- Kenya AIDS Vaccine Initiative (KAVI), Nairobi, Kenya
- Pauli N. Amornkul
- International AIDS Vaccine Initiative, New York City, NY USA
- Department of Epidemiology and Biostatistics, UCSF, San Francisco, CA USA
- Susan Allen
- Projet San Francisco, Kigali, Rwanda
- Zambia-Emory HIV-1 Research Project, Lusaka, Zambia
- Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA USA
- Eric Hunter
- Vaccine Research Center, Emory University, Atlanta, GA USA
- Richard A. Kaslow
- International AIDS Vaccine Initiative, New York City, NY USA
- Present Address: Department of Veterans Affairs, Washington, DC, 20420 USA
- Jill Gilmour
- International AIDS Vaccine Initiative, Human Immunology Laboratory, Chelsea and Westminster Hospital, London, UK
- Jianming Tang
- Department of Medicine, University of Alabama at Birmingham, 1665 University Boulevard, Birmingham, AL 35294 USA
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL USA
|
21
|
Mahachie John JM, Van Lishout F, Gusareva ES, Van Steen K. A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection. BioData Min 2013; 6:9. [PMID: 23618370 PMCID: PMC3668290 DOI: 10.1186/1756-0381-6-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/11/2012] [Accepted: 04/20/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects. Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student's t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student's t-test for association, as well as a novel MB-MDR implementation based on Welch's t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch's t-tests are generally lower than those for MB-MDR with Student's t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations. Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student's t-tests as internal tests for association.
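The recommended rank transformation to normality can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not specify which rank-based transformation was used, so the offset (rank - 0.5) / n is just one common choice, ties receive their average rank, and the function names and trait values are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_to_normal(values):
    """Map each value to the standard normal quantile of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical right-skewed trait values with one extreme observation.
trait = [0.1, 0.4, 0.2, 9.5, 0.3]
transformed = rank_to_normal(trait)
print([round(x, 3) for x in transformed])
```

The transformed trait keeps the ordering of the original values but removes the influence of the extreme observation, which is the property that makes a Student's t-test on the transformed values less sensitive to non-normality.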
|
22
|
Van Lishout F, Mahachie John JM, Gusareva ES, Urrea V, Cleynen I, Théâtre E, Charloteaux B, Calle ML, Wehenkel L, Van Steen K. An efficient algorithm to perform multiple testing in epistasis screening. BMC Bioinformatics 2013; 14:138. [PMID: 23617239 PMCID: PMC3648350 DOI: 10.1186/1471-2105-14-138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 05/10/2012] [Accepted: 04/12/2013] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease. RESULTS: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data. CONCLUSIONS: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
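The memory argument in this abstract can be illustrated with a simplified sketch. It assumes a single-step variant of maxT, in which only one maximum statistic per permutation is stored, so the stored state grows with the number of permutations rather than with the number of SNP pairs. The published MBMDR-3.0.3 algorithm may differ in its details; the function names, the stand-in test statistic, and the toy data below are hypothetical.

```python
import random
from itertools import combinations


def pair_statistics(genotypes, trait):
    """Lazily yield ((i, j), statistic) for every marker pair.

    The statistic is a stand-in (absolute case/control difference in the
    mean genotype product), not the MB-MDR test statistic.
    """
    cases = [k for k, y in enumerate(trait) if y == 1]
    controls = [k for k, y in enumerate(trait) if y == 0]
    for i, j in combinations(range(len(genotypes)), 2):
        prod = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        case_mean = sum(prod[k] for k in cases) / len(cases)
        control_mean = sum(prod[k] for k in controls) / len(controls)
        yield (i, j), abs(case_mean - control_mean)


def single_step_maxt(genotypes, trait, n_permutations=199, seed=0):
    """Yield (pair, adjusted p-value) using one stored maximum per permutation."""
    rng = random.Random(seed)
    permutation_maxima = []
    for _ in range(n_permutations):
        shuffled = list(trait)
        rng.shuffle(shuffled)
        # Only the maximum over all pairs is kept, never the full statistic vector.
        permutation_maxima.append(
            max(stat for _, stat in pair_statistics(genotypes, shuffled))
        )
    # Observed statistics are streamed and compared with the stored maxima.
    for pair, observed in pair_statistics(genotypes, trait):
        exceed = sum(1 for m in permutation_maxima if m >= observed)
        yield pair, (1 + exceed) / (n_permutations + 1)


# Hypothetical toy data: 6 markers coded 0/1/2 on 40 individuals, no planted signal.
data_rng = random.Random(1)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(40)] for _ in range(6)]
trait = [1] * 20 + [0] * 20
results = dict(single_step_maxt(genotypes, trait))
```

Collecting the results in a dictionary is only convenient for this toy example. In a large screen the generator would instead be filtered as it is consumed, keeping only the pairs whose adjusted p-value falls below a chosen threshold.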
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, 4000 Liège, Belgium.
|
23
|
Wang M, Wang Q, Pan Y. From QTL to QTN: candidate gene set approach and a case study in porcine IGF1-FoxO pathway. PLoS One 2013; 8:e53452. [PMID: 23341942 PMCID: PMC3544924 DOI: 10.1371/journal.pone.0053452] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/27/2012] [Accepted: 11/30/2012] [Indexed: 01/15/2023] Open
Abstract
Unraveling the genetic background of economic traits is a major goal in modern animal genetics and breeding. Both candidate gene analysis and QTL mapping have previously been used for identifying genes and chromosome regions related to studied traits. However, most of these studies may be limited in their ability to fully consider how multiple genetic factors may influence a particular phenotype of interest. If possible, taking advantage of the combined effect of multiple genetic factors is expected to be more powerful than analyzing single sites, as the joint action of multiple loci within a gene or across multiple genes acting in the same gene set will likely have a greater influence on phenotypic variation. Thus, we proposed a pipeline of gene set analysis that utilized information from multiple loci to improve statistical power. We assessed the performance of this approach using both simulated data and a real IGF1-FoxO pathway data set. The results showed that our new method can identify the association between genetic variation and phenotypic variation with higher statistical power and unravel the mechanisms of complex traits from a gene-set perspective. Additionally, the proposed pipeline is flexible enough to be extended to model complex genetic structures that include the interactions between different gene sets and between gene sets and environments.
Affiliation(s)
- Minghui Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
- Shanghai Key Lab of Animal Biotechnology, Shanghai, People’s Republic of China
- Qishan Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
- Shanghai Key Lab of Animal Biotechnology, Shanghai, People’s Republic of China
- Yuchun Pan
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
- Shanghai Key Lab of Animal Biotechnology, Shanghai, People’s Republic of China
|
24
|
Aschard H, Lutz S, Maus B, Duell EJ, Fingerlin TE, Chatterjee N, Kraft P, Van Steen K. Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum Genet 2012; 131:1591-613. [PMID: 22760307 DOI: 10.1007/s00439-012-1192-0] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 03/01/2012] [Accepted: 06/11/2012] [Indexed: 02/03/2023]
Abstract
The interest in performing gene-environment interaction studies has increased significantly with the rise of advanced molecular genetics techniques. Practically, it became possible to investigate the role of environmental factors in disease risk and hence to investigate their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist to set up gene-environment interaction studies, and the technical challenges related to these studies (when the number of environmental or genetic risk factors is relatively small) have been described before. In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings along a whole range of new challenges and opportunities. Despite a continuing effort in developing efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how to best present and analyze genome-wide environmental interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, usually similar problems need to be dealt with as for genome-wide association gene-gene interaction studies. However, additional complexities need to be considered which are typical for large-scale epidemiological studies, but are also related to "joining" two heterogeneous types of data in explaining complex disease trait variation or for prediction purposes.
Affiliation(s)
- Hugues Aschard
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA.
|
25
|
Mahachie John JM, Cattaert T, Van Lishout F, Gusareva ES, Van Steen K. Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction. PLoS One 2012; 7:e29594. [PMID: 22242176 PMCID: PMC3252336 DOI: 10.1371/journal.pone.0029594] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/15/2011] [Accepted: 12/01/2011] [Indexed: 11/18/2022] Open
Abstract
Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that correcting for lower-order effects reduces empirical power estimates, and likewise reduces familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on “top” findings, or SNP selection based on a p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate using “on-the-fly” lower-order effects adjustment when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.
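As a rough illustration of the two corrective coding schemes named in the abstract, the sketch below codes each SNP either additively (one minor-allele count column) or co-dominantly (two genotype indicator columns), then estimates a "cell versus rest" effect for one two-locus genotype cell while the main effects of both SNPs sit in the same least-squares model. This is not the authors' MB-MDR software. The function names, the 0/1/2 genotype encoding, and the use of ordinary least squares are all assumptions made for illustration, and the real method additionally involves test statistics, cell labelling and permutation-based significance.

```python
import numpy as np

def additive_coding(g):
    """Additive coding (assumed 0/1/2 genotypes): one column with the allele count."""
    return np.asarray(g, dtype=float).reshape(-1, 1)

def codominant_coding(g):
    """Co-dominant coding: indicator columns for genotypes 1 and 2 (0 is the reference)."""
    g = np.asarray(g)
    return np.column_stack([g == 1, g == 2]).astype(float)

def cell_effect_adjusted(trait, snp1, snp2, cell, coding=codominant_coding):
    """Coefficient of a 'cell vs. rest' indicator, fitted jointly with both SNPs' main effects.

    Hypothetical helper: main effects are adjusted for inside the same model
    ("on the fly") rather than by replacing the trait with residuals beforehand.
    """
    snp1, snp2 = np.asarray(snp1), np.asarray(snp2)
    in_cell = ((snp1 == cell[0]) & (snp2 == cell[1])).astype(float)
    design = np.column_stack(
        [np.ones(len(snp1)), coding(snp1), coding(snp2), in_cell]
    )
    beta, *_ = np.linalg.lstsq(design, np.asarray(trait, dtype=float), rcond=None)
    return beta[-1]  # last column is the cell indicator

# Toy data: every two-locus genotype combination, repeated four times.
snp1 = np.tile(np.repeat([0, 1, 2], 3), 4)
snp2 = np.tile(np.tile([0, 1, 2], 3), 4)
main_only = 1.0 + 2.0 * snp1 + 3.0 * snp2            # purely additive main effects
with_epistasis = main_only + 5.0 * ((snp1 == 2) & (snp2 == 2))

no_signal = cell_effect_adjusted(main_only, snp1, snp2, cell=(2, 2))
signal = cell_effect_adjusted(with_epistasis, snp1, snp2, cell=(2, 2))
```

In this toy setting the adjusted cell effect is essentially zero when the trait is driven by main effects alone, and it recovers the simulated shift when a genuine cell-specific effect is added, which is the behaviour the abstract attributes to always accounting for main effects in the pair under investigation.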
Affiliation(s)
- Jestinah M. Mahachie John
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
- Tom Cattaert
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
- Elena S. Gusareva
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
|