
For: Mahachie John JM, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data. Eur J Hum Genet 2011;19:696-703. [PMID: 21407267 DOI: 10.1038/ejhg.2011.17] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022] Open


Cited by Other Article(s)

Veyssiere M, Rodriguez Ordonez MDP, Chalabi S, Michou L, Cornelis F, Boland A, Olaso R, Deleuze JF, Petit-Teixeira E, Chaudru V. MYLK*FLNB and DOCK1*LAMA2 gene-gene interactions associated with rheumatoid arthritis in the focal adhesion pathway. Front Genet 2024;15:1375036. [PMID: 38803542 PMCID: PMC11128622 DOI: 10.3389/fgene.2024.1375036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/18/2024] [Indexed: 05/29/2024] Open

Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021;14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022] Open

Gola D, König IR. Empowering individual trait prediction using interactions for precision medicine. BMC Bioinformatics 2021;22:74. [PMID: 33602124 PMCID: PMC7890638 DOI: 10.1186/s12859-021-04011-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2019] [Accepted: 02/08/2021] [Indexed: 11/11/2022] Open

Abstract

Background

One component of precision medicine is to construct prediction models whose predictive ability is as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases like coronary artery disease, rheumatoid arthritis, and type 2 diabetes have a polygenic basis, and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal, and machine learning methods are appealing for making individual predictions. However, most of these algorithms focus on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in the realm of genetic epidemiology. One large class of algorithms to detect interacting features is based on the multifactor dimensionality reduction (MDR). Here, we further develop the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction empowered individual prediction.

Results

Using a comprehensive simulation study we show that our new algorithm (median AUC: 0.66) can use information hidden in interactions and outperforms two other state-of-the-art algorithms, namely the Random Forest (median AUC: 0.54) and Elastic Net (median AUC: 0.50), if interactions are present in a scenario of two pairs of two features having small effects. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. As our new algorithm is not only applicable to biological/genetic data but to all datasets with discrete features, it may have practical implications in other research fields where interactions between features have to be considered as well, and we made our method available as an R package (https://github.com/imbs-hl/MBMDRClassifieR).

Conclusions

The explicit use of interactions between features can improve the prediction performance and thus should be included in further attempts to move precision medicine forward.
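
As a rough illustration of how an MDR-type method can turn an interaction into an individual prediction, the sketch below pools two-locus genotype cells into high- and low-risk groups by comparing each cell's case fraction with the overall case fraction, then predicts from the cell label. This shows the classic MDR idea only; it is not the MB-MDR classifier described in the abstract and not the API of the MBMDRClassifieR package. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_mdr_cells(genotypes, labels):
    """Label each two-locus genotype cell as high risk (True) or low risk (False).

    genotypes: list of (g1, g2) tuples, each genotype coded 0, 1 or 2.
    labels:    list of 0/1 case status, same length as genotypes.
    A cell is high risk when its case fraction exceeds the overall case fraction.
    """
    overall = sum(labels) / len(labels)
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_controls, n_cases]
    for cell, y in zip(genotypes, labels):
        counts[cell][y] += 1
    return {cell: (n1 / (n0 + n1)) > overall for cell, (n0, n1) in counts.items()}


def predict_mdr(cell_risk, genotypes, default=False):
    """Predict case status from the cell label; unseen cells get `default`."""
    return [cell_risk.get(cell, default) for cell in genotypes]


# Toy XOR-like interaction: neither locus has a marginal effect on its own,
# but risk is elevated when exactly one of the two loci is non-zero.
train_x = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
train_y = [0, 0, 1, 1, 1, 1, 0, 0]
cell_risk = fit_mdr_cells(train_x, train_y)
predictions = predict_mdr(cell_risk, [(0, 1), (1, 1), (2, 2)])
print(predictions)
```

A method that only looks at marginal effects would see nothing in this toy dataset, which is the situation the abstract describes for scenarios with interactions.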


Riahi P, Kazemnejad A, Mostafaei S, Meguro A, Mizuki N, Ashraf-Ganjouei A, Javinani A, Faezi ST, Shahram F, Mahmoudi M. ERAP1 polymorphisms interactions and their association with Behçet's disease susceptibly: Application of Model-Based Multifactor Dimension Reduction Algorithm (MB-MDR). PLoS One 2020;15:e0227997. [PMID: 32023277 PMCID: PMC7001967 DOI: 10.1371/journal.pone.0227997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/14/2019] [Accepted: 01/03/2020] [Indexed: 12/15/2022] Open

Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Epistasis Detection in Genome-Wide Screening for Complex Human Diseases in Structured Populations. Systems Medicine 2019. [DOI: 10.1089/sysm.2019.0003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023] Open

Sandri TL, Andrade FA, Lidani KCF, Einig E, Boldt ABW, Mordmüller B, Esen M, Messias-Reason IJ. Human collectin-11 (COLEC11) and its synergic genetic interaction with MASP2 are associated with the pathophysiology of Chagas Disease. PLoS Negl Trop Dis 2019;13:e0007324. [PMID: 30995222 PMCID: PMC6488100 DOI: 10.1371/journal.pntd.0007324] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/09/2018] [Revised: 04/29/2019] [Accepted: 03/22/2019] [Indexed: 12/27/2022] Open

Abstract

Chagas Disease (CD) is an anthropozoonosis caused by Trypanosoma cruzi. With complex pathophysiology and variable clinical presentation, CD outcome can be influenced by parasite persistence and the host immune response. Complement activation is one of the primary defense mechanisms against pathogens, which can be initiated via pathogen recognition by pattern recognition molecules (PRMs). Collectin-11 is a multifunctional soluble PRM lectin, widely distributed throughout the body, with important participation in host defense, homeostasis, and embryogenesis. In complex with mannose-binding lectin-associated serine proteases (MASPs), collectin-11 may initiate the activation of complement, playing a role against pathogens, including T. cruzi. In this study, collectin-11 plasma levels and COLEC11 variants in exon 7 were assessed in a Brazilian cohort of 251 patients with chronic CD and 108 healthy controls. Gene-gene interactions between COLEC11 and MASP2 variants were analyzed. Collectin-11 levels were significantly decreased in CD patients compared to controls (p<0.0001). The allele rs7567833G, the genotypes rs7567833AG and rs7567833GG, and the COLEC11*GGC haplotype were related to T. cruzi infection and clinical progression towards symptomatic CD. COLEC11 and MASP2*CD risk genotypes were associated with cardiomyopathy (p = 0.014; OR 9.3, 95% CI 1.2–74) and with the cardiodigestive form of CD (p = 0.005; OR 15.2, 95% CI 1.7–137), suggesting that both loci act synergistically in immune modulation of the disease. The decreased levels of collectin-11 in CD patients may be associated with the disease process. The COLEC11 variant rs7567833G and also the COLEC11 and MASP2*CD risk genotype interaction were associated with the pathophysiology of CD.

The heterogeneity of clinical progression during chronic Trypanosoma cruzi infection and the mechanisms determining why some individuals develop symptoms whereas others remain asymptomatic are still poorly understood. The pathogenesis of chronic Chagas Disease (CD) has been attributed mainly to the persistence of the causative parasite and the character of individual host immune responses. Collectin-11 is a host immune response molecule with affinity for sugars found on the surface of T. cruzi. Together with mannose-binding lectin-associated serine proteases (MASPs), it triggers the host defense response against pathogens. Genetic variants and protein levels of MASP-2 and the mannose-binding lectin (MBL), a molecule structurally similar to collectin-11, have been found to be associated with susceptibility to T. cruzi infection and clinical progression to cardiomyopathy. This prompted us to investigate collectin-11 genetic variants and protein levels in 251 patients with chronic CD and 108 healthy individuals, and to examine the effect of gene interaction between COLEC11 and MASP2 risk mutations. We found that COLEC11 gene variants and reduced collectin-11 levels were associated with CD. The concomitant presence of these genetic variants and MASP2 risk mutations greatly increased the odds of cardiomyopathy. This is the first study to reveal a role for collectin-11 and COLEC11-MASP2 gene interaction in the pathogenesis of CD.


Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019;138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]

Sharma V, Nandan A, Sharma AK, Singh H, Bharadwaj M, Sinha DN, Mehrotra R. Signature of genetic associations in oral cancer. Tumour Biol 2017;39:1010428317725923. [DOI: 10.1177/1010428317725923] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open

Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017;18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022] Open

Zhang F, Xie D, Liang M, Xiong M. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLoS Genet 2016;12:e1005965. [PMID: 27104857 PMCID: PMC4841563 DOI: 10.1371/journal.pgen.1005965] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/28/2015] [Accepted: 03/08/2016] [Indexed: 12/02/2022] Open

Abstract

To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

The widely used statistical methods test interaction for a single phenotype. However, we often observe pleiotropic genetic interaction effects. The simultaneous gene-gene (GxG) interaction analysis of multiple complementary traits will increase statistical power to detect GxG interactions. Although GxG interactions play an important role in uncovering the genetic structure of complex traits, the statistical methods for detecting GxG interactions in multiple phenotypes remain less developed owing to their potential complexity. Therefore, we extend the functional regression model from single variate to multivariate for simultaneous GxG interaction analysis of multiple correlated phenotypes. Large-scale simulations are conducted to evaluate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare power with traditional multivariate pair-wise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for interaction analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic GxG interactions. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of interactions influencing five traits.


Lishout FV, Gadaleta F, Moore JH, Wehenkel L, Steen KV. gammaMAXT: a fast multiple-testing correction algorithm. BioData Min 2015;8:36. [PMID: 26594243 PMCID: PMC4654922 DOI: 10.1186/s13040-015-0069-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/07/2015] [Accepted: 11/08/2015] [Indexed: 02/07/2023] Open

Abstract

BACKGROUND

The purpose of the MaxT algorithm is to provide a significance test algorithm that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, the requirements in terms of computing time and memory of this procedure are proportional to the number of investigated hypotheses. The memory issue was solved in 2013 by Van Lishout's implementation of MaxT, which makes the memory usage independent of the size of the dataset. This algorithm is implemented in MBMDR-3.0.3, software that can effectively identify genetic interactions for a variety of SNP-SNP based epistasis models. On the other hand, that implementation turned out to be less suitable for genome-wide interaction analysis studies, due to the prohibitive computational burden.

RESULTS

In this work we introduce gammaMAXT, a novel implementation of the maxT algorithm for multiple testing correction. The algorithm was implemented in software MBMDR-4.2.2, as part of the MB-MDR framework to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test-statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values. We show that the gammaMAXT algorithm has a power comparable to MaxT and maintains FWER, but requires less computational resources and time. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3.

CONCLUSIONS

These results are promising for future GWAIs. However, the proposed gammaMAXT algorithm offers a general significance assessment and multiple testing approach, applicable to any context that requires performing hundreds of thousands of tests. It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
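
For readers unfamiliar with permutation-based maxT correction, the sketch below shows a minimal single-step variant: the trait is permuted, only the maximum test statistic over all hypotheses is kept for each permutation, and each observed statistic is compared with that distribution of maxima. It is a simplified illustration under stated assumptions, not the step-down MaxT of MBMDR-3.0.3 and not gammaMAXT (no gamma distribution is fitted). The test statistic (absolute mean difference between two trait groups), the function names and the toy data are hypothetical.

```python
import random


def mean_diff(values, groups):
    """Absolute difference in mean between group 1 and group 0."""
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g0 = [v for v, g in zip(values, groups) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def maxt_adjusted_pvalues(features, trait, n_perm=999, seed=0):
    """Single-step maxT adjusted p-values for one statistic per feature.

    features: list of numeric vectors, one per hypothesis.
    trait:    list of 0/1 group labels (must contain both groups).
    """
    rng = random.Random(seed)
    observed = [mean_diff(f, trait) for f in features]
    exceed = [0] * len(features)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Only the maximum over all tests is needed for each permutation,
        # so memory does not grow with the number of hypotheses.
        perm_max = max(mean_diff(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[j] += 1
    # Add 1 to numerator and denominator so the observed data count as one permutation.
    return [(e + 1) / (n_perm + 1) for e in exceed]


trait = [0] * 10 + [1] * 10
feature_a = [0.0] * 10 + [1.0] * 10   # perfectly separates the two groups
feature_b = [0.5, 0.4] * 10           # identical distribution in both groups
adjusted = maxt_adjusted_pvalues([feature_a, feature_b], trait)
print(adjusted)
```

Because every observed statistic is compared with the same distribution of maxima, the adjusted p-values control the family-wise error rate across all tests at once.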


Fouladi R, Bessonov K, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 2015. [PMID: 26201701 DOI: 10.1159/000381286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022] Open

Gola D, Mahachie John JM, van Steen K, König IR. A roadmap to multifactor dimensionality reduction methods. Brief Bioinform 2015;17:293-308. [PMID: 26108231 PMCID: PMC4793893 DOI: 10.1093/bib/bbv038] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/12/2015] [Indexed: 02/02/2023] Open

Bessonov K, Gusareva ES, Van Steen K. A cautionary note on the impact of protocol changes for genome-wide association SNP × SNP interaction studies: an example on ankylosing spondylitis. Hum Genet 2015;134:761-73. [DOI: 10.1007/s00439-015-1560-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/19/2015] [Accepted: 04/26/2015] [Indexed: 12/11/2022]

Talluri R, Shete S. Evaluating methods for modeling epistasis networks with application to head and neck cancer. Cancer Inform 2015;14:17-23. [PMID: 25733798 PMCID: PMC4332043 DOI: 10.4137/cin.s17289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 11/23/2022] Open

Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015;16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022] Open

Abstract

Background

Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (I_OR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. I_OR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an I_OR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the I_OR is fast to calculate, we used the I_OR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming.

Results

FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility.

Conclusions

Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases.
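
To make the I_OR definition in the Background concrete, the sketch below computes it from case and control counts, reading the abstract's wording literally: the product of the odds ratios for carrying each variant alone, divided by the odds ratio for carrying both, each relative to carriers of neither. The exact reference group and the direction of the ratio are assumptions drawn from that one sentence, and this is not the FORCE implementation. The function names and counts are hypothetical, and zero counts (which would cause a division by zero) are not handled.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds of being a case in the exposed group relative to the reference group."""
    return (cases_exposed / controls_exposed) / (cases_ref / controls_ref)


def interaction_or(counts):
    """I_OR from a dict mapping (carries_a, carries_b) -> (n_cases, n_controls).

    Assumed definition: (OR_a * OR_b) / OR_ab, with non-carriers (0, 0)
    as the reference group for all three odds ratios.
    """
    ref = counts[(0, 0)]
    or_a = odds_ratio(*counts[(1, 0)], *ref)
    or_b = odds_ratio(*counts[(0, 1)], *ref)
    or_ab = odds_ratio(*counts[(1, 1)], *ref)
    return (or_a * or_b) / or_ab


# Purely multiplicative effects: OR_a = 2, OR_b = 3, OR_ab = 6, so I_OR is 1.
no_interaction = {(0, 0): (100, 100), (1, 0): (200, 100),
                  (0, 1): (300, 100), (1, 1): (600, 100)}
# Joint effect twice as large as the product: OR_ab = 12, so I_OR is 0.5.
with_interaction = {(0, 0): (100, 100), (1, 0): (200, 100),
                    (0, 1): (300, 100), (1, 1): (1200, 100)}

i_or_null = interaction_or(no_interaction)
i_or_epistasis = interaction_or(with_interaction)
print(i_or_null, i_or_epistasis)
```

Under this reading, a value near 1 means the joint effect is what the two single-variant effects would predict on a multiplicative scale, and a clear departure from 1 flags the pair for the slower P value estimation.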


Wang X, Zhang D, Tzeng JY. Pathway-guided identification of gene-gene interactions. Ann Hum Genet 2014;78:478-91. [PMID: 25227508 PMCID: PMC4363308 DOI: 10.1111/ahg.12080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/18/2014] [Accepted: 07/03/2014] [Indexed: 12/26/2022]

Detecting epistasis in human complex traits. Nat Rev Genet 2014;15:722-33. [PMID: 25200660 DOI: 10.1038/nrg3747] [Citation(s) in RCA: 259] [Impact Index Per Article: 25.9] [Indexed: 12/16/2022]

Maciukiewicz M, Dmitrzak-Weglarz M, Pawlak J, Leszczynska-Rodziewicz A, Zaremba D, Skibinska M, Hauser J. Analysis of genetic association and epistasis interactions between circadian clock genes and symptom dimensions of bipolar affective disorder. Chronobiol Int 2014;31:770-8. [PMID: 24673294 DOI: 10.3109/07420528.2014.899244] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/31/2022]

Abstract

Bipolar affective disorder (BD) is a severe psychiatric disorder characterized by periodic changes in mood from depression to mania. Disruptions of biological rhythms increase the risk of mood disorders. Because the clinical presentation of the disease is heterogeneous, the use of homogeneous sets of patients is suggested for association analyses. In our study, we aimed to apply a previously computed structure of bipolar disorder symptom dimensions to analyses of genetic association. We based the quantitative traits on the main depression, sleep disturbances, appetite disturbances, excitement and psychotic dimensions, which consisted of OPCRIT checklist items. We genotyped 42 polymorphisms from the circadian clock genes PER3, ARNTL, CLOCK and TIMELESS in 511 patients with BD (n = 292 women and n = 219 men). As quantitative traits we used the clinical dimensions described above. Genetic associations between alleles and quantitative traits were tested using regression models implemented in PLINK. In addition, we used the Kruskal-Wallis test to look for associations between genotypes and quantitative traits. During the second stage of our analyses, we used multidimensional scaling (multifactor dimensionality reduction) for quantitative traits to compute pairwise epistatic interactions between circadian gene variants. We found an association between ARNTL variant rs11022778 and main depression (p = 0.00047) and appetite disturbances (p = 0.004). In epistatic interaction analyses, we observed two-locus interactions for sleep disturbances (p = 0.007; rs11824092 of ARNTL and rs11932595 of CLOCK), for the main depression subdimension with ARNTL variants (p = 0.0011; rs3789327, rs10766075) and for appetite disturbances in depression with ARNTL polymorphisms (p = 7 × 10^-4; rs11022778, rs156243).


Li X, Price MA, He D, Kamali A, Karita E, Lakhi S, Sanders EJ, Anzala O, Amornkul PN, Allen S, Hunter E, Kaslow RA, Gilmour J, Tang J. Host genetics and viral load in primary HIV-1 infection: clear evidence for gene by sex interactions. Hum Genet 2014;133:1187-97. [PMID: 24969460 PMCID: PMC4127002 DOI: 10.1007/s00439-014-1465-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/03/2014] [Accepted: 06/16/2014] [Indexed: 01/09/2023]

Mahachie John JM, Van Lishout F, Gusareva ES, Van Steen K. A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection. BioData Min 2013;6:9. [PMID: 23618370 PMCID: PMC3668290 DOI: 10.1186/1756-0381-6-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/11/2012] [Accepted: 04/20/2013] [Indexed: 11/10/2022] Open

Abstract

Background

Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect 2-locus epistasis signals in the absence of main effects.

Methodology

Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.

Results

Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations.

Conclusions

When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student’s t-tests as internal tests for association.
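
The recommendation above can be sketched as follows: a rank-based transformation of the trait to normal scores, followed by a t statistic comparing individuals in one multilocus genotype cell with all others. This is a minimal sketch, not the MB-MDR software. The rank offset (rank - 0.5)/n is one common convention and is an assumption here, as are the function names and the toy data. Welch's t statistic is included only for comparison with the pooled-variance Student statistic.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rank_to_normal(values):
    """Rank-based inverse normal transform (ties broken by order of appearance)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out


def student_t(x, y):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Two-sample t statistic without the equal-variance assumption."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Strongly skewed toy trait; the last four individuals share one genotype cell.
trait = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
in_cell = [False] * 4 + [True] * 4

z = rank_to_normal(trait)
cell = [v for v, c in zip(z, in_cell) if c]
rest = [v for v, c in zip(z, in_cell) if not c]
t_student = student_t(cell, rest)
t_welch = welch_t(cell, rest)
print(t_student, t_welch)
```

With equal group sizes, as in this toy example, the two statistics coincide; they differ once the cell and the remaining individuals have unequal sizes and unequal variances, which is the usual case for multilocus genotype cells.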


Van Lishout F, Mahachie John JM, Gusareva ES, Urrea V, Cleynen I, Théâtre E, Charloteaux B, Calle ML, Wehenkel L, Van Steen K. An efficient algorithm to perform multiple testing in epistasis screening. BMC Bioinformatics 2013;14:138. [PMID: 23617239 PMCID: PMC3648350 DOI: 10.1186/1471-2105-14-138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 05/10/2012] [Accepted: 04/12/2013] [Indexed: 12/22/2022] Open

Abstract

BACKGROUND

Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease.

RESULTS

In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data.

CONCLUSIONS

Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interactions problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.


Wang M, Wang Q, Pan Y. From QTL to QTN: candidate gene set approach and a case study in porcine IGF1-FoxO pathway. PLoS One 2013;8:e53452. [PMID: 23341942 PMCID: PMC3544924 DOI: 10.1371/journal.pone.0053452] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/27/2012] [Accepted: 11/30/2012] [Indexed: 01/15/2023] Open

Aschard H, Lutz S, Maus B, Duell EJ, Fingerlin TE, Chatterjee N, Kraft P, Van Steen K. Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum Genet 2012;131:1591-613. [PMID: 22760307 DOI: 10.1007/s00439-012-1192-0] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 03/01/2012] [Accepted: 06/11/2012] [Indexed: 02/03/2023]

Mahachie John JM, Cattaert T, Van Lishout F, Gusareva ES, Van Steen K. Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction. PLoS One 2012;7:e29594. [PMID: 22242176 PMCID: PMC3252336 DOI: 10.1371/journal.pone.0029594] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/15/2011] [Accepted: 12/01/2011] [Indexed: 11/18/2022] Open

Abstract

Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that empirical power estimates are reduced with correction of lower-order effects, as are familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on “top” findings, or SNP selection based on a p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate using “on-the-fly” lower-order effects adjusting when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.
