Reference Citation Analysis

For: Yu W, Lee S, Park T. A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions. Bioinformatics 2017;32:i605-i610. [PMID: 27587680 DOI: 10.1093/bioinformatics/btw424] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]

Cited by Other Article(s)

Hoffmann M, Poschenrieder JM, Incudini M, Baier S, Fitz A, Maier A, Hartung M, Hoffmann C, Trummer N, Adamowicz K, Picciani M, Scheibling E, Harl MV, Lesch I, Frey H, Kayser S, Wissenberg P, Schwartz L, Hafner L, Acharya A, Hackl L, Grabert G, Lee SG, Cho G, Cloward M, Jankowski J, Lee HK, Tsoy O, Wenke N, Pedersen AG, Bønnelykke K, Mandarino A, Melograna F, Schulz L, Climente-González H, Wilhelm M, Iapichino L, Wienbrandt L, Ellinghaus D, Van Steen K, Grossi M, Furth PA, Hennighausen L, Di Pierro A, Baumbach J, Kacprowski T, List M, Blumenthal DB. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. medRxiv 2023:2023.11.07.23298205. [PMID: 38076997 PMCID: PMC10705612 DOI: 10.1101/2023.11.07.23298205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]

Affiliation(s)

Markus Hoffmann Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
Julian M. Poschenrieder Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany Institute for Computational Systems Biology, University of Hamburg, Germany
Massimiliano Incudini Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
Sylvie Baier Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Amelie Fitz Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
Andreas Maier Institute for Computational Systems Biology, University of Hamburg, Germany
Michael Hartung Institute for Computational Systems Biology, University of Hamburg, Germany
Christian Hoffmann Institute for Computational Systems Biology, University of Hamburg, Germany
Nico Trummer Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Klaudia Adamowicz Institute for Computational Systems Biology, University of Hamburg, Germany
Mario Picciani Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
Evelyn Scheibling Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Maximilian V. Harl Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Ingmar Lesch Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Hunor Frey Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Simon Kayser Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Paul Wissenberg Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Leon Schwartz Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
Leon Hafner Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
Aakriti Acharya Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
Lena Hackl Institute for Computational Systems Biology, University of Hamburg, Germany
Gordon Grabert Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
Sung-Gwon Lee National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America School of Biological Sciences and Technology, Chonnam National University, Gwangju, Korea
Gyuhyeok Cho Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju, Korea
Matthew Cloward Department of Biology, Brigham Young University, Provo, UT, USA
Jakub Jankowski National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
Hye Kyung Lee National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
Olga Tsoy Institute for Computational Systems Biology, University of Hamburg, Germany
Nina Wenke Institute for Computational Systems Biology, University of Hamburg, Germany
Anders Gorm Pedersen Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
Klaus Bønnelykke Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
Antonio Mandarino International Centre for Theory of Quantum Technologies, University of Gdańsk, 80-309 Gdańsk, Poland
Federico Melograna BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
Laura Schulz Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
Héctor Climente-González RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Mathias Wilhelm Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
Luigi Iapichino Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
Lars Wienbrandt Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
David Ellinghaus Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
Kristel Van Steen BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
Michele Grossi European Organization for Nuclear Research (CERN), Geneva 1211, Switzerland
Priscilla A. Furth National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
Lothar Hennighausen Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
Alessandra Di Pierro Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
Jan Baumbach Institute for Computational Systems Biology, University of Hamburg, Germany Computational BioMedicine Lab, University of Southern Denmark, Denmark
Tim Kacprowski Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
Markus List Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
David B. Blumenthal Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany

Yang X, Yang C, Lei J, Liu J. An Approach of Epistasis Detection Using Integer Linear Programming Optimizing Bayesian Network. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2654-2671. [PMID: 34181547 DOI: 10.1109/tcbb.2021.3092719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]

Sakai T, Abe A, Shimizu M, Terauchi R. RIL-StEp: epistasis analysis of rice recombinant inbred lines reveals candidate interacting genes that control seed hull color and leaf chlorophyll content. G3 (Bethesda) 2021;11:jkab130. [PMID: 33871605 PMCID: PMC8496299 DOI: 10.1093/g3journal/jkab130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Accepted: 04/10/2021] [Indexed: 11/19/2022]

Wang L, Wang Y, Fu Y, Gao Y, Du J, Yang C, Liu J. AFSBN: A Method of Artificial Fish Swarm Optimizing Bayesian Network for Epistasis Detection. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1369-1383. [PMID: 31670676 DOI: 10.1109/tcbb.2019.2949780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]

Lee JW, Lee S. A comparative study on the unified model based multifactor dimensionality reduction methods for identifying gene-gene interactions associated with the survival phenotype. BioData Min 2021;14:17. [PMID: 33648540 PMCID: PMC7923479 DOI: 10.1186/s13040-021-00248-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/07/2020] [Accepted: 02/11/2021] [Indexed: 12/04/2022]

Abstract

Background

For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely employed to reduce multi-level genotype combinations into high- or low-risk groups using a binary attribute. For the survival phenotype, Surv-MDR was first proposed using a log-rank test statistic, and the Cox-MDR method was later proposed using a martingale residual of a Cox model. Recently, the KM-MDR method was proposed using the Kaplan-Meier median survival time as a classifier. All three methods use a cross-validation procedure to identify single nucleotide polymorphism (SNP) by SNP interactions among all possible SNP pairs. Furthermore, these methods require a permutation test to verify the significance of the selected SNP pairs. However, the unified model-based multifactor dimensionality reduction method (UM-MDR) overcomes this shortcoming of MDR by unifying the significance testing with the MDR algorithm within the framework of the regression model. Neither cross-validation nor permutation testing is required to identify SNP by SNP interactions in the UM-MDR method. The UM-MDR method comprises two steps: in the first step, multi-level genotypes are classified into high- or low-risk groups, and an indicator variable for the high-risk group is defined. In the second step, the significance of the indicator variable of the high-risk group is tested in a regression model that includes other adjusting covariates. The Cox-UMMDR method was recently proposed by combining Cox-MDR with UM-MDR to identify gene-gene interactions associated with the survival phenotype. In this study, we propose two simple methods: KM-UMMDR, which combines KM-MDR with UM-MDR, and Cox2-UMMDR, which modifies Cox-UMMDR by adjusting for the covariate effect in step 1 rather than in step 2. The KM-UMMDR method allows the covariate effect to be adjusted for in the regression model of step 2, although KM-MDR cannot adjust for the covariate effect in the classification procedure of step 1. In contrast, Cox2-UMMDR differs from Cox-UMMDR in that its martingale residuals are obtained from a Cox model that adjusts for the covariate effect in step 1, whereas Cox-UMMDR adjusts for the covariate effect in the regression model of step 2. We performed simulation studies to compare the power of KM-UMMDR, Cox-UMMDR, Cox2-UMMDR, Cox-MDR, and KM-MDR by considering the effect of covariates and the marginal effect of SNPs. We also analyzed a real example of Korean leukemia patient data for illustration, and a short discussion is provided.
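The two steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes a continuous per-subject score `y` (for example a residual), labels a genotype cell high-risk when its mean score exceeds the overall mean, and replaces the regression model of step 2 with a plain pooled two-sample t statistic without covariates. The function names and toy data are hypothetical. Because the same data define the groups and are then tested, the naive statistic is optimistically biased; the sketch makes no attempt to correct for that.

```python
from collections import defaultdict
from math import sqrt


def classify_cells(snp_a, snp_b, y):
    """Step 1: label each two-SNP genotype cell high-risk (1) or low-risk (0)."""
    overall_mean = sum(y) / len(y)
    cells = defaultdict(list)
    for a, b, value in zip(snp_a, snp_b, y):
        cells[(a, b)].append(value)
    # Assumed rule: a cell is high-risk when its mean score exceeds the overall mean.
    return {cell: int(sum(v) / len(v) > overall_mean) for cell, v in cells.items()}


def indicator_t_statistic(indicator, y):
    """Step 2 (simplified): pooled two-sample t statistic, high-risk vs low-risk.

    Equivalent to testing the slope in a regression of y on the indicator alone.
    Assumes both groups are non-empty and the pooled variance is positive.
    """
    high = [v for s, v in zip(indicator, y) if s == 1]
    low = [v for s, v in zip(indicator, y) if s == 0]
    n1, n0 = len(high), len(low)
    m1, m0 = sum(high) / n1, sum(low) / n0
    ss = sum((v - m1) ** 2 for v in high) + sum((v - m0) ** 2 for v in low)
    pooled_var = ss / (n1 + n0 - 2)
    return (m1 - m0) / sqrt(pooled_var * (1 / n1 + 1 / n0))


# Toy data with an XOR-like interaction between two binary-coded SNPs.
snp_a = [0, 0, 0, 0, 1, 1, 1, 1]
snp_b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 1.2, 3.0, 3.2, 3.1, 2.9, 0.9, 1.1]

labels = classify_cells(snp_a, snp_b, y)
indicator = [labels[(a, b)] for a, b in zip(snp_a, snp_b)]
t_stat = indicator_t_statistic(indicator, y)
print(labels, round(t_stat, 2))
```

In a fuller version, step 2 would fit a regression that also contains the adjusting covariates, which is where the abstract locates the covariate adjustment for KM-UMMDR and Cox-UMMDR.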

Results

In the simulation study, two different scenarios are considered: the first scenario compares the power of the cases with and without the covariate effect. The second scenario compares the power of cases with the main effect of SNPs versus without the main effect of SNPs. In the simulation results, Cox-UMMDR performs best across all scenarios, ahead of KM-UMMDR, Cox2-UMMDR, Cox-MDR and KM-MDR. As expected, both Cox-UMMDR and Cox-MDR perform better than KM-UMMDR and KM-MDR when a covariate effect exists because the former adjust for the covariate effect but the latter cannot. However, Cox2-UMMDR behaves similarly to KM-UMMDR and KM-MDR even though there is a covariate effect. This implies that the covariate effect would be more efficiently adjusted for in the regression model of the second step rather than under the classification procedure of the first step. When there is a main effect of any SNP, Cox-UMMDR, Cox2-UMMDR and KM-UMMDR perform better than Cox-MDR and KM-MDR if the main effects of SNPs are properly adjusted for in the regression model. Across the two scenarios, Cox-UMMDR seems to be the most robust when there is either a covariate effect to adjust for or a SNP that has a main effect on the survival phenotype. In addition, the power of all methods decreased as the censoring fraction increased from 0.1 to 0.3, as heritability increased. The power of all methods seems to be greater under MAF = 0.2 than under MAF = 0.4. For illustration, both KM-UMMDR and Cox2-UMMDR were applied to a real dataset of Korean leukemia patients to identify SNP by SNP interactions associated with the survival phenotype.

Conclusion

Both KM-UMMDR and Cox2-UMMDR were easily implemented by combining KM-MDR and Cox-MDR with UM-MDR, respectively, to detect significant gene-gene interactions associated with survival time without cross-validation and permutation testing. The simulation results demonstrate the utility of KM-UMMDR, Cox2-UMMDR and Cox-UMMDR, which outperform Cox-MDR and KM-MDR when some SNPs with only marginal effects might mask the detection of causal epistasis. In addition, Cox-UMMDR, Cox2-UMMDR and Cox-MDR performed better than KM-UMMDR and KM-MDR when there were potentially confounding covariate effects.

Epi-GTBN: an approach of epistasis mining based on genetic Tabu algorithm and Bayesian network. BMC Bioinformatics 2019;20:444. [PMID: 31455207 PMCID: PMC6712799 DOI: 10.1186/s12859-019-3022-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/04/2018] [Accepted: 08/07/2019] [Indexed: 12/31/2022]

Abstract

Background

Mining epistatic loci that affect specific phenotypic traits is an important research issue in biology. A Bayesian network (BN) is a graphical model that can express the relationship between genetic loci and phenotype, and it has been widely used for epistasis mining. However, this method has two disadvantages: low learning efficiency and a tendency to fall into local optima. A genetic algorithm offers rapid global search and avoids falling into local optima; it is also scalable and easy to integrate with other algorithms. This work proposes an epistasis mining approach based on a genetic tabu algorithm and a Bayesian network (Epi-GTBN). It uses a genetic algorithm as the heuristic search strategy of the Bayesian network. The individual structure is evolved through the genetic operations of selection, crossover and mutation, which helps to find the optimal network structure and then to mine the epistatic loci effectively. To enhance the diversity of the population and obtain a more effective global optimal solution, we apply a tabu search strategy in the crossover and mutation operations of the genetic algorithm, which helps to accelerate the convergence of the algorithm.
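The general idea of a genetic loop whose variation step consults a tabu list can be sketched as follows. This is a toy illustration under stated assumptions, not the Epi-GTBN algorithm: the fitness function is a placeholder (count of ones in a bit string) rather than a Bayesian network score, individuals are bit tuples rather than network structures, and the tabu list simply rejects recently generated offspring. All names and parameters are hypothetical.

```python
import random
from collections import deque


def fitness(individual):
    """Placeholder score (number of ones); a real method would score a network."""
    return sum(individual)


def tabu_mutation(individual, tabu, rng):
    """Flip one random bit, retrying while the result is on the tabu list."""
    for _ in range(10 * len(individual)):
        i = rng.randrange(len(individual))
        child = individual[:i] + (1 - individual[i],) + individual[i + 1:]
        if child not in tabu:
            tabu.append(child)  # remember it so it is not regenerated soon
            return child
    return individual  # give up and keep the unmutated individual


def evolve(n_bits=12, pop_size=8, generations=60, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    tabu = deque(maxlen=50)  # bounded memory of recent offspring
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]       # selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = p[:cut] + q[cut:]              # one-point crossover
            children.append(tabu_mutation(child, tabu, rng))
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, history[-1])
```

Because the best half of each generation is carried over unchanged, the best score recorded in `history` never decreases; the tabu list only affects which new offspring are explored.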

Results

We compared Epi-GTBN with other recent algorithms using both simulated and real datasets. The experimental results demonstrate that our method achieves much better epistasis detection accuracy without sacrificing efficiency across different datasets.

Conclusions

The presented methodology (Epi-GTBN) is an effective method for epistasis detection, and it can be seen as an interesting addition to the arsenal used in complex traits analyses.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-3022-z) contains supplementary material, which is available to authorized users.

Yang CH, Chuang LY, Lin YD. Multiobjective multifactor dimensionality reduction to detect SNP-SNP interactions. Bioinformatics 2019;34:2228-2236. [PMID: 29471406 DOI: 10.1093/bioinformatics/bty076] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/04/2017] [Accepted: 02/16/2018] [Indexed: 11/12/2022]

Guan B, Zhao Y. Self-Adjusting Ant Colony Optimization Based on Information Entropy for Detecting Epistatic Interactions. Genes (Basel) 2019;10:genes10020114. [PMID: 30717303 PMCID: PMC6409693 DOI: 10.3390/genes10020114] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/14/2018] [Revised: 01/21/2019] [Accepted: 01/28/2019] [Indexed: 12/15/2022]

Leem S, Park T. EFMDR-Fast: An Application of Empirical Fuzzy Multifactor Dimensionality Reduction for Fast Execution. Genomics Inform 2019;16:e37. [PMID: 30602098 PMCID: PMC6440656 DOI: 10.5808/gi.2018.16.4.e37] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/06/2018] [Accepted: 12/16/2018] [Indexed: 12/04/2022]

Lee S, Son D, Kim Y, Yu W, Park T. Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype. BioData Min 2018;11:27. [PMID: 30564286 PMCID: PMC6295107 DOI: 10.1186/s13040-018-0189-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/05/2018] [Accepted: 11/26/2018] [Indexed: 12/04/2022]

Abstract

Background

One strategy for addressing missing heritability in genome-wide association studies is gene-gene interaction analysis, which, unlike a single-gene approach, involves high dimensionality. The multifactor dimensionality reduction method (MDR) has been widely applied to reduce multi-level genotypes into high- or low-risk groups. The Cox-MDR method has been proposed to detect gene-gene interactions associated with the survival phenotype by using the martingale residuals from a Cox model. However, this method requires a cross-validation procedure to find the best SNP pair among all possible pairs, followed by a permutation procedure to assess the significance of the gene-gene interactions. Recently, the unified model based multifactor dimensionality reduction method (UM-MDR) has been proposed to unify the significance testing with the MDR algorithm within the regression model framework, in which neither cross-validation nor permutation testing is needed. In this paper, we propose a simple approach, called Cox UM-MDR, which combines Cox-MDR with the key procedure of UM-MDR to identify gene-gene interactions associated with the survival phenotype.
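As a rough illustration of the quantity involved, the following sketch computes martingale residuals for the simplest case of a Cox model with no covariates, where the residual of subject i is the event indicator minus the Nelson-Aalen cumulative hazard at that subject's observed time. This covariate-free setting is an assumption made to keep the example dependency-free; the method described above fits a Cox model, which would normally be done with a survival analysis library. The function name and toy data are hypothetical.

```python
def null_martingale_residuals(times, events):
    """Martingale residuals M_i = event_i - H(t_i) for a covariate-free model.

    H is the Nelson-Aalen cumulative hazard: the running sum, over distinct
    event times, of (events at that time) / (subjects still at risk).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumulative = 0.0
    hazard_at = {}
    for t in event_times:
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        at_risk = sum(1 for u in times if u >= t)
        cumulative += d / at_risk
        hazard_at[t] = cumulative

    def cumulative_hazard(t):
        # Value of H at the last event time not exceeding t (0 before any event).
        earlier = [s for s in event_times if s <= t]
        return hazard_at[earlier[-1]] if earlier else 0.0

    return [e - cumulative_hazard(t) for t, e in zip(times, events)]


# Toy data: observed times and event indicators (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
residuals = null_martingale_residuals(times, events)
print([round(r, 3) for r in residuals])
```

Subjects who fail earlier than expected receive positive residuals and long-surviving or censored subjects receive negative ones, so the residuals can serve as a continuous score when genotype cells are split into high- and low-risk groups. In this covariate-free case the residuals sum to zero.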

Results

A simulation study was performed to compare Cox UM-MDR with Cox-MDR with and without the marginal effects of SNPs. We found that Cox UM-MDR has similar power to Cox-MDR without marginal effects, whereas it outperforms Cox-MDR with marginal effects and is more robust to heavy censoring. We also applied Cox UM-MDR to a dataset of leukemia patients and detected gene-gene interactions with regard to the survival time.

Conclusion

Cox UM-MDR is easily implemented by combining Cox-MDR with UM-MDR to detect the significant gene-gene interactions associated with the survival time without cross-validation and permutation testing. The simulation results are shown to demonstrate the utility of the proposed method, which achieves at least the same power as Cox-MDR in most scenarios, and outperforms Cox-MDR when some SNPs having only marginal effects might mask the detection of the causal epistasis.

Guan B, Zhao Y, Sun W. Ant colony optimization with an automatic adjustment mechanism for detecting epistatic interactions. Comput Biol Chem 2018;77:354-362. [DOI: 10.1016/j.compbiolchem.2018.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/26/2018] [Revised: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 12/13/2022]

Choi S, Lee S, Kim Y, Hwang H, Park T. HisCoM-GGI: Hierarchical structural component analysis of gene-gene interactions. J Bioinform Comput Biol 2018;16:1840026. [PMID: 30567476 DOI: 10.1142/s0219720018400267] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/23/2022]

Abstract

Although genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with common diseases, these observations cannot fully explain "missing heritability". Determining gene-gene interactions (GGI) is one possible avenue for addressing the missing heritability problem. While many statistical approaches have been proposed to detect GGI, most of these focus primarily on SNP-to-SNP interactions. Although gene-based GGI analyses have many advantages, such as reducing the burden of multiple-testing correction and increasing power by aggregating multiple causal signals across SNPs in specific genes, only a few methods are available. In this study, we proposed a new statistical approach for gene-based GGI analysis, "Hierarchical structural CoMponent analysis of Gene-Gene Interactions" (HisCoM-GGI). HisCoM-GGI is based on generalized structured component analysis and can consider hierarchical structural relationships between genes and SNPs. For a pair of genes, HisCoM-GGI first summarizes all possible pairwise SNP-SNP interactions into a latent variable, on which it then performs GGI analysis. HisCoM-GGI can evaluate both gene-level and SNP-level interactions. Through simulation studies, HisCoM-GGI demonstrated higher statistical power than existing gene-based GGI methods. In an analysis of a GWAS of a Korean population to identify GGI associated with body mass index, HisCoM-GGI identified 14 potential GGI, two of which, (NCOR2 × SPOCK1) and (LINGO2 × ZNF385D), were successfully replicated in independent datasets. We conclude that the HisCoM-GGI method may be a valuable tool for identifying GGI in the context of missing heritability, allowing us to better understand the biological genetic mechanisms of complex traits. An implementation of HisCoM-GGI can be downloaded from the website ( http://statgen.snu.ac.kr/software/hiscom-ggi ).

TrioMDR: Detecting SNP interactions in trio families with model-based multifactor dimensionality reduction. Genomics 2018;111:1176-1182. [PMID: 30055230 DOI: 10.1016/j.ygeno.2018.07.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/26/2018] [Revised: 07/11/2018] [Accepted: 07/15/2018] [Indexed: 12/18/2022]

Yang CH, Lin YD, Chuang LY. Multiple-Criteria Decision Analysis-Based Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions. IEEE J Biomed Health Inform 2018;23:416-426. [PMID: 29993963 DOI: 10.1109/jbhi.2018.2790951] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]

Verma SS, Lucas A, Zhang X, Veturi Y, Dudek S, Li B, Li R, Urbanowicz R, Moore JH, Kim D, Ritchie MD. Collective feature selection to identify crucial epistatic variants. BioData Min 2018;11:5. [PMID: 29713383 PMCID: PMC5907720 DOI: 10.1186/s13040-018-0168-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/29/2017] [Accepted: 04/04/2018] [Indexed: 01/17/2023]

Abstract

Background

Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach.

Results

Through our simulation study, we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We chose our top performing methods and took the union of the resulting variables, based on a user-defined percentage of variants selected from each method, to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one set of simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects, also in low effect size datasets and under different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration).
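The "union of the top fraction from each method" idea can be sketched as below. This is a minimal illustration, assuming each method has already produced one importance score per variant (higher is better); the method names, variant identifiers, scores, and the rounding rule for the cutoff are all hypothetical and not taken from the study.

```python
from math import ceil


def top_fraction(scores, fraction):
    """Return the variants in the top `fraction` of one method's ranking."""
    keep = max(1, ceil(fraction * len(scores)))  # assumed rounding rule
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def collective_selection(score_tables, fraction):
    """Union of the top-ranked variants across all methods."""
    selected = set()
    for scores in score_tables.values():
        selected |= top_fraction(scores, fraction)
    return selected


# Hypothetical importance scores from two feature selection methods.
score_tables = {
    "method_a": {"rs1": 0.9, "rs2": 0.1, "rs3": 0.5, "rs4": 0.2},
    "method_b": {"rs1": 0.2, "rs2": 0.8, "rs3": 0.3, "rs4": 0.1},
}
selected = collective_selection(score_tables, fraction=0.25)
print(sorted(selected))
```

Here each method contributes its single best variant, and the union keeps both, so a variant ranked highly by only one method still reaches downstream analysis.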

Conclusions

In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.

Affiliation(s)

Shefali S Verma 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA; 2 Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Anastasia Lucas 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Xinyuan Zhang 2 Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Yogasudha Veturi 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Scott Dudek 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Binglan Li 2 Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Ruowang Li 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Ryan Urbanowicz 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Jason H Moore 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA
Dokyoon Kim 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA
Marylyn D Ritchie 1 Biomedical and Translational Bioinformatics Institute, Geisinger Health System, 100 N Academy Avenue, Danville, PA 17822 USA; 2 Huck Institute of Life Sciences, The Pennsylvania State University, University Park, PA USA; 3 Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Richards Building, 3700 Hamilton Walk, Philadelphia, PA 19104 USA


Leem S, Park T. An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions. BMC Genomics 2017;18:115. [PMID: 28361694 PMCID: PMC5374597 DOI: 10.1186/s12864-017-3496-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Detection of gene-gene interactions (GGIs) is a key challenge in solving the problem of missing heritability in genetics. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGIs. MDR reduces the dimensionality of multi-factor genotype combinations by classifying each combination into a high-risk (H) or low-risk (L) group. Unfortunately, this simple binary classification does not reflect the uncertainty of the H/L classification. Thus, we proposed Fuzzy MDR to overcome the limitations of binary classification by introducing degrees of membership in two fuzzy sets, H and L. While Fuzzy MDR demonstrated higher power than MDR, its performance is highly dependent on several tuning parameters. In real applications, it is not easy to choose appropriate tuning parameter values.
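The binary H/L step described above can be illustrated with a minimal sketch. It assumes the commonly used MDR rule of labelling a genotype combination high-risk when its case:control ratio exceeds the overall case:control ratio; the function name `mdr_classify` and the toy counts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mdr_classify(genotypes, status):
    """Label each genotype combination H or L and score the split.

    genotypes: one tuple of SNP genotypes per sample, e.g. (0, 1)
    status:    1 for a case, 0 for a control
    Returns the set of high-risk cells and the balanced accuracy.
    """
    cases = Counter(g for g, y in zip(genotypes, status) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, status) if y == 0)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())

    # Assumed threshold: the overall case:control ratio in the sample.
    threshold = n_cases / n_controls
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    # Cases in H cells are true positives; controls in L cells are true negatives.
    true_pos = sum(cases[cell] for cell in high_risk)
    true_neg = sum(n for cell, n in controls.items() if cell not in high_risk)
    balanced_accuracy = 0.5 * (true_pos / n_cases + true_neg / n_controls)
    return high_risk, balanced_accuracy


# Hypothetical toy counts per cell: (number of cases, number of controls).
toy_counts = {(0, 0): (8, 2), (0, 1): (3, 7), (1, 1): (5, 5)}
genotypes, status = [], []
for cell, (n_case, n_control) in toy_counts.items():
    genotypes += [cell] * (n_case + n_control)
    status += [1] * n_case + [0] * n_control

high_risk, ba = mdr_classify(genotypes, status)
print(high_risk, round(ba, 4))
```

In this toy data the cell (1, 1) has equal numbers of cases and controls yet is forced into a single group, which is the kind of uncertainty a hard H/L label cannot express.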

RESULT

In this work, we propose an empirical fuzzy MDR (EF-MDR) that does not require specifying tuning parameter values. The membership degree is estimated directly from the data: in EF-MDR, it is the maximum likelihood estimator of the proportion of cases (controls) in each genotype combination. We also show that the balanced accuracy measure derived from this new membership function is a linear function of the standard chi-square statistic. This relationship allows us to perform the standard significance test using p-values in the MDR framework without permutation. Through two simulation studies, the power of the proposed EF-MDR is shown to be higher than those of MDR and Fuzzy MDR. We illustrate the proposed EF-MDR by analyzing Crohn's disease (CD) and bipolar disorder (BD) in the Wellcome Trust Case Control Consortium (WTCCC) dataset.
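The link between an empirical membership degree and the chi-square statistic can be checked numerically with a small sketch. It rests on two assumptions that are not spelled out in the abstract: the membership degree of H for a cell is taken to be that cell's observed case proportion, and the fuzzy balanced accuracy is taken to be the average of membership-weighted sensitivity and specificity. The function name `ef_mdr_sketch` and the toy counts are hypothetical; the authors' R program may define these quantities differently.

```python
from collections import Counter


def ef_mdr_sketch(genotypes, status):
    """Return (fuzzy balanced accuracy, Pearson chi-square, sample size)."""
    cases = Counter(g for g, y in zip(genotypes, status) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, status) if y == 0)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    n = n_cases + n_controls

    sensitivity = specificity = chi2 = 0.0
    for cell in set(cases) | set(controls):
        a, b = cases[cell], controls[cell]
        # Assumed membership degree of H: the cell's observed case proportion.
        mu_high = a / (a + b)
        sensitivity += a * mu_high / n_cases
        specificity += b * (1.0 - mu_high) / n_controls
        # Pearson chi-square contribution of this cell (2 x k table).
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * (a + b) / n
            chi2 += (observed - expected) ** 2 / expected

    return 0.5 * (sensitivity + specificity), chi2, n


# Hypothetical toy counts per cell: (number of cases, number of controls).
toy_counts = {(0, 0): (8, 2), (0, 1): (3, 7), (1, 1): (5, 5)}
genotypes, status = [], []
for cell, (n_case, n_control) in toy_counts.items():
    genotypes += [cell] * (n_case + n_control)
    status += [1] * n_case + [0] * n_control

fuzzy_ba, chi2, n = ef_mdr_sketch(genotypes, status)
# Under the assumptions above, fuzzy_ba equals 0.5 + chi2 / (2 * n).
print(fuzzy_ba, 0.5 + chi2 / (2 * n))
```

Under these assumptions the two printed values agree, so a p-value could be read from the chi-square distribution with (number of cells minus 1) degrees of freedom instead of from a permutation test.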

CONCLUSION

We propose an empirical Fuzzy MDR for detecting GGIs that uses the maximum likelihood estimate of the proportion of cases (controls) as the membership degree of each genotype combination. The program written in R for EF-MDR is available at http://statgen.snu.ac.kr/software/EF-MDR.


Lee S, Son D, Yu W, Park T. Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method. Genomics Inform 2016;14:166-172. [PMID: 28154507 PMCID: PMC5287120 DOI: 10.5808/gi.2016.14.4.166] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2016] [Revised: 12/09/2016] [Accepted: 12/09/2016] [Indexed: 11/20/2022] Open