1
Zhang Q, Bhatia M, Park T, Ott J. A multi-threaded approach to genotype pattern mining for detecting digenic disease genes. Front Genet 2023; 14:1222517. [PMID: 37693313 PMCID: PMC10483394 DOI: 10.3389/fgene.2023.1222517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 07/31/2023] [Indexed: 09/12/2023] Open
Abstract
To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.
Affiliation(s)
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Muskan Bhatia
  - Amity Institute of Biotechnology, Amity University Madhya Pradesh, Gwalior, India
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Republic of Korea
- Jurg Ott
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY, United States
2
Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z, Liu H, Li A. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci 2022; 14:814-832. [PMID: 35788965 DOI: 10.1007/s12539-022-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION Linear or nonlinear interactions of multiple single-nucleotide polymorphisms (SNPs) play an important role in understanding the genetic basis of complex human diseases. However, combinatorial analytics in high-dimensional space makes it extremely challenging to detect multiorder SNP interactions. Most classic approaches can only perform one task (for detecting k-order SNP interactions) in each run. Since prior knowledge of a complex disease is usually not available, it is difficult to determine the value of k for detecting k-order SNP interactions. METHODS A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions, and it is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel, and two subpopulations that separately adopt the Bayesian network-based K2-score and Jensen-Shannon divergence (JS-score) as evaluation criteria are generated for each task to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G test statistical test is adopted to further verify the authenticity of candidate solutions to reduce the error rate. RESULT Three multiorder simulated disease models with different interaction effects and three real age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) datasets were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that the MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulation disease models than traditional single-task algorithms. The results on real AMD data and RA and T1D datasets indicate that MTACO-DMSI has the ability to detect multiorder SNP interactions at a genome-wide scale. Availability and implementation: https://github.com/shouhengtuo/MTACO-DMSI/.
Affiliation(s)
- Shouheng Tuo
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- Chao Li
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- Fan Liu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- YanLing Zhu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- TianRui Chen
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- ZengYu Feng
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- Haiyan Liu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
- Aimin Li
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China
3
Multi-Objective Artificial Bee Colony Algorithm Based on Scale-Free Network for Epistasis Detection. Genes (Basel) 2022; 13:genes13050871. [PMID: 35627256 PMCID: PMC9140669 DOI: 10.3390/genes13050871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Revised: 04/30/2022] [Accepted: 05/10/2022] [Indexed: 12/04/2022] Open
Abstract
In genome-wide association studies, epistasis detection is of great significance for the occurrence and diagnosis of complex human diseases, but it also faces challenges such as high dimensionality and a small data sample size. In order to cope with these challenges, several swarm intelligence methods have been introduced to identify epistasis in recent years. However, the existing methods still have some limitations, such as high-consumption and premature convergence. In this study, we proposed a multi-objective artificial bee colony (ABC) algorithm based on the scale-free network (SFMOABC). The SFMOABC incorporates the scale-free network into the ABC algorithm to guide the update and selection of solutions. In addition, the SFMOABC uses mutual information and the K2-Score of the Bayesian network as objective functions, and the opposition-based learning strategy is used to improve the search ability. Experiments were performed on both simulation datasets and a real dataset of age-related macular degeneration (AMD). The results of the simulation experiments showed that the SFMOABC has better detection power and efficiency than seven other epistasis detection methods. In the real AMD data experiment, most of the single nucleotide polymorphism combinations detected by the SFMOABC have been shown to be associated with AMD disease. Therefore, SFMOABC is a promising method for epistasis detection.
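The K2-score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard Cooper-Herskovits K2 metric for a single phenotype node whose parents are the SNPs in the combination, computed in log space. The function name `k2_log_score` and the toy data are hypothetical, and epistasis tools often use the negated form (lower is better), so sign conventions may differ from the SFMOABC implementation.

```python
from collections import Counter
from math import lgamma


def k2_log_score(genotypes, phenotypes, n_classes=2):
    """Log of the K2 metric for one SNP combination.

    genotypes  : list of tuples, one combined genotype per sample
    phenotypes : list of class labels in range(n_classes)
    A higher (less negative) value means the combination explains the
    phenotype better under this metric.
    """
    cell_counts = Counter(zip(genotypes, phenotypes))
    row_totals = Counter(genotypes)
    score = 0.0
    for genotype, n_i in row_totals.items():
        # log[(r - 1)! / (n_i + r - 1)!], using lgamma(k) = log((k - 1)!)
        score += lgamma(n_classes) - lgamma(n_i + n_classes)
        for label in range(n_classes):
            # log(n_ij!) for each phenotype class within this genotype cell
            score += lgamma(cell_counts[(genotype, label)] + 1)
    return score


# Toy data (hypothetical): the first phenotype vector tracks the genotype,
# the second is unrelated to it.
toy_genotypes = [(0,), (0,), (1,), (1,)]
associated = k2_log_score(toy_genotypes, [0, 0, 1, 1])
unrelated = k2_log_score(toy_genotypes, [0, 1, 0, 1])
```

On the toy data the associated phenotype receives the higher log score, which is the ordering a search procedure would exploit.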
4
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
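The remark that exhaustive methods may be computationally prohibitive follows from simple counting: testing every k-SNP combination among n SNPs requires "n choose k" evaluations. The sketch below only illustrates that growth; the panel size of 100,000 SNPs is an assumed example value, not a figure from the study.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(n_snps, order)


# Assumed example panel size; each extra order multiplies the workload
# by roughly n_snps / order.
example_panel = 100_000
workload = {k: n_combinations(example_panel, k) for k in (2, 3, 4)}
```

Under this assumption the pairwise search already involves about 5 x 10^9 tests, and each additional order multiplies that number by tens of thousands.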
5
Guan B, Zhao Y, Yin Y, Li Y. Detecting Disease-Associated SNP-SNP Interactions Using Progressive Screening Memetic Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:878-887. [PMID: 32857698 DOI: 10.1109/tcbb.2020.3019256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Hundreds of thousands of single nucleotide polymorphisms (SNPs) are currently available for genome-wide association studies (GWAS). Detecting disease-associated SNP-SNP interactions is considered an important way to capture the underlying genetic causes of complex diseases. In the combinatorially explosive search space, evolutionary algorithms are promising for solving this difficult problem because of their controllable time complexity. However, in existing evolutionary algorithms, some possible SNP-SNP interactions are evaluated multiple times by the fitness function. Such reevaluations not only waste computing resources but also make these algorithms prone to falling into local optima. To tackle this drawback, a progressive screening memetic algorithm (PSMA) is proposed in the paper. PSMA first represents all possible SNP-SNP interactions in a constructed graph. Then, the proposed algorithm uses the progressive screening strategy to guarantee that every possible SNP-SNP interaction is evaluated at most once, by reducing the constructed graph. Furthermore, two types of local search algorithms are introduced to enhance the detection power of PSMA. For detecting disease-associated SNP-SNP interactions, experimental results show that the proposed method outperforms other existing state-of-the-art methods in terms of accuracy and time.
6
Evaluating the detection ability of a range of epistasis detection methods on simulated data for pure and impure epistatic models. PLoS One 2022; 17:e0263390. [PMID: 35180244 PMCID: PMC8856572 DOI: 10.1371/journal.pone.0263390] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/24/2021] [Accepted: 01/18/2022] [Indexed: 11/19/2022] Open
Abstract
Background Numerous approaches have been proposed for the detection of epistatic interactions within GWAS datasets in order to better understand the drivers of disease and genetics. Methods A selection of state-of-the-art approaches was assessed. These included the statistical tests fast-epistasis, BOOST, logistic regression and wtest; swarm intelligence methods, namely AntEpiSeeker, epiACO and CINOEDV; and data mining approaches, including MDR, GSS, SNPRuler and MPI3SNP. Data were simulated to provide randomly generated models with no individual main effects at different heritabilities (pure epistasis) as well as models based on penetrance tables with some main effects (impure epistasis). Detection of both two- and three-locus interactions was assessed across a total of 1,560 simulated datasets. The different methods were also applied to a section of the UK Biobank cohort for atrial fibrillation. Results For pure two-locus interactions, PLINK's implementation of BOOST recovered the highest number of correct interactions (53.9%), performing significantly better than the other methods (p = 4.52e-36). For impure two-locus interactions, MDR exhibited the best performance, recovering 62.2% of the most significant impure epistatic interactions (p = 6.31e-90 for all but one test). The assessment of three-locus interaction prediction revealed that wtest recovered the highest number (17.2%) of pure epistatic interactions (p = 8.49e-14). wtest also recovered the highest number of three-locus impure epistatic interactions (p = 6.76e-48), while AntEpiSeeker ranked the highest number of such interactions (40.5%) as the most significant. Finally, the methods were applied to a real dataset for atrial fibrillation, most notably finding an interaction between SYNE2 and DTNB.
7
Bayat A, Hosking B, Jain Y, Hosking C, Kodikara M, Reti D, Twine NA, Bauer DC. Fast and accurate exhaustive higher-order epistasis search with BitEpi. Sci Rep 2021; 11:15923. [PMID: 34354094 PMCID: PMC8342486 DOI: 10.1038/s41598-021-94959-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/19/2021] [Accepted: 07/20/2021] [Indexed: 01/03/2023] Open
Abstract
Complex genetic diseases may be modulated by a large number of epistatic interactions affecting a polygenic phenotype. Identifying these interactions is difficult due to computational complexity, especially in the case of higher-order interactions where more than two genomic variants are involved. In this paper, we present BitEpi, a fast and accurate method to test all possible combinations of up to four bi-allelic variants (i.e. Single Nucleotide Variant or SNV for short). BitEpi introduces a novel bitwise algorithm that is 1.7 and 56 times faster for 3-SNV and 4-SNV search, than established software. The novel entropy statistic used in BitEpi is 44% more accurate to identify interactive SNVs, incorporating a p-value-based significance testing. We demonstrate BitEpi on real world data of 4900 samples and 87,000 SNPs. We also present EpiExplorer to visualize the potentially large number of individual and interacting SNVs in an interactive Cytoscape graph. EpiExplorer uses various visual elements to facilitate the discovery of true biological events in a complex polygenic environment.
Affiliation(s)
- Arash Bayat
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
  - The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
- Brendan Hosking
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Yatish Jain
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
  - Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Cameron Hosking
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Milindi Kodikara
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Daniel Reti
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
  - Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Natalie A Twine
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
  - Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Denis C Bauer
  - Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
  - Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia
  - Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
8
A differential evolution based feature combination selection algorithm for high-dimensional data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.081] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022]
9
Zhou X, Chan KCC, Huang Z, Wang J. Determining dependency and redundancy for identifying gene-gene interaction associated with complex disease. J Bioinform Comput Biol 2020; 18:2050035. [PMID: 33064052 DOI: 10.1142/s0219720020500353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect if a particular set of genes has interaction with a particular complex disease. However, even though many such methods have been shown to be useful, they can be made more effective if the properties of gene-gene interactions can be better understood. Towards this goal, we have attempted to uncover patterns in gene-gene interactions and the patterns reveal an interesting property that can be reflected in an inequality that describes the relationship between two genotype variables and a disease-status variable. We show, in this paper, that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method can be very promising.
Affiliation(s)
- Xiangdong Zhou
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
- Keith C C Chan
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, P. R. China
- Zhihua Huang
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
- Jingbin Wang
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
10
Sun Y, Wang X, Shang J, Liu JX, Zheng CH, Lei X. Introducing Heuristic Information Into Ant Colony Optimization Algorithm for Identifying Epistasis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1253-1261. [PMID: 30403637 DOI: 10.1109/tcbb.2018.2879673] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Epistasis learning, which is aimed at detecting associations between multiple Single Nucleotide Polymorphisms (SNPs) and complex diseases, has gained increasing attention in genome wide association studies. Although much work has been done on mapping the SNPs underlying complex diseases, there is still difficulty in detecting epistatic interactions due to the lack of heuristic information to expedite the search process. In this study, a method EACO is proposed to detect epistatic interactions based on the ant colony optimization (ACO) algorithm, the highlights of which are the introduced heuristic information, fitness function, and a candidate solutions filtration strategy. The heuristic information multi-SURF* is introduced into EACO for identifying epistasis, which is incorporated into ant-decision rules to guide the search with linear time. Two functionally complementary fitness functions, mutual information and the Gini index, are combined to effectively evaluate the associations between SNP combinations and the phenotype. Furthermore, a strategy for candidate solutions filtration is provided to adaptively retain all optimal solutions which yields a more accurate way for epistasis searching. Experiments of EACO, as well as three ACO based methods (AntEpiSeeker, MACOED, and epiACO) and four commonly used methods (BOOST, SNPRuler, TEAM, and epiMODE) are performed on both simulation data sets and a real data set of age-related macular degeneration. Results indicate that EACO is promising in identifying epistasis.
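The two fitness functions named in the abstract above, mutual information and the Gini index, can be sketched as follows. This is a minimal illustration using the textbook definitions (mutual information in bits, Gini as the sample-weighted impurity of the phenotype within each genotype cell). The function names and toy data are hypothetical, and the exact formulation used in EACO may differ.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, phenotypes):
    """I(G; Y) in bits between a combined genotype and the phenotype."""
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))
    geno_totals = Counter(genotypes)
    pheno_totals = Counter(phenotypes)
    return sum(
        (n_gy / n) * log2(n_gy * n / (geno_totals[g] * pheno_totals[y]))
        for (g, y), n_gy in joint.items()
    )


def gini_index(genotypes, phenotypes):
    """Weighted Gini impurity of the phenotype within each genotype cell.

    0 means every genotype cell contains a single phenotype class.
    """
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))
    geno_totals = Counter(genotypes)
    impurity = 0.0
    for g, n_g in geno_totals.items():
        purity = sum(
            (n_gy / n_g) ** 2 for (gg, _), n_gy in joint.items() if gg == g
        )
        impurity += (n_g / n) * (1.0 - purity)
    return impurity


# Toy two-SNP genotypes (hypothetical) with an associated and an
# unrelated phenotype vector.
toy = [(0, 0), (0, 0), (1, 1), (1, 1)]
mi_assoc, gini_assoc = mutual_information(toy, [0, 0, 1, 1]), gini_index(toy, [0, 0, 1, 1])
mi_null, gini_null = mutual_information(toy, [0, 1, 0, 1]), gini_index(toy, [0, 1, 0, 1])
```

The two measures move in opposite directions: a strong association raises mutual information and lowers Gini impurity, which is one plausible sense in which they are complementary.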
11
Zhu X, Shang J, Sun Y, Li F, Liu JX, Yuan S. PSO-CFDP: A Particle Swarm Optimization-Based Automatic Density Peaks Clustering Method for Cancer Subtyping. Hum Hered 2019; 84:9-20. [PMID: 31412348 DOI: 10.1159/000501481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/08/2019] [Accepted: 06/13/2019] [Indexed: 12/27/2022] Open
Abstract
Cancer subtyping is of great importance for the prediction, diagnosis, and precise treatment of cancer patients. Many clustering methods have been proposed for cancer subtyping. In 2014, a clustering algorithm named Clustering by Fast Search and Find of Density Peaks (CFDP) was proposed and published in Science; it has been applied to cancer subtyping and achieved attractive results. However, CFDP requires two key parameters (cluster centers and cutoff distance) to be set manually, and their optimal values are difficult to determine. To overcome this limitation, an automatic clustering method named PSO-CFDP is proposed in this paper, in which cluster centers and cutoff distance are automatically determined by running an improved particle swarm optimization (PSO) algorithm multiple times. Experiments using PSO-CFDP, as well as LR-CFDP, STClu, CH-CCFDAC, and CFDP, were performed on four benchmark datasets and two real cancer gene expression datasets. The results show that PSO-CFDP can determine cluster centers and cutoff distance automatically within controllable time/cost and, therefore, improve the accuracy of cancer subtyping.
Affiliation(s)
- Xuhui Zhu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Junliang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  - School of Statistics, Qufu Normal University, Qufu, China
- Yan Sun
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Feng Li
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Shasha Yuan
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
12
Ding Q, Shang J, Sun Y, Wang X, Liu JX. HC-HDSD: A method of hypergraph construction and high-density subgraph detection for inferring high-order epistatic interactions. Comput Biol Chem 2018; 78:440-447. [PMID: 30595466 DOI: 10.1016/j.compbiolchem.2018.11.031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/31/2018] [Accepted: 11/26/2018] [Indexed: 01/08/2023]
Abstract
Detecting epistatic interactions, or nonlinear interactive effects of Single Nucleotide Polymorphisms (SNPs), has gained increasing attention in explaining the "missing heritability" of complex diseases. Though much work has been done in mapping SNPs underlying diseases, most of it is restricted to 2-order epistatic interactions. In this paper, a method of hypergraph construction and high-density subgraph detection, named HC-HDSD, is proposed for detecting high-order epistatic interactions. The hypergraph is constructed from low-order epistatic interactions that are identified using the normalized co-information measure and exhaustive search. The hypergraph consists of two types of vertices: real ones representing main effects of SNPs and virtual ones denoting interactive effects of epistatic interactions. Then, both the maximal clique centrality algorithm and the near-clique mining algorithm are employed to detect high-density subgraphs in the constructed hypergraph. These high-density subgraphs are inferred as high-order epistatic interactions in HC-HDSD. Experiments are performed on several simulation data sets, the results of which show that HC-HDSD is promising in inferring high-order epistatic interactions while substantially reducing the computation cost. In addition, the application of HC-HDSD to a real Age-related Macular Degeneration (AMD) data set provides several new clues for the exploration of causative factors of AMD.
Affiliation(s)
- Qian Ding
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Junliang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  - School of Statistics, Qufu Normal University, Qufu, 273165, China
- Yingxia Sun
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Xuan Wang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
13
Guan B, Zhao Y, Sun W. Ant colony optimization with an automatic adjustment mechanism for detecting epistatic interactions. Comput Biol Chem 2018; 77:354-362. [DOI: 10.1016/j.compbiolchem.2018.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/26/2018] [Revised: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 12/13/2022]
14
FDHE-IW: A Fast Approach for Detecting High-Order Epistasis in Genome-Wide Case-Control Studies. Genes (Basel) 2018; 9:genes9090435. [PMID: 30158504 PMCID: PMC6162554 DOI: 10.3390/genes9090435] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/11/2018] [Revised: 08/16/2018] [Accepted: 08/16/2018] [Indexed: 12/13/2022] Open
Abstract
Detecting high-order epistasis in genome-wide association studies (GWASs) is of importance when characterizing complex human diseases. However, the enormous number of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases present a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value for each SNP is calculated. Then, the top-k SNPs are isolated as guiders to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the findings are statistically evaluated using a G-test to isolate true positives. The developed algorithm was used to evaluate 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in the detection of some high-order disease-causing models.
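The first step described above, scoring each SNP by symmetrical uncertainty, can be sketched with the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The code is a minimal illustration under that assumption; the function names, the `top_k_snps` helper and the toy data are hypothetical and are not taken from the FDHE-IW implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(snp, phenotype):
    """SU in [0, 1]; 0 means independent, 1 means fully determined."""
    h_x = entropy(snp)
    h_y = entropy(phenotype)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing to share
    h_xy = entropy(list(zip(snp, phenotype)))
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


def top_k_snps(snp_columns, phenotype, k):
    """Indices of the k SNPs with the largest SU (hypothetical helper)."""
    scores = [symmetrical_uncertainty(col, phenotype) for col in snp_columns]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (hypothetical): SNP 1 tracks the phenotype, SNP 0 does not.
phenotype = [0, 0, 1, 1]
snps = [[0, 1, 0, 1], [0, 0, 1, 1]]
guiders = top_k_snps(snps, phenotype, k=1)
```

In this toy case the SNP that tracks the phenotype is selected as the single guider.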
15
Yang CH, Chuang LY, Lin YD. Multiobjective differential evolution-based multifactor dimensionality reduction for detecting gene-gene interactions. Sci Rep 2017; 7:12869. [PMID: 28993686 PMCID: PMC5634479 DOI: 10.1038/s41598-017-12773-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/25/2017] [Accepted: 09/15/2017] [Indexed: 12/11/2022] Open
Abstract
Epistasis within disease-related genes (gene–gene interactions) was determined through contingency table measures based on multifactor dimensionality reduction (MDR) using single-nucleotide polymorphisms (SNPs). Most MDR-based methods use the single contingency table measure to detect gene–gene interactions; however, some gene–gene interactions may require identification through multiple contingency table measures. In this study, a multiobjective differential evolution method (called MODEMDR) was proposed to merge the various contingency table measures based on MDR to detect significant gene–gene interactions. Two contingency table measures, namely the correct classification rate and normalized mutual information, were selected to design the fitness functions in MODEMDR. The characteristics of multiobjective optimization enable MODEMDR to use multiple measures to efficiently and synchronously detect significant gene–gene interactions within a reasonable time frame. Epistatic models with and without marginal effects under various parameter settings (heritability and minor allele frequencies) were used to assess existing methods by comparing the detection success rates of gene–gene interactions. The results of the simulation datasets show that MODEMDR is superior to existing methods. Moreover, a large dataset obtained from the Wellcome Trust Case Control Consortium was used to assess MODEMDR. MODEMDR exhibited efficiency in identifying significant gene–gene interactions in genome-wide association studies.
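The correct classification rate named in the abstract above can be illustrated with the usual MDR construction: each multi-locus genotype cell is labelled high-risk when its case-to-control ratio exceeds the overall ratio, and the rate is the fraction of samples whose label matches their status. This is a minimal sketch under that assumption; the function name and toy data are hypothetical, and MODEMDR's exact thresholding may differ.

```python
from collections import Counter


def mdr_correct_classification_rate(genotypes, phenotypes):
    """Fraction of samples classified correctly by MDR-style cell labels.

    genotypes  : list of tuples, one multi-locus genotype per sample
    phenotypes : list of 0 (control) / 1 (case)
    """
    cases = Counter(g for g, y in zip(genotypes, phenotypes) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, phenotypes) if y == 0)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    correct = 0
    for g in set(cases) | set(controls):
        # High-risk if cases/controls in this cell exceeds the overall
        # ratio; cross-multiplied to avoid dividing by an empty cell.
        high_risk = cases[g] * n_controls > controls[g] * n_cases
        correct += cases[g] if high_risk else controls[g]
    return correct / (n_cases + n_controls)


# Toy two-SNP genotypes (hypothetical).
toy = [(0, 0), (0, 0), (1, 1), (1, 1)]
ccr_assoc = mdr_correct_classification_rate(toy, [0, 0, 1, 1])
ccr_null = mdr_correct_classification_rate(toy, [0, 1, 0, 1])
```

With balanced classes, an uninformative SNP pair scores 0.5 and a perfectly separating pair scores 1.0.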
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, 80778, Taiwan
- Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, 80708, Taiwan
- Li-Yeh Chuang
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, 84004, Taiwan
- Yu-Da Lin
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, 80778, Taiwan
|
16
|
Niche harmony search algorithm for detecting complex disease associated high-order SNP combinations. Sci Rep 2017; 7:11529. [PMID: 28912584 PMCID: PMC5599559 DOI: 10.1038/s41598-017-11064-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/04/2016] [Accepted: 08/17/2017] [Indexed: 02/01/2023]
Abstract
Genome-wide association studies face particular difficulty in detecting high-order disease-causing models because of model diversity, the possibly low or absent marginal effect of a model, and the extraordinary search and computation required. In this paper, we propose a niche harmony search algorithm in which joint entropy is used as a heuristic factor to guide the search for models with low or no marginal effect, and two computationally lightweight scores are selected to evaluate and adapt to diverse disease models. To obtain all possible suspected pathogenic models, a niche technique is merged with harmony search (HS); the niches serve as taboo regions that keep HS from becoming trapped in local search. From the resulting set of candidate SNP combinations, we use the G-test statistic to test for true positives. Experiments were performed on twenty typical simulation datasets, of which twelve models have marginal effects and eight have none. Our results indicate that the proposed algorithm has very high detection power for finding suspected disease models in the first stage and that it is superior to some typical existing approaches in both detection power and CPU runtime on all these datasets. Application to age-related macular degeneration (AMD) data demonstrates that our method is promising for detecting high-order disease-causing models.
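The G-test mentioned in this abstract can be sketched for one candidate SNP combination. This is a minimal illustration, assuming the test is applied to the table of joint genotype (rows) by case/control status (columns); the abstract does not spell out the table layout, and the function name `g_test` is hypothetical. Only the statistic and its degrees of freedom are computed here; a p-value would need a chi-squared distribution function, which the standard library lacks.

```python
from collections import Counter
from math import log


def g_test(genotypes, status):
    """Return (G, df) for joint genotype vs. status.

    genotypes: one tuple of genotype codes per individual (k SNPs each).
    status:    0/1 disease status per individual (assumed encoding).
    G = 2 * sum(O * ln(O / E)) over the non-empty cells of the table.
    """
    observed = Counter(zip(genotypes, status))
    row_tot = Counter(genotypes)
    col_tot = Counter(status)
    n = len(status)
    g = 0.0
    for (row, col), o in observed.items():
        expected = row_tot[row] * col_tot[col] / n
        g += o * log(o / expected)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return 2.0 * g, df


# Toy data: the joint genotype of three SNPs fully determines status.
genos = [(0, 0, 1)] * 4 + [(1, 2, 0)] * 4
stat, df = g_test(genos, [0] * 4 + [1] * 4)
print(stat, df)
```

Because the number of joint genotype rows grows as 3^k for k SNPs, the degrees of freedom depend on how many combinations actually occur in the sample, which is why `df` is derived from the observed rows.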
|
17
|
Liu J, Yu G, Jiang Y, Wang J. HiSeeker: Detecting High-Order SNP Interactions Based on Pairwise SNP Combinations. Genes (Basel) 2017; 8:genes8060153. [PMID: 28561745 PMCID: PMC5485517 DOI: 10.3390/genes8060153] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Key Words] [Received: 03/31/2017] [Revised: 05/06/2017] [Accepted: 05/25/2017] [Indexed: 01/27/2023]
Abstract
Detecting interactions among single nucleotide polymorphisms (SNPs) is one of the most popular approaches for explaining the missing heritability of common complex diseases in genome-wide association studies. Many methods have been proposed for SNP interaction detection, but most of them focus only on pairwise interactions and ignore high-order ones, which may also contribute to complex traits. Existing methods for high-order interaction detection can hardly handle genome-wide data and suffer from low detection power because of the exponential growth of the search space. In this paper, we propose a flexible two-stage approach (called HiSeeker) to detect high-order interactions. In the screening stage, HiSeeker employs the chi-squared test and a logistic regression model to efficiently obtain candidate pairwise combinations that have intermediate or significant associations with the phenotype. In the search stage, two different strategies (exhaustive search and ant colony optimization-based search) are used to detect high-order interactions from the candidate combinations. The experimental results on simulated datasets demonstrate that HiSeeker detects high-order interactions more efficiently and effectively than related representative algorithms. On two real case-control datasets, HiSeeker also detects several significant high-order interactions whose individual SNPs and pairwise combinations have no strong main effects or pairwise interaction effects; these high-order interactions can hardly be identified by related algorithms.
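The two-stage idea described here can be outlined in a few lines. This is a rough sketch, not HiSeeker itself: it uses only a chi-squared statistic for screening (the logistic regression step is omitted), and the way retained pairs are merged into three-SNP candidates is an assumption made for illustration. The names `chi2_stat`, `screen_pairs` and `candidate_triples` and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi2_stat(rows, status):
    """Pearson chi-squared statistic for a rows-by-status contingency table."""
    observed = Counter(zip(rows, status))
    row_tot, col_tot, n = Counter(rows), Counter(status), len(status)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def screen_pairs(snps, status, threshold):
    """Screening stage: keep SNP pairs whose joint genotype is associated
    with status. snps maps a SNP name to its list of genotype codes."""
    kept = []
    for a, b in combinations(sorted(snps), 2):
        joint = list(zip(snps[a], snps[b]))
        if chi2_stat(joint, status) >= threshold:
            kept.append((a, b))
    return kept


def candidate_triples(kept_pairs):
    """Search-stage input (assumed scheme): merge retained pairs that share
    one SNP into three-SNP candidates for exhaustive evaluation."""
    out = set()
    for p, q in combinations(kept_pairs, 2):
        merged = tuple(sorted(set(p) | set(q)))
        if len(merged) == 3:
            out.add(merged)
    return sorted(out)


# Toy data: status is the XOR of s1 and s2; s3 is unrelated noise.
snps = {
    "s1": [0, 0, 1, 1, 0, 0, 1, 1],
    "s2": [0, 1, 0, 1, 0, 1, 0, 1],
    "s3": [0, 0, 0, 0, 1, 1, 1, 1],
}
status = [0, 1, 1, 0, 0, 1, 1, 0]
print(screen_pairs(snps, status, threshold=3.84))
```

In this toy data neither s1 nor s2 shows a marginal effect, yet their pair passes the screen, which is the situation the screening stage is meant to capture.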
Affiliation(s)
- Jie Liu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Yuan Jiang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
|