1
Tuo S, Jiang J. A Novel Detection Method for High-Order SNP Epistatic Interactions Based on Explicit-Encoding-Based Multitasking Harmony Search. Interdiscip Sci 2024:10.1007/s12539-024-00621-2. [PMID: 38954231 DOI: 10.1007/s12539-024-00621-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 02/06/2024] [Accepted: 02/17/2024] [Indexed: 07/04/2024]
Abstract
To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) contributing to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), which exhibit small individual effects but potentially large joint effects. These interactions are difficult to detect due to the vast search space, encompassing billions of possible combinations, and the computational complexity of evaluating them. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) specifically designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm is employed, utilizing four lightweight evaluation functions, such as Bayesian network and entropy, to efficiently explore potential SNP combinations related to disease status. Second, a G-test statistical method is applied to filter out insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) as well as random forest (RF), are employed to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared to existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets encompassing age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results demonstrably indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in terms of both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git .
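The second stage described above filters candidate SNP combinations with a G-test. As a minimal sketch of that statistic (not the authors' implementation), the function below builds the genotype-combination by phenotype contingency table and computes G = 2 Σ O ln(O/E). The function name `g_test_statistic` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def g_test_statistic(genotypes, labels):
    """Return (G, degrees_of_freedom) for a genotype-combination x phenotype table.

    genotypes: one hashable genotype combination per sample, e.g. (0, 2, 1)
    labels:    one phenotype label per sample, e.g. 0 = control, 1 = case
    """
    n = len(labels)
    joint = Counter(zip(genotypes, labels))   # observed cell counts
    row = Counter(genotypes)                  # totals per genotype combination
    col = Counter(labels)                     # totals per phenotype class
    g = 0.0
    for (geno, lab), observed in joint.items():
        expected = row[geno] * col[lab] / n   # count expected under independence
        g += 2.0 * observed * math.log(observed / expected)
    dof = (len(row) - 1) * (len(col) - 1)
    return g, dof


# Toy data (hypothetical): two genotype combinations, 80 samples.
toy_genotypes = [(0, 0)] * 40 + [(1, 1)] * 40
toy_labels = [0] * 30 + [1] * 10 + [0] * 10 + [1] * 30
g_value, g_dof = g_test_statistic(toy_genotypes, toy_labels)
```

Under the null hypothesis G approximately follows a chi-square distribution with the returned degrees of freedom, so with one degree of freedom a value above 3.841 is significant at the 0.05 level before any multiple-testing correction.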
Affiliation(s)
- Shouheng Tuo
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, China.
- Jiewei Jiang
  - School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China.
2
Tang DY, Mao YJ, Zhao J, Yang J, Li SY, Ren FX, Zheng J. SEEI: spherical evolution with feedback mechanism for identifying epistatic interactions. BMC Genomics 2024; 25:462. [PMID: 38735952 DOI: 10.1186/s12864-024-10373-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 05/03/2024] [Indexed: 05/14/2024]
Abstract
BACKGROUND: Detecting epistatic interactions (EIs) involves exploring associations between single nucleotide polymorphisms (SNPs) and complex diseases, which is an important task in genome-wide association studies. The EI detection problem depends on epistasis models and the corresponding optimization methods. Although various models and methods have been proposed to detect EIs, identifying EIs efficiently and accurately is still a challenge.
RESULTS: Here, we propose a linear mixed statistical epistasis model (LMSE) and a spherical evolution approach with a feedback mechanism (named SEEI). The LMSE model extends existing single epistasis models such as LR-Score, K2-Score, mutual information, and the Gini index. SEEI includes an adaptive spherical search strategy and a population updating strategy, which ensure that the algorithm is not easily trapped in local optima. We analyzed performance on 8 random disease models, 12 disease models with marginal effects, 30 disease models without marginal effects, and 10 high-order disease models. The 60 simulated disease models and a real breast cancer dataset were used to evaluate eight algorithms (SEEI, EACO, EpiACO, FDHEIW, MP-HS-DHSI, NHSA-DHSC, SNPHarvester, CSE). Three evaluation criteria (pow1, pow2, pow3), a T-test, and a Friedman test were used to compare the performance of these algorithms. The results show that the SEEI algorithm (order 1, average rank = 13.125) outperformed the other algorithms in detecting EIs.
CONCLUSIONS: Here, we propose an LMSE model and an evolutionary computing method (SEEI) to solve the optimization problem of the LMSE model. The proposed method performed better than the other seven algorithms tested in its ability to identify EIs in genome-wide association datasets. We identified new SNP-SNP combinations in the real breast cancer dataset and verified the results. Our findings provide new insights for the diagnosis and treatment of breast cancer.
AVAILABILITY AND IMPLEMENTATION: https://github.com/scutdy/SSO/blob/master/SEEI.zip
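Of the single-criterion scores the abstract lists, the Gini index is the simplest to illustrate. The sketch below uses one common form, the weighted Gini impurity of the phenotype within each genotype combination; it is an assumption that this matches the component used inside LMSE, and the name `gini_score` is hypothetical.

```python
from collections import Counter


def gini_score(genotypes, labels):
    """Weighted Gini impurity of the phenotype across genotype combinations.

    0.0 means every genotype combination contains a single phenotype class;
    larger values mean the combination says less about the phenotype.
    """
    n = len(labels)
    by_genotype = {}
    for geno, lab in zip(genotypes, labels):
        by_genotype.setdefault(geno, Counter())[lab] += 1
    score = 0.0
    for counts in by_genotype.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        score += (size / n) * impurity   # weight by the share of samples
    return score


# Hypothetical toy data: the first labelling is perfectly separated by genotype.
separated = gini_score([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])
mixed = gini_score([(0,), (0,), (1,), (1,)], [0, 1, 0, 1])
```

Lower values indicate a stronger association, so a search procedure using this score would minimize it.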
Affiliation(s)
- De-Yu Tang
  - Department of Computer Science, School of Mathematics and Informatics, School of Software Engineering, South China Agricultural University, Guangzhou, 510642, PR China.
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
- Yi-Jun Mao
  - Department of Computer Science, School of Mathematics and Informatics, School of Software Engineering, South China Agricultural University, Guangzhou, 510642, PR China.
- Jie Zhao
  - School of Management, Guangdong University of Technology, Guangzhou, 510006, PR China.
- Jin Yang
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
- Shi-Yin Li
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
- Fu-Xiang Ren
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
- Junxi Zheng
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
3
Ren F, Li S, Wen Z, Liu Y, Tang D. The Spherical Evolutionary Multi-Objective (SEMO) Algorithm for Identifying Disease Multi-Locus SNP Interactions. Genes (Basel) 2023; 15:11. [PMID: 38275593 PMCID: PMC10815643 DOI: 10.3390/genes15010011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/21/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024]
Abstract
Single-nucleotide polymorphisms (SNPs), as disease-related biogenetic markers, are crucial in elucidating complex disease susceptibility and pathogenesis. Due to computational inefficiency, it is difficult to identify high-dimensional SNP interactions efficiently using combinatorial search methods, so the spherical evolutionary multi-objective (SEMO) algorithm for detecting multi-locus SNP interactions was proposed. The algorithm uses a spherical search factor and a feedback mechanism of excellent individual history memory to enhance the balance between search and acquisition. Moreover, a multi-objective fitness function based on the decomposition idea was used to evaluate the associations by combining two functions, K2-Score and LR-Score, as an objective function for the algorithm's evolutionary iterations. The performance evaluation of SEMO was compared with six state-of-the-art algorithms on a simulated dataset. The results showed that SEMO outperforms the comparative methods by detecting SNP interactions quickly and accurately with a shorter average run time. The SEMO algorithm was applied to the Wellcome Trust Case Control Consortium (WTCCC) breast cancer dataset and detected two- and three-point SNP interactions that were significantly associated with breast cancer, confirming the effectiveness of the algorithm. New combinations of SNPs associated with breast cancer were also identified, which will provide a new way to detect SNP interactions quickly and accurately.
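The K2-Score named above as one of the two objectives can be sketched as follows. This is the log form commonly used in the epistasis literature, derived from the Bayesian-network K2 metric; treating it as the exact form used by SEMO is an assumption, and `k2_score` is an illustrative name.

```python
import math
from collections import Counter


def k2_score(genotypes, labels, n_classes=2):
    """Negative log of the K2 metric for a genotype combination (lower is better).

    score = sum_i [ ln (r_i + J - 1)! - ln (J - 1)! - sum_j ln r_ij! ]
    where r_i counts samples with genotype combination i, r_ij counts those
    that also belong to phenotype class j, and J = n_classes.
    """
    totals = Counter(genotypes)
    joint = Counter(zip(genotypes, labels))
    score = 0.0
    for r_i in totals.values():
        # lgamma(x + 1) == ln(x!), so lgamma(r_i + J) == ln (r_i + J - 1)!
        score += math.lgamma(r_i + n_classes) - math.lgamma(n_classes)
    for r_ij in joint.values():
        score -= math.lgamma(r_ij + 1)
    return score


# Hypothetical toy data: the same genotypes with two different labellings.
k2_separated = k2_score([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])
k2_mixed = k2_score([(0,), (0,), (1,), (1,)], [0, 1, 0, 1])
```

Working in log space with `math.lgamma` avoids overflowing factorials when the sample counts reach the thousands typical of case-control data.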
Affiliation(s)
- Fuxiang Ren
  - College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China
- Shiyin Li
  - College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China
- Zihao Wen
  - College of Mathematics and Informatics, College of Software Engineering, South China Agricultural University, Guangzhou 510642, China
  - Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Yidi Liu
  - College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China
- Deyu Tang
  - College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China
  - College of Mathematics and Informatics, College of Software Engineering, South China Agricultural University, Guangzhou 510642, China
4
MDSN: A Module Detection Method for Identifying High-Order Epistatic Interactions. Genes (Basel) 2022; 13:genes13122403. [PMID: 36553670 PMCID: PMC9778340 DOI: 10.3390/genes13122403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
Epistatic interactions are interactions among SNPs (single nucleotide polymorphisms) that affect disease development and trait expression nonlinearly; identifying them therefore plays an important role in explaining the pathogenesis and genetic heterogeneity of complex diseases. Many methods have been proposed for epistasis detection; nevertheless, they mainly focus on low-order epistatic interactions (second- or third-order, for instance) and often ignore high-order interactions because of the computational burden. In this paper, a module detection method called MDSN is proposed for identifying high-order epistatic interactions. First, an SNP network is constructed using an interaction-complementary construction strategy; the network consists of low-order SNP interactions that can be obtained from fast computations. Then, a node evaluation measure that integrates multiple topological features is proposed to improve the node expansion algorithm, in which the importance of a node is evaluated comprehensively from the topological characteristics of its neighborhood. Finally, modules are detected in the constructed SNP network; these modules contain high-order epistatic interactions associated with the disease. MDSN was compared with four state-of-the-art methods on simulation datasets and a real age-related macular degeneration dataset. The results demonstrate that MDSN performs better at detecting high-order interactions.
5
Shang J, Cai X, Zhang T, Sun Y, Zhang Y, Liu J, Guan B. EpiReSIM: A Resampling Method of Epistatic Model without Marginal Effects Using Under-Determined System of Equations. Genes (Basel) 2022; 13:genes13122286. [PMID: 36553553 PMCID: PMC9777644 DOI: 10.3390/genes13122286] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/18/2022] [Revised: 11/30/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
Simulation experiments are essential to evaluate epistasis detection methods, which is the main way to prove their effectiveness and move toward practical applications. However, due to the lack of effective simulators, especially for simulating models without marginal effects (eNME models), epistasis detection methods can hardly verify their effectiveness through simulation experiments. In this study, we propose a resampling simulation method (EpiReSIM) for generating the eNME model. First, EpiReSIM provides two strategies for solving eNME models. One is to calculate eNME models using prevalence constraints, and another is by joint constraints of prevalence and heritability. We transform the computation of the model into the problem of solving the under-determined system of equations. Introducing the complete orthogonal decomposition method and Newton's method, EpiReSIM calculates the solution of the underdetermined system of equations to obtain the eNME model, especially the solution of the high-order model, which is the highlight of EpiReSIM. Second, based on the computed eNME model, EpiReSIM generates simulation data by a resampling method. Experimental results show that EpiReSIM has advantages in preserving the biological properties of minor allele frequencies and calculating high-order models, and it is a convenient and effective alternative method for current simulation software.
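The constraints that define a model without marginal effects can be made concrete with a small check. The sketch below does not solve the under-determined system as EpiReSIM does; it only verifies, for a given two-locus penetrance table, that every marginal penetrance equals the prevalence. Hardy-Weinberg genotype frequencies, the function names, and the example table are all assumptions made for illustration.

```python
def genotype_frequencies(maf):
    """Hardy-Weinberg frequencies for genotypes carrying 0, 1, 2 minor alleles."""
    q = maf
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def marginal_penetrances(table, maf_a, maf_b):
    """Return (row marginals, column marginals, prevalence) of a 3x3 table.

    table[i][j] is P(disease | SNP A has genotype i, SNP B has genotype j).
    """
    fa = genotype_frequencies(maf_a)
    fb = genotype_frequencies(maf_b)
    rows = [sum(fb[j] * table[i][j] for j in range(3)) for i in range(3)]
    cols = [sum(fa[i] * table[i][j] for i in range(3)) for j in range(3)]
    prevalence = sum(fa[i] * rows[i] for i in range(3))
    return rows, cols, prevalence


def has_no_marginal_effects(table, maf_a, maf_b, tol=1e-9):
    """True when each single-SNP marginal penetrance equals the prevalence."""
    rows, cols, prevalence = marginal_penetrances(table, maf_a, maf_b)
    return all(abs(m - prevalence) <= tol for m in rows + cols)


# Illustrative purely epistatic table (valid when both MAFs are 0.5).
pure_table = [[0.0, 0.0, 1.0],
              [0.0, 0.5, 0.0],
              [1.0, 0.0, 0.0]]
# Illustrative table where SNP A alone determines risk.
marginal_table = [[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]]
```

A two-locus table has nine unknown penetrances but only a handful of such equality constraints, which is why the system is under-determined and admits many valid models.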
Affiliation(s)
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Xinrui Cai
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Tongdui Zhang
  - Science and Technology Innovation Service Institution of Rizhao, Rizhao 276827, China
- Yan Sun
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
- Jinxing Liu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Boxin Guan
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
6
Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z, Liu H, Li A. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci 2022; 14:814-832. [PMID: 35788965 DOI: 10.1007/s12539-022-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION: Linear or nonlinear interactions of multiple single-nucleotide polymorphisms (SNPs) play an important role in understanding the genetic basis of complex human diseases. However, combinatorial analytics in high-dimensional space makes it extremely challenging to detect multiorder SNP interactions. Most classic approaches can only perform one task (detecting k-order SNP interactions) in each run. Since prior knowledge of a complex disease is usually not available, it is difficult to determine the value of k for detecting k-order SNP interactions.
METHODS: A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions. It is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel, and two subpopulations that separately adopt the Bayesian network-based K2-score and the Jensen-Shannon divergence (JS-score) as evaluation criteria are generated for each task to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G-test is adopted to further verify the authenticity of candidate solutions and reduce the error rate.
RESULTS: Three multiorder simulated disease models with different interaction effects and three real datasets for age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulated disease models than traditional single-task algorithms. The results on the real AMD, RA and T1D datasets indicate that MTACO-DMSI can detect multiorder SNP interactions at a genome-wide scale.
AVAILABILITY AND IMPLEMENTATION: https://github.com/shouhengtuo/MTACO-DMSI/
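The JS-score mentioned in the searching stage can be illustrated with the standard Jensen-Shannon divergence between the genotype-combination distribution in cases and the one in controls. This is a minimal sketch under that assumption; the exact normalization used by MTACO-DMSI may differ, and `js_score` is a hypothetical name.

```python
import math
from collections import Counter


def js_score(genotypes, labels, case=1, control=0):
    """Jensen-Shannon divergence (base 2) between case and control distributions.

    Returns a value in [0, 1]; 0 means identical genotype-combination
    distributions, 1 means cases and controls share no genotype combination.
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == case)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == control)
    n_case = sum(cases.values())
    n_control = sum(controls.values())
    if n_case == 0 or n_control == 0:
        raise ValueError("need at least one case and one control")
    divergence = 0.0
    for geno in set(cases) | set(controls):
        p = cases[geno] / n_case
        q = controls[geno] / n_control
        m = 0.5 * (p + q)                       # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Hypothetical toy data.
js_disjoint = js_score([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])
js_identical = js_score([(0,), (1,), (0,), (1,)], [0, 0, 1, 1])
```

Unlike the K2-score, this measure is bounded and symmetric in cases and controls, which is one plausible reason to pair the two criteria in separate subpopulations.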
Affiliation(s)
- Shouheng Tuo
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- Chao Li
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- Fan Liu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- YanLing Zhu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- TianRui Chen
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- ZengYu Feng
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- Haiyan Liu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
  - Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
  - Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
- Aimin Li
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China.
7
Multi-Objective Artificial Bee Colony Algorithm Based on Scale-Free Network for Epistasis Detection. Genes (Basel) 2022; 13:genes13050871. [PMID: 35627256 PMCID: PMC9140669 DOI: 10.3390/genes13050871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Revised: 04/30/2022] [Accepted: 05/10/2022] [Indexed: 12/04/2022]
Abstract
In genome-wide association studies, epistasis detection is of great significance for understanding the occurrence and diagnosis of complex human diseases, but it also faces challenges such as high dimensionality and small sample sizes. To cope with these challenges, several swarm intelligence methods have been introduced in recent years to identify epistasis. However, the existing methods still have limitations, such as high computational cost and premature convergence. In this study, we propose a multi-objective artificial bee colony (ABC) algorithm based on the scale-free network (SFMOABC). SFMOABC incorporates the scale-free network into the ABC algorithm to guide the update and selection of solutions. In addition, SFMOABC uses mutual information and the K2-Score of the Bayesian network as objective functions, and an opposition-based learning strategy is used to improve the search ability. Experiments were performed on both simulation datasets and a real dataset of age-related macular degeneration (AMD). The results of the simulation experiments showed that SFMOABC has better detection power and efficiency than seven other epistasis detection methods. In the real AMD data experiment, most of the single nucleotide polymorphism combinations detected by SFMOABC have been shown to be associated with AMD. Therefore, SFMOABC is a promising method for epistasis detection.
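Mutual information, the first of the two objective functions named above, can be sketched directly from empirical counts. The function below computes I(G; Y) in bits between a genotype combination G and the phenotype Y; the name `mutual_information` and the toy inputs are assumptions for illustration, not SFMOABC code.

```python
import math
from collections import Counter


def mutual_information(genotypes, labels):
    """Empirical mutual information I(G; Y) in bits.

    genotypes: one hashable genotype combination per sample
    labels:    one phenotype label per sample
    """
    n = len(labels)
    joint = Counter(zip(genotypes, labels))
    geno_counts = Counter(genotypes)
    label_counts = Counter(labels)
    mi = 0.0
    for (geno, lab), count in joint.items():
        # p(g, y) * log2( p(g, y) / (p(g) * p(y)) ), written with raw counts
        ratio = count * n / (geno_counts[geno] * label_counts[lab])
        mi += (count / n) * math.log2(ratio)
    return mi


# Hypothetical toy data.
mi_informative = mutual_information([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])
mi_independent = mutual_information([(0,), (0,), (1,), (1,)], [0, 1, 0, 1])
```

Higher values indicate stronger association, so in a multi-objective setting this score is maximized while a K2-style score is minimized.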
8
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
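Detection power in comparisons like this one is typically estimated as the fraction of simulated datasets in which a method reports the planted interaction. The sketch below assumes that definition; individual studies differ in details such as whether only the top-ranked combination counts, and `detection_power` is a hypothetical helper.

```python
def detection_power(reported, true_combination):
    """Fraction of datasets whose reported combinations include the planted one.

    reported:         one list per simulated dataset, each holding the SNP-index
                      combinations a method reported for that dataset
    true_combination: SNP indices of the planted interaction (order ignored)
    """
    target = frozenset(true_combination)
    hits = sum(
        1 for found in reported
        if target in {frozenset(combination) for combination in found}
    )
    return hits / len(reported)


# Hypothetical results for four simulated datasets with planted pair (1, 2).
toy_reported = [[(1, 2)], [(2, 1), (3, 4)], [(3, 4)], []]
toy_power = detection_power(toy_reported, (1, 2))
```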
9
Wang X, Zhang H, Wang J, Yu G, Cui L, Guo M. EpiHNet: Detecting epistasis by heterogeneous molecule network. Methods 2021; 198:65-75. [PMID: 34555529 DOI: 10.1016/j.ymeth.2021.09.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Revised: 08/16/2021] [Accepted: 09/16/2021] [Indexed: 12/22/2022]
Abstract
Epistasis between single nucleotide polymorphisms (SNPs) plays an important role in elucidating the missing heritability of complex diseases. Diverse approaches have been developed for detecting SNP interactions, but they typically neglect the important and useful connections between SNPs and other bio-molecules (e.g., miRNAs and lncRNAs). To comprehensively model these disease-related molecules, EpiHNet, a solution based on a heterogeneous bio-molecular network, is introduced for high-order SNP interaction detection. EpiHNet first uses case/control data to construct an SNP statistical network, and uses meta-path based similarity on the heterogeneous network composed of SNPs, genes, lncRNAs, miRNAs and diseases to define a second, SNP relational network. The SNP relational network can explore and exploit different associations between molecules and diseases to complement the SNP statistical network and search for significantly associated SNPs. Next, EpiHNet integrates these two networks into a composite network and applies modularity-based clustering with a fast search strategy to divide SNP nodes into different clusters. After that, it detects SNP interactions based on SNP combinations derived from each cluster. Synthetic experiments on diverse two-locus and three-locus disease models show that EpiHNet outperforms competitive baselines, even without the heterogeneous network. On real WTCCC breast cancer data, EpiHNet also shows promising results in detecting high-order SNP interactions.
Affiliation(s)
- Xin Wang
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
- Huiling Zhang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China.
- Jun Wang
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
- Guoxian Yu
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
- Lizhen Cui
  - School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China.
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
10
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes. Biology (Basel) 2021; 10:biology10090921. [PMID: 34571798 PMCID: PMC8469369 DOI: 10.3390/biology10090921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary: The interactions between SNPs, which are known as epistasis, can strongly influence the phenotype. Their detection is still a challenge, which is made even more difficult through the existence of background associations that can hide correct epistatic interactions. To address the limitations of existing methods, we present in this study our novel method MIDESP for the detection of epistatic SNP pairs. It is the first mutual information-based method that can be applied to both qualitative and quantitative phenotypes and which explicitly accounts for background associations in the dataset.
Abstract: The interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. The knowledge of epistasis is a crucial part of understanding genetic causes of complex traits. However, due to the enormous number of SNP pairs and their complex relationship to the phenotype, identification still remains a challenging problem. Many approaches for the detection of epistasis have been developed using mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability for quantitative traits. To overcome this limitation of MI-based methods, here, we present an MI-based novel algorithm, MIDESP, to detect epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we deal with the effect of background associations in a genotypic dataset to separate correct epistatic interaction signals from those of false positive interactions resulting from the effect of single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it on two real datasets with qualitative and quantitative phenotypes, respectively. Our results suggest that by eliminating the background associations, MIDESP can identify important genes, which play essential roles for bovine tuberculosis or the egg weight of chickens.
11
Yuan X, Li J, Bai J, Xi J. A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1811-1820. [PMID: 31880558 DOI: 10.1109/tcbb.2019.2961886] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
Copy number variation (CNV) is a major type of genomic structural variation that plays an important role in human disorders. Next generation sequencing (NGS) has fueled the advancement in algorithm design to detect CNVs at base-pair resolution. However, accurate detection of CNVs of low amplitudes remains a challenging task. This paper proposes a new computational method, CNV-LOF, to identify CNVs of full-range amplitudes from NGS data. CNV-LOF is distinctly different from traditional methods, which mainly consider aberrations from a global perspective and rely on some assumed distribution of NGS read depths. In contrast, CNV-LOF takes a local view of the read depths and assigns an outlier factor to each genome segment. With the outlier factor profile, CNV-LOF uses a boxplot procedure to declare CNVs without relying on any distribution assumptions. Simulation experiments indicate that CNV-LOF outperforms five existing methods with respect to F1-measure, sensitivity, and precision. CNV-LOF is further validated on real sequencing samples, yielding results highly consistent with peer methods. CNV-LOF is able to detect CNVs of low and moderate amplitudes where the other existing methods fail, and it is expected to become a routine approach for the discovery of novel CNVs in whole-genome sequencing data.
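The boxplot step that turns an outlier-factor profile into calls can be sketched with the usual Tukey fence. The code below assumes the outlier factors have already been computed for each segment and flags those above Q3 + 1.5 × IQR; the whisker length, the quartile method, and the name `boxplot_outliers` are assumptions, since the abstract does not give CNV-LOF's exact settings.

```python
import statistics


def boxplot_outliers(scores, whisker=1.5):
    """Return indices of scores above the upper boxplot fence Q3 + whisker * IQR.

    Only the upper fence is used because a local outlier factor close to 1
    marks an ordinary segment and larger values mark increasingly unusual ones.
    """
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    upper_fence = q3 + whisker * (q3 - q1)
    return [index for index, score in enumerate(scores) if score > upper_fence]


# Hypothetical outlier-factor profile for eight genome segments.
toy_factors = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 3.0, 1.02]
flagged = boxplot_outliers(toy_factors)
```

Because the fence is derived from the sample quartiles, no parametric assumption about the read-depth distribution is needed, which matches the motivation given in the abstract.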
12
Lyu R, Sun J, Xu D, Jiang Q, Wei C, Zhang Y. GESLM algorithm for detecting causal SNPs in GWAS with multiple phenotypes. Brief Bioinform 2021; 22:6329404. [PMID: 34323927 DOI: 10.1093/bib/bbab276] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2021] [Revised: 06/05/2021] [Accepted: 06/29/2021] [Indexed: 12/13/2022]
Abstract
With the development of genome-wide association studies, how to gain information from a large scale of data has become an issue of common concern, since traditional methods are not fully developed to solve problems such as identifying loci-to-loci interactions (also known as epistasis). Previous epistatic studies mainly focused on local information with a single outcome (phenotype), while in this paper, we developed a two-stage global search algorithm, Greedy Equivalence Search with Local Modification (GESLM), to implement a global search of directed acyclic graph in order to identify genome-wide epistatic interactions with multiple outcome variables (phenotypes) in a case-control design. GESLM integrates the advantages of score-based methods and constraint-based methods to learn the phenotype-related Bayesian network and is powerful and robust to find the interaction structures that display both genetic associations with phenotypes and gene interactions. We compared GESLM with some common phenotype-related loci detecting methods in simulation studies. The results showed that our method improved the accuracy and efficiency compared with others, especially in an unbalanced case-control study. Besides, its application on the UK Biobank dataset suggested that our algorithm has great performance when handling genome-wide association data with more than one phenotype.
Affiliation(s)
- Ruiqi Lyu
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
- Jianle Sun
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
- Dong Xu
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
- Qianxue Jiang
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
- Chaochun Wei
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
- Yue Zhang
  - Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China
|
13
|
Sakai T, Abe A, Shimizu M, Terauchi R. RIL-StEp: epistasis analysis of rice recombinant inbred lines reveals candidate interacting genes that control seed hull color and leaf chlorophyll content. G3 (BETHESDA, MD.) 2021; 11:jkab130. [PMID: 33871605 PMCID: PMC8496299 DOI: 10.1093/g3journal/jkab130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Accepted: 04/10/2021] [Indexed: 11/19/2022]
Abstract
Characterizing epistatic gene interactions is fundamental for understanding the genetic architecture of complex traits. However, due to the large number of potential gene combinations, detecting epistatic gene interactions is computationally demanding. A simple, easy-to-perform method for sensitive detection of epistasis is required. Due to their homozygous nature, use of recombinant inbred lines excludes the dominance effect of alleles and interactions involving heterozygous genotypes, thereby allowing detection of epistasis in a simple and interpretable model. Here, we present an approach called RIL-StEp (recombinant inbred lines stepwise epistasis detection) to detect epistasis using single-nucleotide polymorphisms in the genome. We applied the method to reveal epistasis affecting rice (Oryza sativa) seed hull color and leaf chlorophyll content and successfully identified pairs of genomic regions that presumably control these phenotypes. This method has the potential to improve our understanding of the genetic architecture of various traits of crops and other organisms.
Affiliation(s)
- Toshiyuki Sakai
  - Laboratory of Crop Evolution, Graduate School of Agriculture, Kyoto University, Mozume, Muko, Kyoto 617-0001, Japan
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Akira Abe
  - Iwate Biotechnology Research Center, Kitakami, Iwate 024-0003, Japan
- Motoki Shimizu
  - Iwate Biotechnology Research Center, Kitakami, Iwate 024-0003, Japan
- Ryohei Terauchi
  - Laboratory of Crop Evolution, Graduate School of Agriculture, Kyoto University, Mozume, Muko, Kyoto 617-0001, Japan
  - Iwate Biotechnology Research Center, Kitakami, Iwate 024-0003, Japan
|
14
|
Tuo S, Liu H, Chen H. Multipopulation harmony search algorithm for the detection of high-order SNP interactions. Bioinformatics 2021; 36:4389-4398. [PMID: 32227192 DOI: 10.1093/bioinformatics/btaa215] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/05/2019] [Revised: 01/01/2020] [Accepted: 03/24/2020] [Indexed: 01/23/2023] Open
Abstract
MOTIVATION Recently, multiobjective swarm intelligence optimization (SIO) algorithms have attracted considerable attention as disease model-free methods for detecting high-order single nucleotide polymorphism (SNP) interactions. However, a strict Pareto optimal set may filter out some of the SNP combinations associated with disease status. Furthermore, the lack of heuristic factors for finding SNP interactions and the preference for discrimination approaches to disease models are considerable challenges for SIO. In this study, we propose a multipopulation harmony search (HS) algorithm dedicated to the detection of high-order SNP interactions (MP-HS-DHSI). This method consists of three stages. In the first stage, HS with multipopulation (multiharmony memories) is used to discover a set of candidate high-order SNP combinations having an association with disease status. In HS, multiple criteria [Bayesian network-based K2-score, Jensen-Shannon divergence, likelihood ratio and normalized distance with joint entropy (ND-JE)] are adopted by four harmony memories to improve the ability to discriminate diverse disease models. A novel evaluation criterion named ND-JE is proposed to guide HS to explore clues for high-order SNP interactions. In the second and third stages, the G-test statistical method and multifactor dimensionality reduction are employed to verify the authenticity of the candidate solutions, respectively. RESULTS We compared MP-HS-DHSI with four state-of-the-art SIO algorithms for detecting high-order SNP interactions for 20 simulation disease models and a real dataset of age-related macular degeneration. The experimental results revealed that our proposed method can accelerate the search speed efficiently and enhance the discrimination ability of diverse epistasis models. AVAILABILITY AND IMPLEMENTATION https://github.com/shouhengtuo/MP-HS-DHSI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
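The second-stage G-test mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard G statistic, G = 2 Σ O ln(O/E), on a genotype-combination by case/control table; it is not taken from the MP-HS-DHSI code, and the function name, the 0/1/2 genotype coding, and the toy data are assumptions.

```python
from collections import Counter
from math import log


def g_statistic(genotypes, status):
    """G = 2 * sum(O * ln(O / E)) over the genotype-combination x status table.

    genotypes: one tuple of SNP genotypes (0/1/2) per sample.
    status:    one label per sample (1 = case, 0 = control).
    Returns (G, degrees_of_freedom); names are illustrative, not from the paper.
    """
    n = len(status)
    observed = Counter(zip(genotypes, status))  # joint cell counts
    row = Counter(genotypes)                    # genotype-combination totals
    col = Counter(status)                       # case / control totals
    g = 0.0
    for (combo, label), obs in observed.items():
        expected = row[combo] * col[label] / n
        g += obs * log(obs / expected)          # empty cells add nothing and are absent
    dof = (len(row) - 1) * (len(col) - 1)
    return 2.0 * g, dof


# Toy 2-SNP example: genotype (0, 0) occurs only in controls, (1, 1) only in cases.
combos = [(0, 0), (0, 0), (1, 1), (1, 1)]
labels = [0, 0, 1, 1]
g_value, dof = g_statistic(combos, labels)

# Same genotypes, but status independent of genotype: G should vanish.
g_null, _ = g_statistic(combos, [0, 1, 0, 1])
```

A p-value would follow from the chi-square survival function with `dof` degrees of freedom (for example `scipy.stats.chi2.sf`, if SciPy is available); the significance threshold used by the authors is not stated in the abstract.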
Affiliation(s)
- Shouheng Tuo
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
- Haiyan Liu
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
- Hao Chen
  - School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
|
15
|
A differential evolution based feature combination selection algorithm for high-dimensional data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.081] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022]
|
16
|
EpiMOGA: An Epistasis Detection Method Based on a Multi-Objective Genetic Algorithm. Genes (Basel) 2021; 12:genes12020191. [PMID: 33525573 PMCID: PMC7911965 DOI: 10.3390/genes12020191] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/26/2020] [Revised: 01/12/2021] [Accepted: 01/25/2021] [Indexed: 12/28/2022] Open
Abstract
In genome-wide association studies, detecting high-order epistasis is important for analyzing the occurrence of complex human diseases and explaining missing heritability. However, there are various challenges in the actual high-order epistasis detection process due to the large amount of data, “small sample size problem”, diversity of disease models, etc. This paper proposes a multi-objective genetic algorithm (EpiMOGA) for single nucleotide polymorphism (SNP) epistasis detection. The K2 score based on the Bayesian network criterion and the Gini index of the diversity of the binary classification problem were used to guide the search process of the genetic algorithm. Experiments were performed on 26 simulated datasets of different models and a real Alzheimer’s disease dataset. The results indicated that EpiMOGA was obviously superior to other related and competitive methods in both detection efficiency and accuracy, especially for small-sample-size datasets, and the performance of EpiMOGA remained stable across datasets of different disease models. At the same time, a number of SNP loci and 2-order epistasis associated with Alzheimer’s disease were identified by the EpiMOGA method, indicating that this method is capable of identifying high-order epistasis from genome-wide data and can be applied in the study of complex diseases.
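The two objectives named in this abstract can be sketched under stated assumptions. The code below uses the commonly cited log form of the K2 score for a binary disease status (lower values indicate stronger association) and the standard weighted Gini impurity; the exact definitions used in EpiMOGA are not given in the abstract, so both functions and their names are illustrative rather than the authors' implementation.

```python
from collections import Counter
from math import lgamma, log


def _tables(genotypes, status):
    """Counts per genotype combination and per (combination, status) cell."""
    return Counter(genotypes), Counter(zip(genotypes, status))


def k2_score(genotypes, status):
    """Log-form K2 for binary status: sum_i [ln((r_i + 1)!) - sum_j ln(r_ij!)]."""
    totals, cells = _tables(genotypes, status)
    score = sum(lgamma(r + 2) for r in totals.values())  # ln((r_i + 1)!)
    score -= sum(lgamma(c + 1) for c in cells.values())  # ln(r_ij!)
    return score


def gini_index(genotypes, status):
    """Weighted Gini impurity of the status within each genotype combination."""
    totals, cells = _tables(genotypes, status)
    n = len(status)
    impurity = 0.0
    for combo, r in totals.items():
        purity = sum((cells[(combo, label)] / r) ** 2 for label in set(status))
        impurity += (r / n) * (1.0 - purity)
    return impurity


# Toy single-SNP data: genotype 0 only in controls, genotype 1 only in cases.
separated = [(0,), (0,), (1,), (1,)]
labels = [0, 0, 1, 1]
k2_sep = k2_score(separated, labels)
gini_sep = gini_index(separated, labels)

# One genotype shared equally by cases and controls: no association.
gini_mixed = gini_index([(0,)] * 4, labels)
```

In a multi-objective search, both values would be minimized together; how EpiMOGA combines or ranks them is not described in the abstract.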
|
17
|
Zhou X, Chan KCC, Huang Z, Wang J. Determining dependency and redundancy for identifying gene-gene interaction associated with complex disease. J Bioinform Comput Biol 2020; 18:2050035. [PMID: 33064052 DOI: 10.1142/s0219720020500353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect if a particular set of genes has interaction with a particular complex disease. However, even though many such methods have been shown to be useful, they can be made more effective if the properties of gene-gene interactions can be better understood. Towards this goal, we have attempted to uncover patterns in gene-gene interactions and the patterns reveal an interesting property that can be reflected in an inequality that describes the relationship between two genotype variables and a disease-status variable. We show, in this paper, that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method can be very promising.
Affiliation(s)
- Xiangdong Zhou
  - College of Mathematics and Computer Science, Fuzhou University Fuzhou, Fujian 350108, P. R. China
- Keith C C Chan
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, P. R. China
- Zhihua Huang
  - College of Mathematics and Computer Science, Fuzhou University Fuzhou, Fujian 350108, P. R. China
- Jingbin Wang
  - College of Mathematics and Computer Science, Fuzhou University Fuzhou, Fujian 350108, P. R. China
|
18
|
Yin Y, Guan B, Zhao Y, Li Y. SAMA: A Fast Self-Adaptive Memetic Algorithm for Detecting SNP-SNP Interactions Associated with Disease. BIOMED RESEARCH INTERNATIONAL 2020; 2020:5610658. [PMID: 32908899 PMCID: PMC7468611 DOI: 10.1155/2020/5610658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2020] [Accepted: 07/13/2020] [Indexed: 11/29/2022]
Abstract
Detecting SNP-SNP interactions associated with disease is important in genome-wide association studies (GWAS). Owing to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power and long running time. To tackle these drawbacks, a fast self-adaptive memetic algorithm (SAMA) is proposed in this paper. In this method, the crossover, mutation, and selection operators of the standard memetic algorithm are improved to adapt SAMA to the detection of SNP-SNP interactions associated with disease. Furthermore, a self-adaptive local search algorithm is introduced to enhance the detection power of the proposed method. SAMA is evaluated on a variety of simulated datasets and a real-world biological dataset, and it is compared with four recently developed methods based on evolutionary algorithms (FHSA-SED, AntEpiSeeker, IEACO, and DESeeker). The results of extensive experiments show that SAMA outperforms the four compared methods in terms of detection power and running time.
Affiliation(s)
- Ying Yin
  - Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Boxin Guan
  - Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Yuhai Zhao
  - Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Yuan Li
  - School of Information Science and Technology, North China University of Technology, Beijing 100144, China
|
19
|
Esteban L, Lonishin LR, Bobrovskiy DM, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN. HypercubeME: two hundred million combinatorially complete datasets from a single experiment. Bioinformatics 2019; 36:btz841. [PMID: 31742320 PMCID: PMC7703787 DOI: 10.1093/bioinformatics/btz841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
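The definition of a combinatorially complete dataset given in this abstract can be made concrete with a brute-force check. This sketch is not the recursive HypercubeME algorithm; it only tests whether every vertex of one n-dimensional hypercube has been measured. Representing a genotype as a `frozenset` of mutation labels, and the labels themselves, are assumptions made for illustration.

```python
from itertools import combinations


def hypercube_vertices(background, mutations):
    """Yield every genotype obtained by adding a subset of `mutations` to `background`."""
    muts = list(mutations)
    for size in range(len(muts) + 1):
        for subset in combinations(muts, size):
            yield frozenset(background) | frozenset(subset)


def is_combinatorially_complete(measured, background, mutations):
    """True if all 2^n vertices of the hypercube are present in `measured`."""
    return all(v in measured for v in hypercube_vertices(background, mutations))


# Toy data: the reference genotype, two single mutants and the double mutant.
measured = {
    frozenset(),
    frozenset({"A10G"}),
    frozenset({"L25F"}),
    frozenset({"A10G", "L25F"}),
}
square_ok = is_combinatorially_complete(measured, frozenset(), ["A10G", "L25F"])
cube_ok = is_combinatorially_complete(measured, frozenset(), ["A10G", "L25F", "T40S"])
n_vertices = sum(1 for _ in hypercube_vertices(frozenset(), ["A10G", "L25F", "T40S"]))
```

Checking one candidate hypercube costs 2^n lookups, so enumerating all candidates this way would not scale to the dataset sizes reported in the abstract; that is the gap the authors' recursive algorithm addresses.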
Affiliation(s)
- Lyubov R Lonishin
  - Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg 195251, Russia
- Daniil M Bobrovskiy
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow 119234, Russia
- Gregory Leleytner
  - Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Moscow 141701, Russia
- Natalya S Bogatyreva
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Bioinformatics and Genomics Programme, Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Moscow 142290, Russia
- Dmitry N Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
|
20
|
Guan B, Zhao Y. Self-Adjusting Ant Colony Optimization Based on Information Entropy for Detecting Epistatic Interactions. Genes (Basel) 2019; 10:genes10020114. [PMID: 30717303 PMCID: PMC6409693 DOI: 10.3390/genes10020114] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/14/2018] [Revised: 01/21/2019] [Accepted: 01/28/2019] [Indexed: 12/15/2022] Open
Abstract
The epistatic interactions of single nucleotide polymorphisms (SNPs) are considered to be an important factor in determining the susceptibility of individuals to complex diseases. Although many methods have been proposed to detect such interactions, the development of detection algorithms is still ongoing because of the computational burden in large-scale association studies. In this paper, to deal with the computationally intensive problem of detecting epistatic interactions in large-scale datasets, a self-adjusting ant colony optimization based on information entropy (IEACO) is proposed. The algorithm automatically adjusts its path selection strategy according to the real-time information entropy. The performance of IEACO is compared with that of ant colony optimization (ACO), AntEpiSeeker, AntMiner, and epiACO on a set of simulated datasets and a real genome-wide dataset. The results of extensive experiments show that the proposed method is superior to the other methods.
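The abstract does not state how entropy drives the path selection strategy, so the sketch below only computes the quantity such a rule would depend on: the Shannon entropy of the SNP selection probabilities implied by a pheromone vector, normalized to [0, 1]. The function name and the proportional-selection assumption are illustrative and are not taken from IEACO.

```python
from math import log


def normalized_entropy(pheromone):
    """Shannon entropy of pheromone-proportional selection probabilities, scaled to [0, 1].

    1.0 means every SNP is equally likely to be chosen (pure exploration);
    0.0 means all pheromone sits on a single SNP (pure exploitation).
    """
    if len(pheromone) < 2:
        return 0.0
    total = sum(pheromone)
    probs = [p / total for p in pheromone if p > 0]  # zero entries contribute nothing
    entropy = -sum(p * log(p) for p in probs)
    return entropy / log(len(pheromone))


uniform = normalized_entropy([1.0, 1.0, 1.0, 1.0])       # no SNP preferred yet
concentrated = normalized_entropy([5.0, 0.0, 0.0, 0.0])  # search has converged on one SNP
skewed = normalized_entropy([4.0, 1.0, 1.0, 1.0])        # somewhere in between
```

A self-adjusting scheme could, for example, favour exploitation while this value is high and reintroduce exploration as it falls, but any such threshold or schedule would be an assumption beyond what the abstract reports.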
Affiliation(s)
- Boxin Guan
  - Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
- Yuhai Zhao
  - Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
|