1
Tuo S, Jiang J. A Novel Detection Method for High-Order SNP Epistatic Interactions Based on Explicit-Encoding-Based Multitasking Harmony Search. Interdiscip Sci 2024; 16:688-711. [PMID: 38954231 DOI: 10.1007/s12539-024-00621-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 02/06/2024] [Accepted: 02/17/2024] [Indexed: 07/04/2024]
Abstract
To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) contributing to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), which exhibit small individual effects but potentially large joint effects. These interactions are difficult to detect because the search space is vast, encompassing billions of possible combinations, and evaluating them is computationally expensive. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) specifically designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm is employed, utilizing four lightweight evaluation functions, such as Bayesian network and entropy, to efficiently explore potential SNP combinations related to disease status. Second, a G-test statistical method is applied to filter out insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) and random forest (RF), are employed to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared to existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets encompassing age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in terms of both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git.
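The second stage described in this abstract filters candidate SNP combinations with a G-test. The following is a minimal sketch of the G statistic for a genotype-combination by phenotype contingency table. It is not taken from the MTHS-EE-DHEI repository; the function name `g_statistic` and the input layout (one genotype tuple and one 0/1 label per sample) are assumptions made for illustration.

```python
import math
from collections import Counter

def g_statistic(genotypes, labels):
    """Return (G, degrees_of_freedom) for a genotype-combination x phenotype table.

    genotypes: list of tuples, one per sample, e.g. (0, 2, 1) for a 3-SNP combination
    labels:    list of phenotype labels (0 = control, 1 = case), same length
    Both names and the data layout are illustrative assumptions.
    """
    n = len(labels)
    observed = Counter(zip(genotypes, labels))  # observed cell counts
    row = Counter(genotypes)                    # samples per genotype combination
    col = Counter(labels)                       # samples per phenotype class
    total = 0.0
    for (geno, lab), count in observed.items():
        expected = row[geno] * col[lab] / n     # count expected under independence
        total += count * math.log(count / expected)
    dof = (len(row) - 1) * (len(col) - 1)
    return 2.0 * total, dof

# Genotype fully determines phenotype in this toy sample.
g, dof = g_statistic([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch free of third-party dependencies.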
Affiliation(s)
- Shouheng Tuo
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China.
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, China.
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, China.
| | - Jiewei Jiang
- School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China
2
Lin HY, Mazumder H, Sarkar I, Huang PY, Eeles RA, Kote-Jarai Z, Muir KR, Schleutker J, Pashayan N, Batra J, Neal DE, Nielsen SF, Nordestgaard BG, Grönberg H, Wiklund F, MacInnis RJ, Haiman CA, Travis RC, Stanford JL, Kibel AS, Cybulski C, Khaw KT, Maier C, Thibodeau SN, Teixeira MR, Cannon-Albright L, Brenner H, Kaneva R, Pandha H, Park JY. Cluster effect for SNP-SNP interaction pairs for predicting complex traits. Sci Rep 2024; 14:18677. [PMID: 39134575 PMCID: PMC11319716 DOI: 10.1038/s41598-024-66311-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 07/01/2024] [Indexed: 08/15/2024] Open
Abstract
Single nucleotide polymorphism (SNP) interactions are the key to improving polygenic risk scores. Previous studies reported several significant SNP-SNP interaction pairs that shared a common SNP to form a cluster, but some identified pairs might be false positives. This study aims to identify factors associated with the cluster effect of false positivity and develop strategies to enhance the accuracy of SNP-SNP interactions. The results showed the cluster effect is a major cause of false-positive findings of SNP-SNP interactions. This cluster effect is due to high correlations between a causal pair and null pairs in a cluster. The clusters with a hub SNP with a significant main effect and a large minor allele frequency (MAF) tended to have a higher false-positive rate. In addition, peripheral null SNPs in a cluster with a small MAF tended to enhance false positivity. We also demonstrated that using the modified significance criterion based on the 3 p-value rules and the bootstrap approach (3pRule + bootstrap) can reduce false positivity and maintain high true positivity. In addition, our results also showed that a pair without a significant main effect tends to have weak or no interaction. This study identified the cluster effect and suggested using the 3pRule + bootstrap approach to enhance SNP-SNP interaction detection accuracy.
Affiliation(s)
- Hui-Yi Lin
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA.
| | - Harun Mazumder
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
| | - Indrani Sarkar
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
| | - Po-Yu Huang
- Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
| | - Rosalind A Eeles
- The Institute of Cancer Research, London, SM2 5NG, UK
- Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK
| | | | - Kenneth R Muir
- Division of Population Health, Health Services Research and Primary Care, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
| | - Johanna Schleutker
- Institute of Biomedicine, University of Turku, Turku, Finland
- Department of Medical Genetics, Genomics, Laboratory Division, Turku University Hospital, PO Box 52, 20521, Turku, Finland
| | - Nora Pashayan
- Department of Applied Health Research, University College London, London, WC1E 7HB, UK
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, CB1 8RN, UK
| | - Jyotsna Batra
- Australian Prostate Cancer Research Centre-Qld, Institute of Health and Biomedical Innovation and School of Biomedical Science, Queensland University of Technology, Brisbane, QLD, 4059, Australia
- Translational Research Institute, Brisbane, QLD, 4102, Australia
| | - David E Neal
- Nuffield Department of Surgical Sciences, University of Oxford, John Radcliffe Hospital, Room 6603, Level 6, Headley Way, Headington, Oxford, OX3 9DU, UK
- Department of Oncology, University of Cambridge, Addenbrooke's Hospital, Hills Road, Box 279, Cambridge, CB2 0QQ, UK
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
| | - Sune F Nielsen
- Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
| | - Børge G Nordestgaard
- Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
| | - Henrik Grönberg
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
| | - Fredrik Wiklund
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
| | - Robert J MacInnis
- Cancer Epidemiology Division, Cancer Council Victoria, 200 Victoria Parade, East Melbourne, 3002, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
| | - Christopher A Haiman
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90015, USA
| | - Ruth C Travis
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
| | - Janet L Stanford
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109-1024, USA
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, 98195, USA
| | - Adam S Kibel
- Division of Urologic Surgery, Brigham and Womens Hospital, 75 Francis Street, Boston, MA, 02115, USA
| | - Cezary Cybulski
- International Hereditary Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, 70-115, Szczecin, Poland
| | - Kay-Tee Khaw
- Clinical Gerontology Unit, University of Cambridge, Cambridge, CB2 2QQ, UK
| | - Christiane Maier
- Humangenetik Tuebingen, Paul-Ehrlich-Str 23, 72076, Tuebingen, Germany
| | - Stephen N Thibodeau
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55905, USA
| | - Manuel R Teixeira
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP)/RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- School of Medicine and Biomedical Sciences (ICBAS), University of Porto, Porto, Portugal
| | - Lisa Cannon-Albright
- Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, 84132, USA
- George E. Wahlen Department of Veterans Affairs Medical Center, Salt Lake City, UT, 84148, USA
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
| | - Radka Kaneva
- Molecular Medicine Center, Department of Medical Chemistry and Biochemistry, Medical University of Sofia, Sofia, 2 Zdrave Str., 1431, Sofia, Bulgaria
| | - Hardev Pandha
- The University of Surrey, Guildford, Surrey, GU2 7XH, UK
| | - Jong Y Park
- Department of Cancer Epidemiology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL, 33612, USA
3
Tang DY, Mao YJ, Zhao J, Yang J, Li SY, Ren FX, Zheng J. SEEI: spherical evolution with feedback mechanism for identifying epistatic interactions. BMC Genomics 2024; 25:462. [PMID: 38735952 DOI: 10.1186/s12864-024-10373-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 05/03/2024] [Indexed: 05/14/2024] Open
Abstract
BACKGROUND Detecting epistatic interactions (EIs) involves the exploration of associations among single nucleotide polymorphisms (SNPs) and complex diseases, which is an important task in genome-wide association studies. The EI detection problem is dependent on epistasis models and corresponding optimization methods. Although various models and methods have been proposed to detect EIs, identifying EIs efficiently and accurately is still a challenge.
RESULTS Here, we propose a linear mixed statistical epistasis model (LMSE) and a spherical evolution approach with a feedback mechanism (named SEEI). The LMSE model expands the existing single epistasis models such as LR-Score, K2-Score, Mutual information, and Gini index. The SEEI includes an adaptive spherical search strategy and population updating strategy, which ensures that the algorithm is not easily trapped in local optima. We analyzed the performances of 8 random disease models, 12 disease models with marginal effects, 30 disease models without marginal effects, and 10 high-order disease models. The 60 simulated disease models and a real breast cancer dataset were used to evaluate eight algorithms (SEEI, EACO, EpiACO, FDHEIW, MP-HS-DHSI, NHSA-DHSC, SNPHarvester, CSE). Three evaluation criteria (pow1, pow2, pow3), a T-test, and a Friedman test were used to compare the performances of these algorithms. The results show that the SEEI algorithm (order 1, average rank = 13.125) outperformed the other algorithms in detecting EIs.
CONCLUSIONS Here, we propose an LMSE model and an evolutionary computing method (SEEI) to solve the optimization problem of the LMSE model. The proposed method performed better than the other seven algorithms tested in its ability to identify EIs in genome-wide association datasets. We identified new SNP-SNP combinations in the real breast cancer dataset and verified the results. Our findings provide new insights for the diagnosis and treatment of breast cancer.
AVAILABILITY AND IMPLEMENTATION https://github.com/scutdy/SSO/blob/master/SEEI.zip .
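Mutual information is one of the single epistasis measures that the abstract says the LMSE model builds on. As a hedged illustration only (this is not the SEEI code, and the function name `mutual_information` and the data layout are assumptions), the sketch below computes the mutual information between a genotype combination and a binary phenotype. The toy data show a pure interaction: each SNP alone carries no information, while the pair carries one full bit.

```python
import math
from collections import Counter

def mutual_information(genotypes, labels):
    """Mutual information I(G; Y) in bits between genotype combinations and labels.

    genotypes: list of tuples (one genotype combination per sample)
    labels:    list of phenotype labels, same length
    """
    n = len(labels)
    joint = Counter(zip(genotypes, labels))
    p_geno = Counter(genotypes)
    p_label = Counter(labels)
    mi = 0.0
    for (geno, lab), count in joint.items():
        # p(g, y) * log2( p(g, y) / (p(g) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (p_geno[geno] * p_label[lab]))
    return mi

# XOR-like toy model: phenotype depends on the pair, not on either SNP alone.
pair = [(0, 0), (0, 1), (1, 0), (1, 1)]
phenotype = [0, 1, 1, 0]
mi_pair = mutual_information(pair, phenotype)
mi_first = mutual_information([(g[0],) for g in pair], phenotype)
```

Here `mi_pair` is 1 bit and `mi_first` is 0, which is the kind of model without marginal effects that the simulated benchmarks in the abstract refer to.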
Affiliation(s)
- De-Yu Tang
- Department of Computer Science, School of Mathematics and Informatics, School of Software Engineering, South China Agricultural University, Guangzhou, 510642, PR China.
- School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
| | - Yi-Jun Mao
- Department of Computer Science, School of Mathematics and Informatics, School of Software Engineering, South China Agricultural University, Guangzhou, 510642, PR China.
| | - Jie Zhao
- School of Management, Guangdong University of Technology, Guangzhou, 510006, PR China
| | - Jin Yang
- School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
| | - Shi-Yin Li
- School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China
| | - Fu-Xiang Ren
- School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China
| | - Junxi Zheng
- School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, PR China.
4
Li F, Zhao Y, Xu T, Zhang Y. Distributed multi-objective optimization for SNP-SNP interaction detection. Methods 2024; 221:55-64. [PMID: 38061496 DOI: 10.1016/j.ymeth.2023.11.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 11/20/2023] [Accepted: 11/29/2023] [Indexed: 01/16/2024] Open
Abstract
The detection of complex interactions between single nucleotide polymorphisms (SNPs) plays a vital role in genome-wide association analysis (GWAS). The multi-objective evolutionary algorithm is a promising technique for SNP-SNP interaction detection. However, as the scale of SNP data further increases, the exponentially growing search space gradually becomes the dominant factor, causing evolutionary algorithm (EA)-based approaches to fall into local optima. In addition, multi-objective genetic operations consume significant amounts of time and computational resources. To this end, this study proposes a distributed multi-objective evolutionary framework (DM-EF) to identify SNP-SNP interactions on large-scale datasets. DM-EF first partitions the entire search space into several subspaces based on a space-partitioning strategy, which is nondestructive because it guarantees that each feasible solution is assigned to a specific subspace. Thereafter, each subspace is optimized using a multi-objective EA optimizer, and all subspaces are optimized in parallel. A decomposition-based multi-objective firework optimizer (DCFWA) with several problem-guided operators was designed. Finally, the final output is selected from the Pareto-optimal solutions in the historical search of each subspace. DM-EF avoids the preference for a single objective function, handles the heavy computational burden, and enhances the diversity of the population to avoid local optima. Notably, DM-EF is load-balanced and scalable because it can flexibly partition the space according to the number of available computational nodes and problem size. Experiments on both artificial and real-world datasets demonstrate that the proposed method significantly improves the search speed and accuracy.
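The abstract describes a nondestructive space partition in which every feasible solution lands in exactly one subspace. The sketch below illustrates that property with a deliberately simple routing rule (the smallest SNP index modulo the number of nodes). The rule, and the names `subspace_of` and `partition`, are hypothetical; the abstract does not specify DM-EF's actual partitioning strategy.

```python
from itertools import combinations

def subspace_of(combo, n_nodes):
    """Hypothetical routing rule: send a SNP combination to a node by its smallest index."""
    return min(combo) % n_nodes

def partition(n_snps, order, n_nodes):
    """Enumerate all SNP combinations of the given order and bucket them per node."""
    buckets = [[] for _ in range(n_nodes)]
    for combo in combinations(range(n_snps), order):
        buckets[subspace_of(combo, n_nodes)].append(combo)
    return buckets

buckets = partition(n_snps=6, order=2, n_nodes=3)
sizes = [len(b) for b in buckets]
```

Every one of the 15 pairs appears in exactly one bucket, so no candidate is lost. The bucket sizes come out uneven with this rule, which suggests that a load-balanced scheme like the one the abstract claims would need a more careful assignment.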
Affiliation(s)
- Fangting Li
- School of Computer Science and Engineering, Northeastern University, Shenyang, China.
| | - Yuhai Zhao
- School of Computer Science and Engineering, Northeastern University, Shenyang, China.
| | - Tongze Xu
- School of Computer Science and Engineering, Northeastern University, Shenyang, China.
| | - Yuhan Zhang
- College of Medicine and Biological information Engineering, Northeastern University, Shenyang, China.
5
Ren F, Li S, Wen Z, Liu Y, Tang D. The Spherical Evolutionary Multi-Objective (SEMO) Algorithm for Identifying Disease Multi-Locus SNP Interactions. Genes (Basel) 2023; 15:11. [PMID: 38275593 PMCID: PMC10815643 DOI: 10.3390/genes15010011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/21/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024] Open
Abstract
Single-nucleotide polymorphisms (SNPs), as disease-related biogenetic markers, are crucial in elucidating complex disease susceptibility and pathogenesis. Because combinatorial search methods are computationally inefficient, it is difficult to identify high-dimensional SNP interactions with them, so the spherical evolutionary multi-objective (SEMO) algorithm for detecting multi-locus SNP interactions was proposed. The algorithm uses a spherical search factor and a feedback mechanism of excellent individual history memory to enhance the balance between search and acquisition. Moreover, a multi-objective fitness function based on the decomposition idea was used to evaluate the associations by combining two functions, K2-Score and LR-Score, as an objective function for the algorithm's evolutionary iterations. The performance of SEMO was compared with that of six state-of-the-art algorithms on a simulated dataset. The results showed that SEMO outperforms the comparative methods by detecting SNP interactions quickly and accurately with a shorter average run time. The SEMO algorithm was applied to the Wellcome Trust Case Control Consortium (WTCCC) breast cancer dataset and detected two- and three-point SNP interactions that were significantly associated with breast cancer, confirming the effectiveness of the algorithm. New combinations of SNPs associated with breast cancer were also identified, which will provide a new way to detect SNP interactions quickly and accurately.
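The K2-Score named in this abstract is commonly written in the epistasis literature in a logarithmic form, where lower values indicate a stronger association. The sketch below assumes that common form and a binary phenotype; it is not the SEMO implementation, the function name `k2_score` is illustrative, and the LR-Score and the decomposition-based combination of the two objectives are not shown.

```python
import math
from collections import Counter

def k2_score(genotypes, labels):
    """Log-form K2 score for a binary phenotype (assumed form; lower is better).

    For each genotype combination i with r_i samples, of which r_ij have label j:
        score += log((r_i + 1)!) - sum_j log(r_ij!)
    """
    per_combo = Counter(genotypes)
    per_cell = Counter(zip(genotypes, labels))
    score = 0.0
    for r_i in per_combo.values():
        score += math.lgamma(r_i + 2)   # lgamma(n + 1) == log(n!), so this is log((r_i + 1)!)
    for r_ij in per_cell.values():
        score -= math.lgamma(r_ij + 1)  # log(r_ij!)
    return score

genos = [(0,), (0,), (1,), (1,)]
associated = k2_score(genos, [0, 0, 1, 1])    # genotype determines phenotype
unassociated = k2_score(genos, [0, 1, 0, 1])  # genotype tells nothing about phenotype
```

In this toy sample the associated labelling scores lower than the unassociated one, matching the "lower is better" convention assumed above.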
Affiliation(s)
- Fuxiang Ren
- College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; (F.R.); (S.L.); (Y.L.)
| | - Shiyin Li
- College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; (F.R.); (S.L.); (Y.L.)
| | - Zihao Wen
- College of Mathematics and Informatics, College of Software Engineering, South China Agricultural University, Guangzhou 510642, China
- Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Yidi Liu
- College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; (F.R.); (S.L.); (Y.L.)
| | - Deyu Tang
- College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; (F.R.); (S.L.); (Y.L.)
- College of Mathematics and Informatics, College of Software Engineering, South China Agricultural University, Guangzhou 510642, China
6
Wang G, Ott J. Digenic Analysis Finds Highly Interactive Genetic Variants Underlying Polygenic Traits. MEDICAL RESEARCH ARCHIVES 2023; 11:10.18103/mra.v11i10.4604. [PMID: 38882238 PMCID: PMC11177775 DOI: 10.18103/mra.v11i10.4604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]
Abstract
We briefly review our recently published approach to mining digenic genotype patterns, which consist of two genotypes each originating in a different DNA variant. We do this for a genetic case-control study by evaluating all possible pairs of genotypes, distributing the workload over numerous CPUs (threads) in a high-performance computing environment, and we apply our methods to two known datasets, age-related macular degeneration (AMD) and Parkinson Disease (PD). Based on a list of (e.g., 100,000) genotype pairs with the largest genotype pair frequency differences between cases and controls, we determine the number N_u of unique variants occurring in this list. For each unique variant, we find the number of genotype pairs it participates in, which identifies a set of variants "connected" with the given unique variant. Among the total of variants "connected" with all unique variants, only a subset of variants is unique. The ratio of all connected variants divided by that subset of variants is a measure for the overall density or connectedness of variants interacting with each other. We find that variants for the AMD data are much more interconnected than those for PD, at least based on the 100,000 genotype pairs with the largest chi-square we investigated. Further, for each of the N_u unique variants, we use the number of variants connected with it as a test statistic, weighted by the inverse of the rank at which the unique variant first occurred in the original list of genotype patterns. This weighting scheme ties the number of connections to the genetics of the trait and allows us to obtain, for each of the N_u unique variants, an empirical significance level by permuting ranks. We find 12 and 8 significant, highly connected variants for AMD and PD, respectively, some of which have previously been identified by other machine learning methods, thus providing credence to our approach.
Among the 100,000 genotype pairs investigated for each of AMD and PD, significant variants showed connections with up to 7,093 and 3,777 other variants, respectively. Our approach has been implemented in a freely available piece of software, the Digenic Network Test. Thus, our statistical genetics method can provide important information on the genetic architecture of polygenic traits.
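The connectivity bookkeeping described in this abstract can be sketched as follows. This is one reading of the text, not the Digenic Network Test software: the function name `connectivity`, the input layout (a rank-ordered list of variant pairs), and the exact definition of the density ratio are assumptions, and the rank permutation used for empirical significance levels is omitted.

```python
from collections import defaultdict

def connectivity(ranked_pairs):
    """Summarise connections in a rank-ordered list of (variant_a, variant_b) pairs.

    Returns (n_unique, density, weighted) where
      n_unique: number of unique variants in the list (N_u)
      density:  total connected variants (with repeats) / unique connected variants
      weighted: per-variant connection count divided by the rank of first occurrence
    """
    neighbours = defaultdict(set)
    first_rank = {}
    for rank, (a, b) in enumerate(ranked_pairs, start=1):
        neighbours[a].add(b)
        neighbours[b].add(a)
        first_rank.setdefault(a, rank)   # keep the earliest (best) rank only
        first_rank.setdefault(b, rank)
    total_connected = sum(len(s) for s in neighbours.values())
    unique_connected = len(set().union(*neighbours.values())) if neighbours else 0
    density = total_connected / unique_connected if unique_connected else 0.0
    weighted = {v: len(neighbours[v]) / first_rank[v] for v in neighbours}
    return len(neighbours), density, weighted

# Variant identifiers below are made up for the example.
pairs = [("rs1", "rs2"), ("rs1", "rs3"), ("rs4", "rs5")]
n_unique, density, weighted = connectivity(pairs)
```

In the toy list, `rs1` appears first and has two connections, so it receives the largest weighted statistic, which is the behaviour the inverse-rank weighting is meant to produce.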
Affiliation(s)
| | - Jurg Ott
- Rockefeller University, New York
7
MDSN: A Module Detection Method for Identifying High-Order Epistatic Interactions. Genes (Basel) 2022; 13:genes13122403. [PMID: 36553670 PMCID: PMC9778340 DOI: 10.3390/genes13122403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
Epistatic interactions refer to SNPs (single nucleotide polymorphisms) that affect disease development and trait expression nonlinearly, and hence identifying epistatic interactions plays a major role in explaining the pathogenesis and genetic heterogeneity of complex diseases. Many methods have been proposed for epistasis detection; nevertheless, they mainly focus on low-order epistatic interactions, second- or third-order for instance, and often ignore high-order interactions due to the computational burden. In this paper, a module detection method called MDSN is proposed for identifying high-order epistatic interactions. First, an SNP network is constructed using an interaction-complementary construction strategy; the network consists of low-order SNP interactions that can be obtained from fast computations. Then, a node evaluation measure that integrates multi-topological features is proposed to improve the node expansion algorithm, where the importance of a node is comprehensively evaluated by the topological characteristics of the neighborhood. Finally, modules that contain high-order epistatic interactions associated with the disease are detected in the constructed SNP network. MDSN was compared with four state-of-the-art methods on simulation datasets and a real Age-related Macular Degeneration dataset. The results demonstrate that MDSN performs better at detecting high-order interactions.
8
Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z, Liu H, Li A. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci 2022; 14:814-832. [PMID: 35788965 DOI: 10.1007/s12539-022-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION Linear or nonlinear interactions of multiple single-nucleotide polymorphisms (SNPs) play an important role in understanding the genetic basis of complex human diseases. However, combinatorial analytics in high-dimensional space makes it extremely challenging to detect multiorder SNP interactions. Most classic approaches can only perform one task (detecting k-order SNP interactions) in each run. Since prior knowledge of a complex disease is usually not available, it is difficult to determine the value of k for detecting k-order SNP interactions.
METHODS A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions, and it is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel, and two subpopulations that separately adopt the Bayesian network-based K2-score and Jensen-Shannon divergence (JS-score) as evaluation criteria are generated for each task to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G-test is adopted to further verify the authenticity of candidate solutions to reduce the error rate.
RESULT Three multiorder simulated disease models with different interaction effects and three real datasets, age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulation disease models than traditional single-task algorithms. The results on the real AMD, RA and T1D datasets indicate that MTACO-DMSI has the ability to detect multiorder SNP interactions at a genome-wide scale.
Availability and implementation: https://github.com/shouhengtuo/MTACO-DMSI/.
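One of the two evaluation criteria named in the METHODS section is a Jensen-Shannon divergence score. The sketch below is a minimal illustration of that idea: it compares the genotype-combination distribution in cases with the one in controls. It is not the MTACO-DMSI code; the function name `js_divergence`, the 0/1 label coding, and the use of base-2 logarithms are assumptions, and the function expects at least one case and one control.

```python
import math
from collections import Counter

def js_divergence(genotypes, labels):
    """Jensen-Shannon divergence (bits) between case and control genotype distributions.

    genotypes: list of tuples (one genotype combination per sample)
    labels:    list of 0/1 labels (0 = control, 1 = case); both classes must be present
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    js = 0.0
    for geno in set(cases) | set(controls):
        p = cases[geno] / n_case        # frequency among cases
        q = controls[geno] / n_ctrl     # frequency among controls
        m = 0.5 * (p + q)               # mixture distribution
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js

genos = [(0,), (0,), (1,), (1,)]
separated = js_divergence(genos, [1, 1, 0, 0])   # cases and controls never share a genotype
identical = js_divergence(genos, [1, 0, 1, 0])   # same genotype distribution in both groups
```

With base-2 logarithms the score ranges from 0 (identical distributions) to 1 (disjoint distributions), so a larger value would indicate a candidate combination that separates cases from controls more sharply.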
Affiliation(s)
- Shouheng Tuo
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
| | - Chao Li
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Fan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - YanLing Zhu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - TianRui Chen
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - ZengYu Feng
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Haiyan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Aimin Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China
9
Abdulkhaleq MT, Rashid TA, Alsadoon A, Hassan BA, Mohammadi M, Abdullah JM, Chhabra A, Ali SL, Othman RN, Hasan HA, Azad S, Mahmood NA, Abdalrahman SS, Rasul HO, Bacanin N, Vimal S. Harmony search: Current studies and uses on healthcare systems. Artif Intell Med 2022; 131:102348. [DOI: 10.1016/j.artmed.2022.102348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2022] [Revised: 05/08/2022] [Accepted: 06/30/2022] [Indexed: 11/29/2022]
10
Tuo S, Li C, Liu F, Li A, He L, Geem ZW, Shang J, Liu H, Zhu Y, Feng Z, Chen T. MTHSA-DHEI: multitasking harmony search algorithm for detecting high-order SNP epistatic interactions. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00813-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
AbstractGenome-wide association studies have succeeded in identifying genetic variants associated with complex diseases, but the findings have not been well interpreted biologically. Although it is widely accepted that epistatic interactions of high-order single nucleotide polymorphisms (SNPs) [(1) Single nucleotide polymorphisms (SNP) are mainly deoxyribonucleic acid (DNA) sequence polymorphisms caused by variants at a single nucleotide at the genome level. They are the most common type of heritable variation in humans.] are important causes of complex diseases, the combinatorial explosion of millions of SNPs and multiple tests impose a large computational burden. Moreover, it is extremely challenging to correctly distinguish high-order SNP epistatic interactions from other high-order SNP combinations due to small sample sizes. In this study, a multitasking harmony search algorithm (MTHSA-DHEI) is proposed for detecting high-order epistatic interactions [(2) In classical genetics, if genes X1 and X2 are mutated and each mutation by itself produces a unique disease status (phenotype) but the mutations together cause the same disease status as the gene X1 mutation, gene X1 is epistatic and gene X2 is hypostatic, and gene X1 has an epistatic effect (main effect) on disease status. In this work, a high-order epistatic interaction occurs when two or more SNP loci have a joint influence on disease status.], with the goal of simultaneously detecting multiple types of high-order (k1-order, k2-order, …, kn-order) SNP epistatic interactions. Unified coding is adopted for multiple tasks, and four complementary association evaluation functions are employed to improve the capability of discriminating the high-order SNP epistatic interactions. 
We compare the proposed MTHSA-DHEI method with four strong existing methods for detecting high-order SNP interactions on 8 high-order epistatic interaction models with no marginal effect (EINMEs) and 12 epistatic interaction models with marginal effects (EIMEs), and apply the MTHSA-DHEI algorithm to a real dataset on age-related macular degeneration (AMD). The experimental results indicate that MTHSA-DHEI achieves power and an F1-score exceeding 90% for all EIMEs and five EINMEs and reduces the computational time by more than 90%. It can efficiently perform multiple high-order detection tasks for high-order epistatic interactions and improve the discrimination ability for diverse epistasis models.
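The combinatorial explosion mentioned in the abstract can be made concrete with a short count of candidate SNP sets. This is only an illustrative sketch: the function name and the figure of 1000 SNPs are assumptions, not values from the paper.

```python
import math


def count_snp_combinations(num_snps: int, order: int) -> int:
    """Number of distinct unordered SNP sets of the given interaction order."""
    return math.comb(num_snps, order)


# Hypothetical panel of 1000 SNPs: the search space grows rapidly with the order k.
search_space = {k: count_snp_combinations(1000, k) for k in (2, 3, 4, 5)}
```

Even for this small hypothetical panel, moving from order 2 to order 3 multiplies the number of candidates by several hundred, which is why exhaustive evaluation becomes impractical at genome scale.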
|
11
|
Detecting genetic epistasis by differential departure from independence. Mol Genet Genomics 2022; 297:911-924. [PMID: 35606612 DOI: 10.1007/s00438-022-01893-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 03/27/2022] [Indexed: 10/18/2022]
Abstract
Countering prior beliefs that epistasis is rare, advances in genomics suggest otherwise. Current practice often filters out genomic loci with low variant counts before detecting epistasis. We argue that this practice is far from optimal because it can throw away strong epistatic patterns. Instead, we present the compensated Sharma-Song test to infer genetic epistasis in genome-wide association studies by differential departure from independence. The test does not require a minimum number of replicates for each variant. We also introduce algorithms to simulate epistatic patterns that differentially depart from independence. Using two simulators, the test performed comparably to the original Sharma-Song test when variant frequencies at a locus are marginally uniform; encouragingly, it has a marked advantage over alternatives when variant frequencies are marginally nonuniform. The test further revealed uniquely clean epistatic variants associated with chicken abdominal fat content that are not prioritized by other methods. Genes involved in the largest numbers of inferred epistatic interactions between single nucleotide polymorphisms (SNPs) belong to pathways known for obesity regulation; many top SNPs are located on chromosome 20 and in intergenic regions. By measuring differential departure from independence, the compensated Sharma-Song test offers a practical choice for studying epistasis that is robust to nonuniform genetic variant frequencies.
|
12
|
Multi-Objective Artificial Bee Colony Algorithm Based on Scale-Free Network for Epistasis Detection. Genes (Basel) 2022; 13:genes13050871. [PMID: 35627256 PMCID: PMC9140669 DOI: 10.3390/genes13050871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Revised: 04/30/2022] [Accepted: 05/10/2022] [Indexed: 12/04/2022] Open
Abstract
In genome-wide association studies, epistasis detection is of great significance for understanding the occurrence and improving the diagnosis of complex human diseases, but it also faces challenges such as high dimensionality and small sample sizes. To cope with these challenges, several swarm intelligence methods have been introduced in recent years to identify epistasis. However, the existing methods still have some limitations, such as high resource consumption and premature convergence. In this study, we propose a multi-objective artificial bee colony (ABC) algorithm based on the scale-free network (SFMOABC). The SFMOABC incorporates the scale-free network into the ABC algorithm to guide the update and selection of solutions. In addition, the SFMOABC uses mutual information and the K2-Score of the Bayesian network as objective functions, and the opposition-based learning strategy is used to improve the search ability. Experiments were performed on both simulation datasets and a real dataset of age-related macular degeneration (AMD). The results of the simulation experiments showed that the SFMOABC has better detection power and efficiency than seven other epistasis detection methods. In the real AMD data experiment, most of the single nucleotide polymorphism combinations detected by the SFMOABC have been shown to be associated with AMD. Therefore, SFMOABC is a promising method for epistasis detection.
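One of the two objective functions named in the abstract, mutual information between a SNP combination and the phenotype, can be sketched as follows. This is a minimal illustration using the textbook definition; the function name, the toy data, and the choice of base-2 logarithm are assumptions, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def mutual_information(genotypes, phenotypes):
    """I(G; Y) in bits, where each genotype is a tuple of SNP values for one sample."""
    n = len(phenotypes)
    joint = Counter(zip(genotypes, phenotypes))
    g_counts = Counter(genotypes)
    y_counts = Counter(phenotypes)
    mi = 0.0
    for (g, y), c in joint.items():
        # p(g, y) * log2( p(g, y) / (p(g) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (g_counts[g] * y_counts[y]))
    return mi


# Toy XOR-style model (hypothetical): disease status depends only on the pair of SNPs.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
status = [a ^ b for a, b in pairs]
pair_mi = mutual_information(pairs, status)                    # joint signal
single_mi = mutual_information([(a,) for a, _ in pairs], status)  # marginal signal
```

In this toy model the pair carries all of the information about disease status while each SNP alone carries none, which is the "no marginal effect" situation that makes epistasis hard to find with single-locus tests.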
|
13
|
A Secure High-Order Gene Interaction Detecting Method for Infectious Diseases. Computational and Mathematical Methods in Medicine 2022; 2022:4471736. [PMID: 35495886 PMCID: PMC9050263 DOI: 10.1155/2022/4471736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 03/01/2022] [Indexed: 12/04/2022]
Abstract
Infectious diseases pose a serious threat to human life. Genome-wide association studies (GWAS) can analyze susceptibility genes of infectious diseases at the genetic level and support targeted prevention and treatment. The susceptibility genes for infectious diseases often act in combination across multiple susceptibility sites; therefore, high-order epistasis detection has become an important tool. However, owing to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Furthermore, these methods are exposed to repeated query and model inversion attacks in the process of iterative optimization, which may disclose single nucleotide polymorphism (SNP) information associated with individual privacy. To solve these problems, this paper proposed a safe harmony search algorithm for high-order gene interaction detection, termed HS-DP. First, the linear weighting method was used to integrate five objective functions (K2-Score, JS divergence, logistic regression, mutual information, and Gini) to screen out high-order SNP sets with high correlation. Then, based on differential privacy (DP) theory, a function disturbance mechanism was introduced to protect the individual privacy information associated with the objective function, and the rationality of the disturbance mechanism was proved theoretically. Finally, the practicability and superiority of the algorithm were verified by experiments. Experimental results showed that the proposed algorithm could improve the detection accuracy as far as possible while guaranteeing privacy.
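To give a feel for how differential privacy adds calibrated noise, the sketch below applies the standard Laplace mechanism to a single objective score. Note the substitution: the abstract describes a function disturbance mechanism that perturbs the objective function itself, whereas this example perturbs only the output value. The function names, sensitivity, and epsilon values are all assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_score(score: float, sensitivity: float, epsilon: float,
                    rng: random.Random) -> float:
    """Laplace mechanism: add noise with scale sensitivity / epsilon to a score."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return score + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical usage: a smaller epsilon gives stronger privacy and noisier scores.
rng = random.Random(0)
noisy_scores = [privatize_score(0.5, 1.0, 1.0, rng) for _ in range(20000)]
```

The noise has zero mean, so the average of many noisy scores stays close to the true score while any single released value hides the contribution of an individual sample.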
|
14
|
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
|
15
|
Peng YZ, Lin Y, Huang Y, Li Y, Luo G, Liao J. GEP-EpiSeeker: a gene expression programming-based method for epistatic interaction detection in genome-wide association studies. BMC Genomics 2021; 22:910. [PMID: 34930147 PMCID: PMC8686218 DOI: 10.1186/s12864-021-08207-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/17/2021] [Accepted: 11/24/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Identification of epistatic interactions provides a systematic way of exploring associations between single nucleotide polymorphisms (SNPs) and complex diseases. Although considerable progress has been made in epistasis detection, efficiently and accurately identifying epistatic interactions remains a challenge because the number of SNP combinations to be measured grows rapidly. Results: In this work, we formulate the detection of epistatic interactions as a combinatorial optimization problem and propose a novel evolutionary framework, called GEP-EpiSeeker, to detect epistatic interactions using Gene Expression Programming. In GEP-EpiSeeker, we propose several tailor-made chromosome rules to describe SNP combinations, incorporate Bayesian network-based fitness evaluation into the evolution of tailor-made chromosomes to find suspected SNP combinations, and adopt the Chi-square test to identify optimal solutions from suspected SNP combinations. Moreover, to improve the convergence and accuracy of the algorithm, we design two genetic operators with multiple and adjacent mutations and an adaptive genetic manipulation method with fuzzy control to efficiently manipulate the evolution of tailor-made chromosomes. We compared GEP-EpiSeeker with state-of-the-art methods including BEAM, BOOST, AntEpiSeeker, MACOED, and EACO in terms of power, recall, precision and F1-score on the GWAS datasets of 12 DME disease models and 10 DNME disease models. Our experimental results show that GEP-EpiSeeker outperforms the compared methods. Conclusions: Here we presented a novel method named GEP-EpiSeeker, based on the Gene Expression Programming algorithm, to identify epistatic interactions in genome-wide association studies. The results indicate that GEP-EpiSeeker could be a promising alternative to the existing methods in epistasis detection and will provide a new way of accurately identifying epistasis.
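The recall, precision and F1-score used in the comparison above follow the usual definitions, sketched here for reference. How true positives, false positives and false negatives are counted across simulated datasets is not stated in the abstract, so the counts in the usage line are hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from detection counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true interactions reported, 2 spurious ones, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

F1 is the harmonic mean of precision and recall, so it rewards methods that find the true interactions without flooding the output with false positives.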
Affiliation(s)
- Yu Zhong Peng: School of Computer & Information Engineering, Nanning Normal University, Nanning, 530001, China; School of Computer Science, Fudan University, Shanghai, 200433, China
- Yanmei Lin: School of Computer & Information Engineering, Nanning Normal University, Nanning, 530001, China
- Yiran Huang: School of Computer and Electronics and Information, Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, 530004, China
- Ying Li: School of Computer & Information Engineering, Nanning Normal University, Nanning, 530001, China
- Guangsheng Luo: School of Computer Science, Fudan University, Shanghai, 200433, China
- Jianping Liao: School of Computer & Information Engineering, Nanning Normal University, Nanning, 530001, China
|
16
|
Yilmaz S, Tastan O, Cicek AE. SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1208-1216. [PMID: 31443041 DOI: 10.1109/tcbb.2019.2935437] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci that are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs is selected and, on average, provides consistent improvements across multiple networks and settings. Moreover, it identifies more candidate genes and runs faster.
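The greedy strategy for submodular maximization mentioned above can be illustrated with a generic example. The abstract does not give the SPADIS objective, so this sketch swaps in a simple coverage function (a classic monotone submodular function) as a stand-in; the SNP names and covered elements are hypothetical.

```python
def greedy_max_coverage(candidates: dict, k: int) -> list:
    """Greedily pick up to k items, each time taking the largest marginal coverage gain."""
    remaining = dict(candidates)
    selected = []
    covered = set()
    for _ in range(min(k, len(candidates))):
        # Marginal gain of an item = number of not-yet-covered elements it adds.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining item adds anything new
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Hypothetical SNPs, each "covering" a set of biological processes (stand-in objective).
toy_candidates = {
    "snp_a": {1, 2, 3},
    "snp_b": {3, 4},
    "snp_c": {4, 5, 6, 7},
    "snp_d": {1, 2},
}
chosen = greedy_max_coverage(toy_candidates, 2)
```

For monotone submodular functions under a cardinality constraint, this kind of greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the sort of constant factor guarantee the abstract refers to. Picking by marginal gain also naturally discourages redundant choices, since an item that overlaps with earlier picks adds little.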
|
17
|
Tuo S, Liu H, Chen H. Multipopulation harmony search algorithm for the detection of high-order SNP interactions. Bioinformatics 2021; 36:4389-4398. [PMID: 32227192 DOI: 10.1093/bioinformatics/btaa215] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/05/2019] [Revised: 01/01/2020] [Accepted: 03/24/2020] [Indexed: 01/23/2023] Open
Abstract
MOTIVATION Recently, multiobjective swarm intelligence optimization (SIO) algorithms have attracted considerable attention as disease model-free methods for detecting high-order single nucleotide polymorphism (SNP) interactions. However, a strict Pareto optimal set may filter out some of the SNP combinations associated with disease status. Furthermore, the lack of heuristic factors for finding SNP interactions and the preference for discrimination approaches to disease models are considerable challenges for SIO. In this study, we propose a multipopulation harmony search (HS) algorithm dedicated to the detection of high-order SNP interactions (MP-HS-DHSI). This method consists of three stages. In the first stage, HS with multipopulation (multiharmony memories) is used to discover a set of candidate high-order SNP combinations having an association with disease status. In HS, multiple criteria [Bayesian network-based K2-score, Jensen-Shannon divergence, likelihood ratio and normalized distance with joint entropy (ND-JE)] are adopted by four harmony memories to improve the ability to discriminate diverse disease models. A novel evaluation criterion named ND-JE is proposed to guide HS to explore clues for high-order SNP interactions. In the second and third stages, the G-test statistical method and multifactor dimensionality reduction are employed to verify the authenticity of the candidate solutions, respectively. RESULTS We compared MP-HS-DHSI with four state-of-the-art SIO algorithms for detecting high-order SNP interactions for 20 simulation disease models and a real dataset of age-related macular degeneration. The experimental results revealed that our proposed method can accelerate the search speed efficiently and enhance the discrimination ability of diverse epistasis models. AVAILABILITY AND IMPLEMENTATION https://github.com/shouhengtuo/MP-HS-DHSI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
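The first stage described above relies on harmony search to propose candidate SNP combinations. The sketch below shows one generic improvisation step with a single harmony memory; it does not reproduce the paper's multipopulation scheme or its four scoring criteria. The parameter names `hmcr` (harmony memory considering rate) and `par` (pitch adjusting rate) follow common harmony search usage, and all values are assumptions.

```python
import random


def improvise(harmony_memory, num_snps, hmcr, par, rng):
    """Build one new SNP combination (sorted tuple of SNP indices) from the memory."""
    order = len(harmony_memory[0])
    if num_snps < order:
        raise ValueError("need at least as many SNPs as the interaction order")
    new = []
    for position in range(order):
        if rng.random() < hmcr:
            # Memory consideration: reuse this position from a stored harmony.
            snp = rng.choice(harmony_memory)[position]
            if rng.random() < par:
                # Pitch adjustment: shift to a neighbouring SNP index, kept in range.
                snp = min(num_snps - 1, max(0, snp + rng.choice((-1, 1))))
        else:
            # Random selection from the whole SNP panel.
            snp = rng.randrange(num_snps)
        while snp in new:
            # A combination must not contain the same SNP twice.
            snp = rng.randrange(num_snps)
        new.append(snp)
    return tuple(sorted(new))


# Hypothetical memory of two 3-SNP combinations over a panel of 10 SNPs.
memory = [(0, 5, 9), (2, 3, 7)]
candidate = improvise(memory, num_snps=10, hmcr=0.9, par=0.3, rng=random.Random(1))
```

In a full search loop, each candidate would be scored by an association criterion and would replace the worst stored harmony when it scores better; that update step is omitted here.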
Affiliation(s)
- Shouheng Tuo: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
- Haiyan Liu: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
- Hao Chen: School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
|
18
|
A differential evolution based feature combination selection algorithm for high-dimensional data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.081] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022]
|
19
|
A New Method for Analyzing the Performance of the Harmony Search Algorithm. Mathematics 2020. [DOI: 10.3390/math8091421] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
A harmony search (HS) algorithm for solving high-dimensional multimodal optimization problems (named DIHS) was proposed in 2015 and showed good performance; it employs a dynamic-dimensionality-reduction strategy to maintain a high update success rate of harmony memory (HM). However, the DIHS adopted an extreme assumption that is not reasonable, and its analysis of the update success rate is not sufficiently accurate. In this study, we reanalyzed the update success rate of HS and now present a more valid method for analyzing it. In the new analysis, the take-k and take-all strategies that are employed to generate new solutions are compared in terms of the update success rate, and the average convergence rate of the algorithms is also analyzed. The experimental results demonstrate that the HS based on the take-k strategy is efficient and effective at solving some complex high-dimensional optimization problems.
|
20
|
Yin Y, Guan B, Zhao Y, Li Y. SAMA: A Fast Self-Adaptive Memetic Algorithm for Detecting SNP-SNP Interactions Associated with Disease. BioMed Research International 2020; 2020:5610658. [PMID: 32908899 PMCID: PMC7468611 DOI: 10.1155/2020/5610658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2020] [Accepted: 07/13/2020] [Indexed: 11/29/2022]
Abstract
Detecting SNP-SNP interactions associated with disease is important in genome-wide association studies (GWAS). Owing to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power and long running times. To tackle these drawbacks, a fast self-adaptive memetic algorithm (SAMA) is proposed in this paper. In this method, the crossover, mutation, and selection of the standard memetic algorithm are improved to adapt SAMA to the detection of SNP-SNP interactions associated with disease. Furthermore, a self-adaptive local search algorithm is introduced to enhance the detection power of the proposed method. SAMA is evaluated on a variety of simulated datasets and a real-world biological dataset, and is compared with four other recently developed methods based on evolutionary algorithms (FHSA-SED, AntEpiSeeker, IEACO, and DESeeker). The results of extensive experiments show that SAMA outperforms the four compared methods in terms of detection power and running time.
Affiliation(s)
- Ying Yin: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Boxin Guan: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Yuhai Zhao: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Yuan Li: School of Information Science and Technology, North China University of Technology, Beijing 100144, China
|
21
|
Sun Y, Wang X, Shang J, Liu JX, Zheng CH, Lei X. Introducing Heuristic Information Into Ant Colony Optimization Algorithm for Identifying Epistasis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1253-1261. [PMID: 30403637 DOI: 10.1109/tcbb.2018.2879673] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Epistasis learning, which is aimed at detecting associations between multiple Single Nucleotide Polymorphisms (SNPs) and complex diseases, has gained increasing attention in genome wide association studies. Although much work has been done on mapping the SNPs underlying complex diseases, there is still difficulty in detecting epistatic interactions due to the lack of heuristic information to expedite the search process. In this study, a method EACO is proposed to detect epistatic interactions based on the ant colony optimization (ACO) algorithm, the highlights of which are the introduced heuristic information, fitness function, and a candidate solutions filtration strategy. The heuristic information multi-SURF* is introduced into EACO for identifying epistasis, which is incorporated into ant-decision rules to guide the search with linear time. Two functionally complementary fitness functions, mutual information and the Gini index, are combined to effectively evaluate the associations between SNP combinations and the phenotype. Furthermore, a strategy for candidate solutions filtration is provided to adaptively retain all optimal solutions which yields a more accurate way for epistasis searching. Experiments of EACO, as well as three ACO based methods (AntEpiSeeker, MACOED, and epiACO) and four commonly used methods (BOOST, SNPRuler, TEAM, and epiMODE) are performed on both simulation data sets and a real data set of age-related macular degeneration. Results indicate that EACO is promising in identifying epistasis.
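The Gini index named above as one of the two fitness functions can be sketched with the standard weighted Gini impurity of the phenotype within each genotype-combination group. The abstract does not specify EACO's exact formulation, so the function name, the data layout, and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def gini_score(genotypes, phenotypes):
    """Weighted Gini impurity of the phenotype across genotype-combination groups.

    Lower values mean the SNP combination separates cases from controls better.
    """
    groups = defaultdict(Counter)
    for g, y in zip(genotypes, phenotypes):
        groups[g][y] += 1
    n = len(phenotypes)
    total = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        total += (size / n) * impurity
    return total


# Toy XOR-style model (hypothetical): the SNP pair determines disease status.
toy_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
toy_status = [a ^ b for a, b in toy_pairs]
pair_gini = gini_score(toy_pairs, toy_status)
single_gini = gini_score([(a,) for a, _ in toy_pairs], toy_status)
```

Here the pair yields perfectly pure groups while a single SNP leaves every group evenly mixed, so a search guided by this score would prefer the pair.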
|
22
|
Li X, Zhang S, Wong KC. Nature-Inspired Multiobjective Epistasis Elucidation from Genome-Wide Association Studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:226-237. [PMID: 29994485 DOI: 10.1109/tcbb.2018.2849759] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
In recent years, the detection of epistatic interactions of multiple genetic variants on the causes of complex diseases brings a significant challenge in genome-wide association studies (GWAS). However, most of the existing methods still suffer from algorithmic limitations such as single-objective optimization, intensive computational requirement, and premature convergence. In this paper, we propose and formulate an epistatic interaction multi-objective artificial bee colony algorithm based on decomposition (EIMOABC/D) to address those problems for genetic interaction detection in genome-wide association studies. First, to direct the genetic interaction detection, two objective functions are formulated to characterize various epistatic models; rank probability model is proposed to sort each population into different nondomination levels based on the fast nondominated sorting approach. After that, the mutual information based local search algorithm is proposed to guide the population search for disease model evaluations in an unbiased manner. To validate the effectiveness of EIMOABC/D, we compare EIMOABC/D against seven state-of-the-art methods on 77 epistatic models including eight small-scale epistatic models with marginal effects, eight large-scale epistatic models with marginal effects, 60 large-scale epistatic models without any marginal effect, and one case study. The experimental results indicate that our proposed algorithm EIMOABC/D outperforms seven state-of-the-art methods on those epistatic models. Furthermore, time complexity analysis and parameter analysis are conducted to demonstrate various properties of our proposed algorithm.
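The fast nondominated sorting approach referred to above groups candidate solutions into successive Pareto fronts. The following is a minimal sketch of that sorting step only, assuming every objective is minimized; the paper's two objective functions and its rank probability model are not given in the abstract, so the example points are hypothetical.

```python
def dominates(a, b):
    """True when a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_nondominated_sort(points):
    """Return a list of fronts, each a list of indices into points (minimization)."""
    n = len(points)
    dominated_solutions = [[] for _ in range(n)]  # indices that solution i dominates
    domination_count = [0] * n                    # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_solutions[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    current = 0
    while fronts[current]:
        next_front = []
        for i in fronts[current]:
            for j in dominated_solutions[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        current += 1
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical two-objective scores for five candidate SNP combinations.
scores = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
levels = fast_nondominated_sort(scores)
```

The first front holds the candidates that no other candidate beats on both objectives; later fronts hold progressively weaker trade-offs.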
|
23
|
A statistical approach for high order epistasis interaction detection for prediction of diabetic macular edema. Informatics in Medicine Unlocked 2020. [DOI: 10.1016/j.imu.2020.100362] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022] Open
|
24
|
Sun L, Liu G, Su L, Wang R. HS-MMGKG: A Fast Multi-objective Harmony Search Algorithm for Two-locus Model Detection in GWAS. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190409110843] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/18/2023]
Abstract
Background: Genome-Wide Association Study (GWAS) plays a very important role in identifying the causes of a disease. Because most of the existing methods for genetic-interaction detection in GWAS are designed for a single-correlation model, their performances vary considerably for different disease models. These methods usually have high computation cost and low accuracy. Method: We present a new multi-objective heuristic optimization methodology named HS-MMGKG for detecting genetic interactions. In HS-MMGKG, we use harmony search with five objective functions to improve the efficiency and accuracy. A new strategy based on p-value and MDR is adopted to generate more reasonable results. The Boolean representation in BOOST is modified to calculate the five functions rapidly. These strategies take less time complexity and have higher accuracy while detecting the potential models. Results: We compared HS-MMGKG with CSE, MACOED and FHSA-SED using 26 simulated datasets. The experimental results demonstrate that our method outperforms others in accuracy and computation time. Our method has identified many two-locus SNP combinations that are associated with seven diseases in WTCCC dataset. Some of the SNPs have direct evidence in CTD database. The results may be helpful to further explain the pathogenesis. Conclusion: It is anticipated that our proposed algorithm could be used in GWAS which is helpful in understanding disease mechanism, diagnosis and prognosis.
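The Boolean representation mentioned in the Method section stores genotypes as bit vectors so that contingency counts reduce to bitwise AND plus a bit count. The sketch below illustrates that general idea with Python integers as bitsets; it is not the paper's modified representation, whose details are not in the abstract, and the function names and toy data are assumptions.

```python
def genotype_masks(genotypes):
    """Return three bitmasks (genotype 0, 1, 2); bit i is set if sample i has that genotype."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snp_a, snp_b, labels):
    """Map (genotype_a, genotype_b) -> (control count, case count) using bitwise ops."""
    masks_a = genotype_masks(snp_a)
    masks_b = genotype_masks(snp_b)
    case_mask = 0
    for i, y in enumerate(labels):
        if y == 1:
            case_mask |= 1 << i
    control_mask = ((1 << len(labels)) - 1) & ~case_mask
    table = {}
    for ga in range(3):
        for gb in range(3):
            both = masks_a[ga] & masks_b[gb]
            table[(ga, gb)] = (popcount(both & control_mask), popcount(both & case_mask))
    return table


# Hypothetical genotypes (0/1/2) for six samples and their case (1) / control (0) labels.
table = contingency_table([0, 1, 2, 0, 1, 2], [0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 1, 0])
```

Because the per-SNP masks can be built once and reused for every pair, each of the nine cells costs only a couple of bitwise operations, which is what makes this representation attractive when many objective functions share the same contingency table.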
Affiliation(s)
- Liyan Sun: College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Guixia Liu: College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Lingtao Su: College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Rongquan Wang: College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
|
25
|
Guan B, Zhao Y. Self-Adjusting Ant Colony Optimization Based on Information Entropy for Detecting Epistatic Interactions. Genes (Basel) 2019; 10:genes10020114. [PMID: 30717303 PMCID: PMC6409693 DOI: 10.3390/genes10020114] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/14/2018] [Revised: 01/21/2019] [Accepted: 01/28/2019] [Indexed: 12/15/2022] Open
Abstract
The epistatic interactions of single nucleotide polymorphisms (SNPs) are considered to be an important factor in determining the susceptibility of individuals to complex diseases. Although many methods have been proposed to detect such interactions, the development of detection algorithm is still ongoing due to the computational burden in large-scale association studies. In this paper, to deal with the intensive computing problem of detecting epistatic interactions in large-scale datasets, a self-adjusting ant colony optimization based on information entropy (IEACO) is proposed. The algorithm can automatically self-adjust the path selection strategy according to the real-time information entropy. The performance of IEACO is compared with that of ant colony optimization (ACO), AntEpiSeeker, AntMiner, and epiACO on a set of simulated datasets and a real genome-wide dataset. The results of extensive experiments show that the proposed method is superior to the other methods.
Affiliation(s)
- Boxin Guan: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Yuhai Zhao: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
|
26
|
Sun L, Liu G, Su L, Wang R. SEE: a novel multi-objective evolutionary algorithm for identifying SNP epistasis in genome-wide association studies. Biotechnol Biotec Eq 2019. [DOI: 10.1080/13102818.2019.1593052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022] Open
Affiliation(s)
- Liyan Sun: Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, P.R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P.R. China
- Guixia Liu: Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, P.R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P.R. China
- Lingtao Su: Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, P.R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P.R. China
- Rongquan Wang: Department of Computational Intelligence, College of Computer Science and Technology, Jilin University, Changchun, P.R. China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, P.R. China
|
27
|
FDHE-IW: A Fast Approach for Detecting High-Order Epistasis in Genome-Wide Case-Control Studies. Genes (Basel) 2018; 9:genes9090435. [PMID: 30158504 PMCID: PMC6162554 DOI: 10.3390/genes9090435] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/11/2018] [Revised: 08/16/2018] [Accepted: 08/16/2018] [Indexed: 12/13/2022] Open
Abstract
Detecting high-order epistasis in genome-wide association studies (GWASs) is of importance when characterizing complex human diseases. However, the enormous number of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases present a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value for each SNP is calculated. Then, the top-k SNPs are isolated as guiders to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the findings are statistically evaluated using a G-test to isolate true positives. The developed algorithm was used to evaluate 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in the detection of some high-order disease-causing models.
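The final G-test step can be sketched with the standard likelihood-ratio statistic on a contingency table of genotype combinations against case/control status. The abstract does not state the significance threshold or how degrees of freedom are handled, so the function name, the treatment of empty rows, and the example tables are assumptions.

```python
import math


def g_test(table):
    """Return (G statistic, degrees of freedom) for a table of observed counts.

    table: list of rows, e.g. one row per genotype combination with
    [control count, case count]. G = 2 * sum(O * ln(O / E)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero count contribute nothing
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    # Assumption: rows and columns with no observations do not count toward df.
    rows = sum(1 for t in row_totals if t > 0)
    cols = sum(1 for t in col_totals if t > 0)
    return 2.0 * g, (rows - 1) * (cols - 1)


# Hypothetical tables: no association versus perfect association.
g_independent, df_independent = g_test([[10, 10], [20, 20]])
g_associated, df_associated = g_test([[20, 0], [0, 20]])
```

Under the null hypothesis of no association, G approximately follows a chi-square distribution with the returned degrees of freedom, so a p-value can be obtained from any chi-square survival function and compared against a multiple-testing-corrected threshold.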
|