1
Luo KH, Wu CH, Yang CC, Chen TH, Tu HP, Yang CH, Chuang HY. Exploring the association of metal mixture in blood to the kidney function and tumor necrosis factor alpha using machine learning methods. Ecotoxicology and Environmental Safety 2023; 265:115528. [PMID: 37783110 DOI: 10.1016/j.ecoenv.2023.115528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/09/2023] [Accepted: 09/25/2023] [Indexed: 10/04/2023]
Abstract
This study explored the relationships between a mixture of metals in blood, kidney function, and tumor necrosis factor alpha (TNF-α) using machine learning. Metal levels were measured by inductively coupled plasma mass spectrometry in blood from 421 participants. We applied K Nearest Neighbor (KNN), Naive Bayes classifier (NB), Support Vector Machines (SVM), random forest (RF), Gradient Boosting Decision Tree (GBDT), Categorical boosting (CatBoost), eXtreme Gradient Boosting (XGBoost), and Whale Optimization-based XGBoost (WXGBoost) to identify the relationships among plasma metals, TNF-α, and estimated glomerular filtration rate (eGFR, by the CKD-EPI equation). We included not only the toxic metals lead (Pb), arsenic (As), and cadmium (Cd) but also the essential trace metals selenium (Se), copper (Cu), zinc (Zn), and cobalt (Co) to predict their interaction with TNF-α, TNF-α/white blood count, and eGFR. Subjects with higher blood Pb, As, Cd, Cu, and Zn levels had higher average TNF-α levels. Blood Se and Co levels showed no difference between the low and high TNF-α groups. Those in the lower eGFR group had high Pb, As, Cd, Co, Cu, and Zn levels. Among the metals, the most important predictor of TNF-α level was blood Pb, followed by Cd, As, Cu, Se, Zn, and Co. Machine learning revealed that As played the major role among the predictors of eGFR after feature selection. Kidney function and TNF-α levels were modified by co-exposure to metals. The multi-metal exposure model achieved its highest accuracy of over 85%. Higher Pb and Zn levels had the strongest interaction with declined eGFR. In addition, As and Cd acted synergistically in the prediction model of TNF-α. We explored the potential of machine learning approaches for predicting health outcomes under multi-metal exposure. An XGBoost model combined with SHAP could give an explicit explanation for individualized, precise risk prediction and insight into the interactions of key features in multi-metal exposure.
Affiliation(s)
- Kuei-Hau Luo: Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung City 807, Taiwan
- Chih-Hsien Wu: Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Chen-Cheng Yang: Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung City 807, Taiwan; Department of Occupational Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung 812, Taiwan
- Tzu-Hua Chen: Department of Family Medicine, Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung 801, Taiwan
- Hung-Pin Tu: Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Cheng-Hong Yang: Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan; Department of Information Management, Tainan University of Technology, Tainan 71002, Taiwan; Drug Development and Value Creation Research Center, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; Ph.D. Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; School of Dentistry, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Hung-Yi Chuang: Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung City 807, Taiwan; Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan; Department of Occupational and Environmental Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung City 807, Taiwan; Ph.D. Program in Environmental and Occupational Medicine, and Research Center for Precision Environmental Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan

2
Yang CH, Wu KC, Chuang LY, Chang HW. DeepBarcoding: Deep Learning for Species Classification Using DNA Barcoding. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2158-2165. [PMID: 33600318 DOI: 10.1109/tcbb.2021.3056570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
DNA barcodes, which are short sequence fragments, are used for species identification. Because of advances in sequencing technologies, DNA barcodes have received increasing attention, and DNA sequences from different organisms can be acquired easily and rapidly. Therefore, DNA sequence analysis tools play an increasingly crucial role in species identification. This study proposed deep barcoding, a deep learning framework for species classification using DNA barcodes. Deep barcoding takes raw sequence data as input, represents it with one-hot encoding as a one-dimensional image, and uses a deep convolutional neural network with a fully connected deep neural network for sequence analysis. It can achieve an average accuracy of >90 percent for both simulation and real datasets. Although deep learning yields outstanding performance for species classification with DNA sequences, its application remains a challenge. The deep barcoding model can be a potential tool for species classification and can elucidate DNA barcode-based species identification.
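The one-hot representation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `ACGT` alphabet order, the all-zero handling of ambiguous bases, and the function name `one_hot_encode` are assumptions made here.

```python
import numpy as np

BASES = "ACGT"  # assumed alphabet order; not taken from the paper


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return a (4, L) array: one row per base, one column per position."""
    encoded = np.zeros((len(BASES), len(sequence)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        row = BASES.find(base)
        if row >= 0:  # ambiguous symbols such as N stay all-zero (assumption)
            encoded[row, position] = 1.0
    return encoded


if __name__ == "__main__":
    print(one_hot_encode("ACGTN"))
```

The resulting array can be treated as a one-dimensional image with four channels, which is the kind of input a 1D convolutional network typically expects.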

3
A Secure High-Order Gene Interaction Detecting Method for Infectious Diseases. Computational and Mathematical Methods in Medicine 2022; 2022:4471736. [PMID: 35495886 PMCID: PMC9050263 DOI: 10.1155/2022/4471736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 03/01/2022] [Indexed: 12/04/2022]
Abstract
Infectious diseases pose a serious threat to human life. Genome-wide association studies (GWAS) can analyze susceptibility genes of infectious diseases at the genetic level and support targeted prevention and treatment. The susceptibility genes for infectious diseases often act in combination across multiple susceptibility sites; therefore, high-order epistasis detection has become an important tool. However, because of the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Furthermore, these methods are exposed to repeated query and model inversion attacks during iterative optimization, which may disclose Single Nucleotide Polymorphism (SNP) information associated with individual privacy. To solve these problems, this paper proposed a safe harmony search algorithm for high-order gene interaction detection, termed HS-DP. First, the linear weighting method was used to integrate five objective functions (K2-Score, JS divergence, logistic regression, mutual information, and Gini) to screen out highly correlated high-order SNP sets. Then, based on Differential Privacy (DP) theory, a function disturbance mechanism was introduced to protect the individual privacy information associated with the objective function, and the rationality of the disturbance mechanism was proved theoretically. Finally, the practicability and superiority of the algorithm were verified by experiments. Experimental results showed that the proposed algorithm could improve the detection accuracy to the greatest extent while guaranteeing privacy.
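The idea of combining several objective scores by linear weighting and then perturbing the result for privacy can be illustrated roughly as below. This sketch is not the HS-DP function disturbance mechanism: it substitutes the standard Laplace mechanism applied to the combined score, and the weights, sensitivity, and epsilon values are hypothetical.

```python
import random


def weighted_score(scores, weights):
    """Linear weighting: combine several objective values into one score."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_score(scores, weights, sensitivity, epsilon, rng):
    """Standard Laplace mechanism on the combined score (stand-in technique)."""
    noise = laplace_noise(sensitivity / epsilon, rng)
    return weighted_score(scores, weights) + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Five hypothetical objective values for one candidate SNP set.
    objective_values = [0.8, 0.6, 0.7, 0.5, 0.9]
    equal_weights = [0.2] * 5
    print(private_score(objective_values, equal_weights, 1.0, 0.5, rng))
```

A smaller epsilon gives a larger noise scale, which trades detection accuracy for stronger privacy.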

4
Identifying the Association of Time-Averaged Serum Albumin Levels with Clinical Factors among Patients on Hemodialysis Using Whale Optimization Algorithm. Mathematics 2022. [DOI: 10.3390/math10071030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
Abstract
Time-averaged serum albumin (TSA) is commonly associated with clinical outcomes in hemodialysis (HD) patients and considered as a surrogate indicator of nutritional status. The whale optimization algorithm-based feature selection (WOFS) model could address the complex association between the clinical factors, and could further combine with regression models for application. The present study aimed to demonstrate an optimal multifactor TSA-associated model, in order to interpret the complex association between TSA and clinical factors among HD patients. A total of 829 HD patients who met the inclusion criteria were selected for analysis. Monthly serum albumin data tracked from January 2009 to December 2013 were converted into TSA categories based on a critical value of 3.5 g/dL. Multivariate logistic regression was used to analyze the association between TSA categories and multiple clinical factors using three types of feature selection models, namely the fully adjusted, stepwise, and WOFS models. Five features, albumin, age, creatinine, potassium, and HD adequacy index (Kt/V level), were selected from fifteen clinical factors by the WOFS model, which is the minimum number of selected features required in multivariate regression models for optimal multifactor model construction. The WOFS model yielded the lowest Akaike information criterion (AIC) value, which indicated that the WOFS model could achieve superior performance in the multifactor analysis of TSA for HD patients. In conclusion, the application of the optimal multifactor TSA-associated model could facilitate nutritional status monitoring in HD patients.
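Two quantities in this abstract lend themselves to a short sketch: the conversion of monthly albumin values into a TSA category at the 3.5 g/dL cutoff, and the Akaike information criterion used to compare models. The sketch assumes that TSA is the plain mean of equally spaced monthly values and that values below the cutoff form the lower category; the category labels are hypothetical.

```python
def time_averaged_albumin(monthly_values):
    """Mean of equally spaced monthly albumin values in g/dL (assumption)."""
    return sum(monthly_values) / len(monthly_values)


def tsa_category(monthly_values, cutoff=3.5):
    """Label a patient by comparing TSA with the critical value."""
    # "low" and "normal" are illustrative labels, not the paper's terms.
    return "low" if time_averaged_albumin(monthly_values) < cutoff else "normal"


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


if __name__ == "__main__":
    print(tsa_category([3.2, 3.4, 3.6]))
    print(aic(log_likelihood=-100.0, n_parameters=5))
```

Between two models fitted to the same data, the one with the lower AIC is preferred, which is why a model with fewer selected features can win even when its likelihood is slightly lower.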

5
Abstract
Accurately forecasting the movement of exchange rates is of interest in a variety of fields, such as international business, financial management, and monetary policy, though this is not an easy task due to dramatic fluctuations caused by political and economic events. In this study, we develop a new forecasting approach referred to as FSPSOSVR, which is able to accurately predict exchange rates by combining particle swarm optimization (PSO), random forest feature selection, and support vector regression (SVR). PSO is used to obtain the optimal SVR parameters for predicting exchange rates. Our analysis involves the monthly exchange rates from January 1971 to December 2017 of seven countries including Australia, Canada, China, the European Union, Japan, Taiwan, and the United Kingdom. The out-of-sample forecast performance of the FSPSOSVR algorithm is compared with six competing forecasting models using the mean absolute percentage error (MAPE) and root mean square error (RMSE), including random walk, exponential smoothing, autoregressive integrated moving average (ARIMA), seasonal ARIMA, SVR, and PSOSVR. Our empirical results show that the FSPSOSVR algorithm consistently yields excellent predictive accuracy, which compares favorably with competing models for all currencies. These findings suggest that the proposed algorithm is a promising method for the empirical forecasting of exchange rates. Finally, we show the empirical relevance of exchange rate forecasts arising from FSPSOSVR by use of foreign exchange carry trades and find that the proposed trading strategies can deliver positive excess returns of more than 3% per annum for most currencies, except for AUD and NTD.
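The two error measures used to compare the forecasting models can be written out directly. The sketch below also includes a random walk baseline, in which each forecast is the previous observation; the sample exchange rates are made up for illustration.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def random_walk_forecast(series):
    """Random walk baseline: the forecast for month t is the value at t-1."""
    return series[:-1]


if __name__ == "__main__":
    rates = [30.0, 30.5, 30.2, 31.0]  # hypothetical monthly exchange rates
    forecast = random_walk_forecast(rates)
    observed = rates[1:]
    print(mape(observed, forecast), rmse(observed, forecast))
```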

6
Kong M, Zhang Y, Xu D, Chen W, Dehmer M. FCTP-WSRC: Protein-Protein Interactions Prediction via Weighted Sparse Representation Based Classification. Front Genet 2020; 11:18. [PMID: 32117437 PMCID: PMC7010952 DOI: 10.3389/fgene.2020.00018] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/09/2019] [Accepted: 01/07/2020] [Indexed: 12/21/2022] Open
Abstract
Predicting protein–protein interactions (PPIs) is essential for understanding biological processes. This paper proposes a novel computational model, FCTP-WSRC, to predict PPIs effectively. Initially, combinations of the F-vector, composition (C), and transition (T) are used to map each protein sequence onto numeric feature vectors. Afterwards, principal component analysis (PCA), an effective feature extraction method, is employed to reconstruct the most discriminative feature subspaces, which are subsequently used as input to weighted sparse representation based classification (WSRC) for prediction. The FCTP-WSRC model achieves accuracies of 96.67%, 99.82%, and 98.09% on the H. pylori, Human, and Yeast datasets, respectively. Furthermore, the FCTP-WSRC model performs well when predicting three significant PPI networks: the single-core network (CD9), the multiple-core network (Ras-Raf-Mek-Erk-Elk-Srf pathway), and the cross-connection network (Wnt-related network). Consequently, the promising results show that the proposed method can be a powerful tool for PPI prediction with excellent performance and less time.
Affiliation(s)
- Meng Kong: School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Yusen Zhang: School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Da Xu: School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Wei Chen: School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China
- Matthias Dehmer: University of Applied Sciences Upper Austria, School of Management, Steyr, Austria; College of Artificial Intelligence, Nankai University, Tianjin, China; Department of Biomedical Computer Science and Mechatronics, UMIT, Hall in Tyrol, Austria

7
Yang CH, Lin YD, Chuang LY. Class Balanced Multifactor Dimensionality Reduction to Detect Gene-Gene Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:71-81. [PMID: 30040653 DOI: 10.1109/tcbb.2018.2858776] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/08/2023]
Abstract
Detecting gene-gene interactions in single-nucleotide polymorphism data is vital for understanding disease susceptibility. However, existing approaches may be limited by the sample size in case-control studies. Herein, we propose a balance approach for the multifactor dimensionality reduction (BMDR) method to increase the accuracy of estimates of the prediction error rate in small samples. BMDR explicitly selects the best model by evaluating the average of prediction error rates over k-fold cross-validation without cross-validation consistency selection. In this study, we used several epistatic models with and without marginal effects under different parameter settings (heritability and minor allele frequencies) to evaluate the performance of existing approaches. Using simulated data sets, BMDR successfully detected gene-gene interactions, particularly for data sets with small sample sizes. A large data set was obtained from the Wellcome Trust Case Control Consortium, and results indicated that BMDR could effectively detect significant gene-gene interactions.
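As background for the prediction error rate discussed above, the following sketch shows the classic MDR step of labeling multi-locus genotype cells as high or low risk, followed by a class-balanced error. It is not the BMDR procedure itself; the threshold rule (cell case:control ratio above the overall ratio) and the function names are assumptions made for illustration.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, labels):
    """Return genotype cells whose case:control ratio exceeds the overall ratio.

    genotypes: list of tuples (one multi-locus genotype per subject)
    labels: list of 1 (case) or 0 (control)
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    high_risk = set()
    for cell in set(cases) | set(controls):
        # Cross-multiplied comparison avoids dividing by an empty control cell.
        if cases[cell] * n_controls > controls[cell] * n_cases:
            high_risk.add(cell)
    return high_risk


def balanced_error(genotypes, labels, high_risk):
    """1 - (sensitivity + specificity) / 2, so both classes weigh equally."""
    pairs = list(zip(genotypes, labels))
    true_pos = sum(1 for g, y in pairs if y == 1 and g in high_risk)
    true_neg = sum(1 for g, y in pairs if y == 0 and g not in high_risk)
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    return 1.0 - (true_pos / n_cases + true_neg / n_controls) / 2.0


if __name__ == "__main__":
    genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (0, 1)]
    labels = [1, 1, 0, 0, 1, 0]
    cells = mdr_high_risk_cells(genotypes, labels)
    print(cells, balanced_error(genotypes, labels, cells))
```

In a k-fold setting, the cells would be labeled on the training folds and the error computed on the held-out fold, then averaged across folds.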

8
Han N, Qiao S, Yuan G, Huang P, Liu D, Yue K. A novel Chinese herbal medicine clustering algorithm via artificial bee colony optimization. Artif Intell Med 2019; 101:101760. [PMID: 31813485 DOI: 10.1016/j.artmed.2019.101760] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/03/2018] [Revised: 10/08/2019] [Accepted: 11/06/2019] [Indexed: 11/30/2022]
Abstract
Traditional Chinese medicine (TCM) has become popular and is viewed as an effective clinical treatment across the world. Accordingly, there is an ever-increasing interest in analyzing TCM data. Traditional clustering algorithms depend excessively on empirical values when selecting cluster centers. To address this problem, an improved artificial bee colony algorithm is proposed to select cluster centers automatically, and it is applied to cluster Chinese herbal medicines. The proposed method integrates the following new techniques: (1) improving the artificial bee colony algorithm by applying a new neighbour nectar search strategy, and (2) employing the improved artificial bee colony algorithm to optimize the parameters of the clustering by fast search and find of density peaks algorithm (called the DP algorithm), namely the cutoff distance dc, the local density ρi, and the minimum distance δi between element i and any other element with higher density, in order to find the optimal cluster centers and cluster herbal medicines accurately with guaranteed runtime performance. Extensive experiments were conducted on the UCI benchmark datasets and the TCM datasets, and the results verify the effectiveness of the proposed method by comparing it with classical clustering algorithms, including K-means, K-medoids, and DBSCAN, on multiple evaluation metrics, namely Silhouette Coefficient, Entropy, Purity, Precision, Recall, and F1-Measure. The results show that the IABC-DP algorithm outperforms other approaches, with good clustering quality and accuracy on both the UCI and the TCM datasets. In addition, the improved artificial bee colony algorithm can effectively reduce the number of iterations compared with the traditional bee colony algorithm. In particular, the IABC-DP algorithm is applied to cluster multi-dimensional Chinese herbal medicines, and the result shows that it outperforms other clustering algorithms on this task, which can contribute to a larger effort to discover the composition rules of traditional Chinese prescriptions.
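The density peaks quantities named in the abstract (cutoff distance dc, local density ρi, and distance δi to the nearest denser element) can be computed as in the sketch below. It uses the common cutoff-kernel definition of ρi and does not include the bee colony optimization; the function name and the sample points are illustrative assumptions.

```python
import numpy as np


def density_peaks(points, dc):
    """Return (rho, delta) for each point using a cutoff kernel."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # rho_i: number of other points closer than dc (self is excluded).
    rho = (dist < dc).sum(axis=1) - 1
    delta = np.empty(len(points))
    for i in range(len(points)):
        denser = rho > rho[i]
        if denser.any():
            # delta_i: distance to the nearest point with higher density.
            delta[i] = dist[i, denser].min()
        else:
            # Convention for a densest point: its largest distance.
            delta[i] = dist[i].max()
    return rho, delta


if __name__ == "__main__":
    rho, delta = density_peaks([[0, 0], [0, 1], [1, 0], [10, 10]], dc=1.2)
    print(rho, delta)
```

Points with both large ρi and large δi are the natural candidates for cluster centers, which is why the choice of dc matters and is a reasonable target for an optimizer.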
Affiliation(s)
- Nan Han: School of Management, Chengdu University of Information Technology, Chengdu 610103, China
- Shaojie Qiao: School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
- Guan Yuan: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Ping Huang: School of Management, Chengdu University of Information Technology, Chengdu 610103, China
- Dingxiang Liu: School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
- Kun Yue: School of Information Science and Engineering, Yunnan University, Kunming 650500, China

9
Tai HK, Jusoh SA, Siu SWI. Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening. J Cheminform 2018; 10:62. [PMID: 30552524 PMCID: PMC6755579 DOI: 10.1186/s13321-018-0320-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/09/2018] [Accepted: 12/10/2018] [Indexed: 12/01/2022] Open
Abstract
Background: Protein-ligand docking programs are routinely used in structure-based drug design to find the optimal binding pose of a ligand in the protein's active site. These programs are also used to identify potential drug candidates by ranking large sets of compounds. As more accurate and efficient docking programs are always desirable, constant efforts focus on developing better docking algorithms or improving the scoring function. Recently, chaotic maps have emerged as a promising approach to improve the search behavior of optimization algorithms in terms of search diversity and convergence speed. However, their effectiveness on docking applications has not been explored. Herein, we integrated five popular chaotic maps (logistic, Singer, sinusoidal, tent, and Zaslavskii maps) into PSOVina$^{\mathrm{2LS}}$, a recent variant of the popular AutoDock Vina program with enhanced global and local search capabilities, and evaluated their performances in ligand pose prediction and virtual screening using four docking benchmark datasets and two virtual screening datasets. Results: Pose prediction experiments indicate that chaos-embedded algorithms outperform AutoDock Vina and PSOVina in ligand pose RMSD, success rate, and run time. In virtual screening experiments, Singer map-embedded PSOVina$^{\mathrm{2LS}}$ achieved a very significant five- to sixfold speedup with comparable screening performances to AutoDock Vina in terms of area under the receiver operating characteristic curve and enrichment factor. Therefore, our results suggest that chaos-embedded PSOVina methods might be a better option than AutoDock Vina for docking and virtual screening tasks. The success of chaotic maps in protein-ligand docking reveals their potential for improving optimization algorithms in other search problems, such as protein structure prediction and folding. The Singer map-embedded PSOVina$^{\mathrm{2LS}}$, which is named PSOVina-2.0, and all testing datasets are publicly available at https://cbbio.cis.umac.mo/software/psovina. Electronic supplementary material: The online version of this article (10.1186/s13321-018-0320-9) contains supplementary material, which is available to authorized users.
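To illustrate how a chaotic map can stand in for uniform random numbers in particle swarm optimization, the sketch below generates a logistic map sequence and feeds it into a textbook velocity update. It is not PSOVina code: the update rule, the coefficients `w`, `c1`, and `c2`, and the seed value are assumptions, and the abstract does not say where in the algorithm the maps are applied.

```python
def logistic_map(x, mu=4.0):
    """One step of the logistic map; with mu = 4 values stay in [0, 1]."""
    return mu * x * (1.0 - x)


def chaotic_sequence(seed, length):
    """Iterate the logistic map from a seed in (0, 1)."""
    values = []
    x = seed
    for _ in range(length):
        x = logistic_map(x)
        values.append(x)
    return values


def pso_velocity(velocity, position, personal_best, global_best,
                 r1, r2, w=0.7, c1=1.5, c2=1.5):
    """Textbook PSO velocity update; r1 and r2 may come from a chaotic map."""
    cognitive = c1 * r1 * (personal_best - position)
    social = c2 * r2 * (global_best - position)
    return w * velocity + cognitive + social


if __name__ == "__main__":
    r1, r2 = chaotic_sequence(0.3, 2)
    print(pso_velocity(0.0, 1.0, 2.0, 3.0, r1, r2))
```

Swapping `logistic_map` for another map changes only the sequence generator, which keeps comparisons between maps straightforward.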
Affiliation(s)
- Hio Kuan Tai: Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau, China
- Siti Azma Jusoh: Bioinformatics Lab, Faculty of Pharmacy, Level 8, FF2 Building, Universiti Teknologi MARA (UiTM), 42300, Bandar Puncak Alam, Selangor, Malaysia
- Shirley W I Siu: Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau, China

10
Yang CH, Yang HS, Chuang LY. PBMDR: A particle swarm optimization-based multifactor dimensionality reduction for the detection of multilocus interactions. J Theor Biol 2018; 461:68-75. [PMID: 30296447 DOI: 10.1016/j.jtbi.2018.10.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/21/2018] [Revised: 08/26/2018] [Accepted: 10/04/2018] [Indexed: 12/29/2022]
Abstract
Studies on multilocus interactions have mainly investigated the associations between genetic variations in related genes and histopathological tumor characteristics in patients. However, the identification and characterization of susceptibility genes for complex diseases remain a great challenge for geneticists. In this study, a particle swarm optimization (PSO)-based multifactor dimensionality reduction (MDR) approach, denoted PBMDR, was proposed. MDR was used to detect multilocus interactions based on the PSO algorithm. A test data set was simulated from the genotype frequencies of 26 SNPs from eight breast-cancer-related genes. In simulated disease models, we demonstrated that PBMDR outperforms existing global optimization algorithms in its ability to explore and its power to detect specific SNP-genotype combinations. In addition, the PBMDR algorithm was compared with other algorithms, including PSO and chaotic PSOs, and the results revealed that the PBMDR algorithm yielded higher accuracy and chi-square values than the other algorithms did.
Affiliation(s)
- Cheng-Hong Yang: Department of Electronic Engineering, National Kaohsiung University of Science and Technology, No.415, Jiangong Rd., Sanmin Dist., Kaohsiung City 80778, Taiwan; Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung City 80708, Taiwan
- Huai-Shuo Yang: Department of Electronic Engineering, National Kaohsiung University of Science and Technology, No.415, Jiangong Rd., Sanmin Dist., Kaohsiung City 80778, Taiwan
- Li-Yeh Chuang: Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, No.1, Sec. 1, Syuecheng Rd., Dashu District, Kaohsiung City 84001, Taiwan

11
Yang CH, Kao YK, Chuang LY, Lin YD. Catfish Taguchi-Based Binary Differential Evolution Algorithm for Analyzing Single Nucleotide Polymorphism Interactions in Chronic Dialysis. IEEE Trans Nanobioscience 2018; 17:291-299. [PMID: 29994217 DOI: 10.1109/tnb.2018.2844342] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Single-nucleotide polymorphism (SNP)-SNP interactions are crucial for understanding the association between disease-related multifactorials for disease analysis. Existing statistical methods for determining such interactions are limited by the considerable computation required for evaluating all potential associations between disease-related multifactorials. Identifying SNP-SNP interactions is thus a major challenge in genetic association studies. This paper proposes a catfish Taguchi-based binary differential evolution (CT-BDE) algorithm for identifying SNP-SNP interactions. In the search space, the catfish effect prevents the premature convergence of the population, and the Taguchi method improves the search ability of the BDE algorithm. Hence, the proposed algorithm enables obtaining a favorable solution regarding the identification of high-order SNP-SNP interactions. Additionally, the proposed algorithm applies an effective fitness function derived from a multifactor dimensionality reduction (MDR) operation to evaluate the solutions from BDE-based algorithms. Simulated and real data sets were used to evaluate the ability of several BDE-based algorithms in identifying specific SNP-SNP interactions. We compared the fitness function derived from the MDR operation with that derived according to the difference between cases and controls, by using the different BDE-based algorithms. The results showed that the proposed CT-BDE algorithm applying the fitness function derived from the MDR operation exhibited a superior ability in identifying SNP-SNP interactions compared with the other BDE-based algorithms.