1
Informative SNP Selection Based on a Fuzzy Clustering and Improved Binary Particle Swarm Optimization Algorithm. Computational and Mathematical Methods in Medicine 2022; 2022:3837579. [PMID: 35756402 PMCID: PMC9225903 DOI: 10.1155/2022/3837579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/17/2022] [Revised: 04/14/2022] [Accepted: 04/30/2022] [Indexed: 12/04/2022]
Abstract
A single-nucleotide polymorphism (SNP) is the replacement of a single nucleotide in a deoxyribonucleic acid (DNA) sequence and is often linked to the development of specific diseases. Although current genotyping methods can tag SNP loci within biological samples to provide accurate genetic information about an associated disease, they have limited prediction accuracy. Furthermore, they are complex to perform and may predict an excessive number of tag SNP loci, which may not always be associated with the disease. In this manuscript, we therefore aimed to evaluate the impact of a newly optimized fuzzy clustering and binary particle swarm optimization algorithm (FCBPSO) on the accuracy and running time of informative SNP selection. Fuzzy clustering and FCBPSO were first applied to identify the equivalence relation and the candidate tag SNP set in order to reduce the redundancy between loci. The FCBPSO algorithm was then optimized and used to obtain the final tag SNP set. The prediction performance and running time of the newly developed model were compared with those of other traditional methods, including NMC, SPSO, and MCMR. The prediction accuracy of the FCBPSO algorithm was generally higher than that of the other algorithms, especially as the number of tag SNPs increased. When the number of tag SNPs was low, however, the prediction accuracy of FCBPSO was slightly lower than that of MCMR. The running time of the FCBPSO algorithm was nevertheless always lower than that of MCMR. FCBPSO not only reduced the size and dimension of the optimization problem but also simplified the training of the prediction model. This improved the prediction accuracy of the model and reduced the running time compared with other traditional methods.
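For orientation, the following is a minimal sketch of a standard binary particle swarm optimizer with a sigmoid transfer function, the generic technique that the abstract's algorithm builds on. It is not the FCBPSO method itself: the fuzzy clustering step and the paper's improvements are not reproduced. The function names, parameter defaults, and the toy fitness function (which rewards three hypothetical "informative" positions and penalizes subset size) are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    """Map a velocity to the probability that the corresponding bit is 1."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_pso(fitness, n_bits, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks of length n_bits.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid never saturates fully.
                vel[i][d] = max(-v_max, min(v_max, v))
                # Resample the bit with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: positions 1, 4 and 7 are "informative";
# every selected position costs 0.1, so small subsets are preferred.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_bits=10, seed=1)
```

In a tag SNP setting, each bit would mark whether a locus is kept, and the fitness would be a prediction-accuracy measure rather than this toy score.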
2
Wang P, Zhu W, Liao B, Cai L, Peng L, Yang J. Predicting Influenza Antigenicity by Matrix Completion With Antigen and Antiserum Similarity. Front Microbiol 2018; 9:2500. [PMID: 30405563 PMCID: PMC6206390 DOI: 10.3389/fmicb.2018.02500] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/30/2018] [Accepted: 10/01/2018] [Indexed: 12/20/2022] [Open Access]
Abstract
The rapid mutation of influenza viruses, especially in the two surface proteins hemagglutinin (HA) and neuraminidase (NA), has made them able to escape population immunity, which has become a key challenge for influenza vaccine design. Thus, it is crucial to predict influenza antigenic evolution and identify new antigenic variants in a timely manner. However, traditional experimental methods for selecting vaccine strains, such as the hemagglutination inhibition (HI) assay, are time- and labor-intensive, while popular computational methods are less sensitive, which creates a need for more accurate algorithms. In this study, we propose a novel low-rank matrix completion model, MCAAS, to infer antigenic distances between antigens and antisera based on partially revealed antigenic distances, virus similarity based on HA protein sequences, and vaccine similarity based on vaccine strains. The model exploits the correlations of viruses and vaccines in serological tests as well as the ability of HAs from viruses and vaccine strains to inform inference of influenza antigenicity. We also compared the effects of 65 amino acid substitution matrices on the prediction of influenza antigenicity. We applied MCAAS to H3N2 seasonal influenza virus data. Our model achieved a 10-fold cross-validation root-mean-squared error (RMSE) of 0.5982, significantly outperforming existing computational methods such as antigenic cartography, AntigenMap and BMCSI. We also constructed the antigenic map and studied the association between genetic and antigenic evolution of H3N2 influenza viruses. Finally, our analyses showed that the homologous structure derived amino acid substitution matrix (HSDM) is the most powerful in predicting influenza antigenicity, which is consistent with previous studies.
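The evaluation protocol mentioned in the abstract (cross-validated RMSE over partially observed antigenic distances) can be sketched as below. This is only an illustration of how observed matrix entries might be split into folds and how RMSE is computed on held-out entries; it does not implement MCAAS, and the function names and toy numbers are assumptions.

```python
import math
import random


def k_fold_entries(observed, k, seed=0):
    """Shuffle the observed (row, col) entries and deal them into k folds."""
    entries = list(observed)
    random.Random(seed).shuffle(entries)
    return [entries[i::k] for i in range(k)]


def rmse(truth, pred, entries):
    """Root-mean-squared error restricted to the given (row, col) entries."""
    squared = [(truth[i][j] - pred[i][j]) ** 2 for i, j in entries]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 2 x 2 antigen-by-antiserum distance matrix and a prediction.
truth = [[1.0, 2.0], [3.0, 4.0]]
pred = [[1.0, 2.5], [2.0, 4.0]]
held_out = [(0, 1), (1, 0)]          # entries hidden during training
error = rmse(truth, pred, held_out)  # sqrt((0.5**2 + 1.0**2) / 2)

# Splitting the observed entries of a 3 x 4 matrix into 3 folds.
observed = [(i, j) for i in range(3) for j in range(4)]
folds = k_fold_entries(observed, k=3)
```

In a 10-fold setting, each fold in turn would be hidden, the completion model fitted on the remaining entries, and the per-fold errors combined into a single RMSE.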
Affiliation(s)
- Peng Wang
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lijun Cai
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jialiang Yang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
3
Li Z, Liao B, Li Y, Liu W, Chen M, Cai L. Gene function prediction based on combining gene ontology hierarchy with multi-instance multi-label learning. RSC Adv 2018; 8:28503-28509. [PMID: 35542493 PMCID: PMC9083914 DOI: 10.1039/c8ra05122d] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/14/2018] [Accepted: 07/12/2018] [Indexed: 12/04/2022] [Open Access]
Abstract
Gene function annotation, an important part of genome annotation, is a main challenge in the post-genome era. Sequencing in the Human Genome Project produced whole-genome data, providing abundant biological information for the study of gene function annotation. To obtain useful knowledge from such a large amount of data, a potential strategy is to apply machine learning methods to mine these data and predict gene function. In this study, we improved multi-instance hierarchical clustering by using the gene ontology hierarchy to annotate gene function, combining the gene ontology hierarchy with a multi-instance multi-label learning framework. We then used the multi-label support vector machine (MLSVM) and multi-label k-nearest neighbor (MLKNN) algorithms to predict gene function. Finally, we verified our method on four yeast expression datasets. Performance in the simulated experiments indicated that our method is efficient.
Affiliation(s)
- Zejun Li
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Yun Li
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Wenhua Liu
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Min Chen
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Lijun Cai
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
4
Zhang W, Wang SL. An efficient strategy for identifying cancer-related key genes based on graph entropy. Comput Biol Chem 2018; 74:142-148. [PMID: 29609142 DOI: 10.1016/j.compbiolchem.2018.03.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/28/2017] [Revised: 01/22/2018] [Accepted: 03/20/2018] [Indexed: 02/02/2023]
Abstract
Gene networks help identify functional genes that are highly relevant to clinical outcomes. Most current methods require information about the interactions of genes or proteins to construct gene network connections. However, the conclusions of these methods may be biased because the human interactome is currently incomplete. In this paper, we propose an efficient strategy that uses gene expression data and gene mutation data to identify cancer-related key genes based on graph entropy (iKGGE). First, we construct a gene network using only gene expression data, based on the sparse inverse covariance matrix. Next, we cluster genes using the parallel maximal cliques algorithm to quickly obtain a series of subgraphs. Finally, we introduce a novel metric that combines graph entropy and the influence of upstream gene mutation information to measure the impact factors of genes. Tests on three available cancer datasets show that our strategy can effectively extract key genes that may play distinct roles in tumorigenesis, and that cancer patient risk groups are well predicted based on the key genes.
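The abstract does not define its graph entropy metric, so the sketch below only illustrates one common degree-based Shannon entropy of a graph, in which each node's share of the total degree is treated as a probability. It should be read as an assumption for illustration, not as the iKGGE metric; the function name and the toy graphs are hypothetical.

```python
import math


def degree_entropy(edges):
    """Shannon entropy (in bits) of the normalized degree distribution.

    Each node i gets p_i = degree(i) / sum of all degrees, and the
    entropy is -sum(p_i * log2(p_i)). Isolated nodes are not represented
    because the graph is given as an edge list.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = sum(degree.values())
    return -sum((d / total) * math.log2(d / total) for d in degree.values())


# Hypothetical subgraphs on four genes.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # all degrees equal
star = [("A", "B"), ("A", "C"), ("A", "D")]               # one hub gene

cycle_entropy = degree_entropy(cycle)
star_entropy = degree_entropy(star)
```

Under this definition a subgraph with uniform degrees has the highest entropy, while a hub-dominated subgraph scores lower, which is one way a metric of this kind can separate structurally central genes from the rest.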
Affiliation(s)
- Wei Zhang
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, Hunan, 410082, China
- Shu-Lin Wang
  - College of Computer Science and Electronics Engineering, Hunan University, Changsha, Hunan, 410082, China
5
Zhang X, Han R, Wang M, Li X, Yang X, Xia Q, Liu R, Yuan Y, Hu X, Chen M, Jiang G, Ma Y, Yang J, Xu S, Xu J, Shuai Z, Pan F. Association between the autophagy-related gene ULK1 and ankylosing spondylitis susceptibility in the Chinese Han population: a case-control study. Postgrad Med J 2017; 93:752-757. [PMID: 28667165 DOI: 10.1136/postgradmedj-2017-134964] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/28/2017] [Revised: 05/24/2017] [Accepted: 05/27/2017] [Indexed: 01/29/2023]
Abstract
PURPOSE: Ankylosing spondylitis (AS), inflammatory bowel disease and Crohn's disease (CD) often coexist in the same patient, and these diseases have remarkably strong overlaps in genetic association. An association between Unc-51-like kinase 1 (ULK1) gene polymorphisms and CD has been reported, and the aim of the current study was to investigate whether ULK1 polymorphisms are also associated with susceptibility to AS in the Chinese Han population. METHODS: Five tagging single nucleotide polymorphisms in the ULK1 gene (rs9652059, rs11616018, rs12303764, rs4964879 and rs7300908) were genotyped by the improved multiplex ligase detection reaction method in a cohort of patients with AS (n=649) and controls (n=628). Various genetic models were tested, and haplotypes were constructed after linkage disequilibrium analysis. RESULTS: A statistically significant difference was found in the dominant model of the rs9652059 polymorphism (OR (95% CI) = 0.796 (0.638 to 0.994), χ² = 4.064, p = 0.044). Haplotypes were constructed between rs9652059 and rs11616018, rs11616018 and rs4964879, and rs9652059 and rs4964879 based on D' ≥ 0.9 and r² ≥ 0.6. The ht5 (rs9652059C-rs4964879G) haplotype was associated with AS (OR (95% CI) = 0.834 (0.706 to 0.985), χ² = 4.555, p = 0.0328), and two other haplotypes were marginally correlated with AS (ht2 (rs9652059C-rs11616018T): OR (95% CI) = 0.846 (0.717 to 1.000), χ² = 3.864, p = 0.0493; ht3 (rs9652059T-rs11616018T): OR (95% CI) = 1.440 (0.999 to 2.076), χ² = 3.849, p = 0.0498). CONCLUSIONS: Our findings suggest that the rs9652059 variation (C→T) could increase AS susceptibility and that the haplotypes rs9652059C-rs4964879G, rs9652059C-rs11616018T and rs9652059T-rs11616018T may be associated with AS.
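The statistics reported above (odds ratio with 95% CI, χ², p value) follow the usual pattern for a 2 × 2 case-control table. The sketch below shows the textbook formulas: the odds ratio with a Woolf (log-scale) confidence interval, the Pearson χ² without continuity correction, and the 1-degree-of-freedom tail probability. The abstract does not give genotype counts or name the software used, so the counts here are hypothetical and the choice of formulas is an assumption.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2 x 2 table.

    a, b = cases with / without the variant
    c, d = controls with / without the variant
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, NOT taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(10, 20, 30, 40)
chi2 = pearson_chi2(10, 20, 30, 40)
p = p_value_1df(chi2)
```

As a consistency check on the abstract itself, the 1-df tail formula applied to the reported χ² = 4.064 gives a p value of approximately 0.044, and χ² = 4.555 gives approximately 0.033, in line with the values quoted.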
Affiliation(s)
- Xu Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Renfang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Mengmeng Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Xiaona Li
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Xiao Yang
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Qing Xia
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Rui Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Yaping Yuan
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Xingxing Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Mengya Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Guangming Jiang
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Yubo Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Jiajia Yang
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
- Shengqian Xu
  - Department of Rheumatism and Immunity, First Affiliated Hospital of Anhui Medical University, Anhui, China
- Jianhua Xu
  - Department of Rheumatism and Immunity, First Affiliated Hospital of Anhui Medical University, Anhui, China
- Zongwen Shuai
  - Department of Rheumatism and Immunity, First Affiliated Hospital of Anhui Medical University, Anhui, China
- Faming Pan
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, Anhui, China
  - The Key Laboratory of Major Autoimmune Diseases, Anhui, China
6
Li X, Liao B, Cai L, Cao Z, Zhu W. Informative SNPs selection based on two-locus and multilocus linkage disequilibrium: criteria of max-correlation and min-redundancy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:688-695. [PMID: 24091401 DOI: 10.1109/tcbb.2013.61] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/02/2023]
Abstract
Currently, there are many methods for selecting informative SNPs for haplotype reconstruction. However, some challenges still render them ineffective for large data sets. First, some traditional methods are wrapper methods, which have high computational complexity. Second, some methods ignore linkage disequilibrium (LD), which makes their selection results hard to interpret. In this study, we derive optimization criteria by combining two-locus and multilocus LD measures to obtain the criteria of Max-Correlation and Min-Redundancy (MCMR). Then, we use a greedy algorithm to select the candidate set of informative SNPs constrained by the criteria. Finally, we use a backward scheme to refine the candidate subset. We separately use small and medium-sized (>1,000 SNPs) data sets to evaluate MCMR in terms of reconstruction accuracy, time complexity, and compactness. Additionally, to demonstrate that MCMR is practical for large data sets, we design a parameter w to adapt to various platforms and introduce another replacement scheme for larger data sets, which sharply reduces the computational complexity of evaluating the reconstruction ratio. We then apply our haplotype-reconstruction-based method to large (>5,000 SNPs) data sets for the first time. The results confirm that MCMR leads to promising improvements in informative SNP selection and prediction accuracy.
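The greedy step described in the abstract can be illustrated with a generic max-relevance, min-redundancy selection driven by pairwise LD. The sketch below uses the standard two-locus r² statistic on 0/1 haplotypes and scores each candidate SNP by its mean r² with all other SNPs minus its mean r² with the SNPs already chosen. This is an assumption-laden stand-in: the paper's actual criteria also use a multilocus LD measure, a backward refinement step, and a parameter w, none of which are reproduced here, and the function names and toy haplotypes are hypothetical.

```python
def r_squared(a, b):
    """Two-locus LD r^2 between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_select(haplotypes, k):
    """Pick up to k SNP indices by (relevance - redundancy); needs >= 2 SNPs."""
    snps = list(zip(*haplotypes))  # one tuple of alleles per SNP
    m = len(snps)
    ld = [[r_squared(snps[i], snps[j]) for j in range(m)] for i in range(m)]
    # Relevance: how well SNP i is correlated with the other SNPs on average.
    relevance = [sum(ld[i][j] for j in range(m) if j != i) / (m - 1)
                 for i in range(m)]
    selected = []

    def score(i):
        if not selected:
            return relevance[i]
        redundancy = sum(ld[i][s] for s in selected) / len(selected)
        return relevance[i] - redundancy

    while len(selected) < min(k, m):
        candidates = [i for i in range(m) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical haplotypes (rows) over four SNPs (columns).
# SNP 0 and SNP 1 are in perfect LD; SNP 2 and SNP 3 are independent of them.
haplotypes = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
chosen = greedy_select(haplotypes, k=2)
```

On this toy input the first pick is SNP 0, and the redundancy term then steers the second pick away from SNP 1, its perfect proxy, which is the behavior a min-redundancy criterion is meant to produce.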