1
Ganesh S, Chithambaram T, Krishnan NR, Vincent DR, Kaliappan J, Srinivasan K. Exploring Huntington's Disease Diagnosis via Artificial Intelligence Models: A Comprehensive Review. Diagnostics (Basel) 2023; 13:3592. [PMID: 38066833 PMCID: PMC10706174 DOI: 10.3390/diagnostics13233592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 11/25/2023] [Accepted: 11/27/2023] [Indexed: 10/16/2024]
Abstract
Huntington's Disease (HD) is a devastating neurodegenerative disorder characterized by progressive motor dysfunction, cognitive impairment, and psychiatric symptoms. Early and accurate diagnosis of HD is crucial for effective intervention and patient care. This review provides a comprehensive overview of the use of Artificial Intelligence (AI)-powered algorithms in the diagnosis of HD. It systematically analyses the existing literature to identify key trends, methodologies, and challenges in this emerging field, and highlights the potential of machine learning (ML) and deep learning (DL) approaches to automate HD diagnosis through the analysis of clinical, genetic, and neuroimaging data. It also discusses the limitations and ethical considerations associated with these models and suggests future research directions aimed at improving the early detection and management of HD. The review is intended as a resource for researchers, clinicians, and healthcare professionals interested in the intersection of machine learning and neurodegenerative disease diagnosis.
Affiliation(s)
- Sowmiyalakshmi Ganesh
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Thillai Chithambaram
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Nadesh Ramu Krishnan
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Durai Raj Vincent
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Jayakumar Kaliappan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Kathiravan Srinivasan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
2
Chaki J, Woźniak M. Deep learning for neurodegenerative disorder (2016 to 2022): A systematic review. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
3
Wani MA, Garg P, Roy KK. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 2021; 59:2397-2408. [PMID: 34632545 DOI: 10.1007/s11517-021-02443-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/10/2020] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
The ubiquitous antimicrobial peptides (AMPs), with a broad range of antimicrobial activities, hold great promise for combating multidrug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF classifier-based model performed best in both the internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and the external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted the known AMPs and non-AMPs that were kept aside as an additional external validation set. The performance assessment revealed three features, namely ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may further be useful in the design and prediction of novel potential AMPs.
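The four validation metrics quoted in this abstract all derive from a binary confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical illustrations and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. AMPs) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. non-AMPs) predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all predictions that are correct
        "precision": tp / (tp + fp),       # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),     # share of real positives that are found
        "specificity": tn / (tn + fp),     # share of real negatives that are found
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Sensitivity and specificity are reported separately because, with unbalanced classes such as 2638 AMPs against 3700 non-AMPs, accuracy alone can hide poor performance on one class.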
Affiliation(s)
- Mushtaq Ahmad Wani
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India
- Kuldeep K Roy
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India; Department of Pharmaceutical Sciences, School of Health Sciences, University of Petroleum and Energy Studies (UPES), P.O. Bidholi, Dehradun, 248007, Uttarakhand, India
4
Jiang X, Chen M, Song W, Lin GN. Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data. BMC Med Genomics 2021; 14:141. [PMID: 34465339 PMCID: PMC8406783 DOI: 10.1186/s12920-021-00985-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2020] [Accepted: 05/14/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Clinically, behavioral, cognitive, and mental functions are affected during neurodegenerative disease progression. To date, the molecular pathogenesis of these complex diseases is still unclear. With the rapid development of sequencing technologies, it is possible to decode the molecular mechanisms corresponding to different clinical phenotypes at the genome-wide transcriptomic level using computational methods. Our previous studies have shown that it is difficult to distinguish disease genes from non-disease genes. Therefore, to explore the molecular pathogenesis underlying complex clinical phenotypes precisely, it is better to identify biomarkers corresponding to different disease stages or clinical phenotypes. In this study, we designed a label propagation-based semi-supervised feature selection approach (LPFS) to prioritize disease-associated genes corresponding to different disease stages or clinical phenotypes. METHODS We are the first to put label propagation clustering and feature selection into one framework. LPFS prioritizes disease genes related to different disease stages or phenotypes by alternately iterating label propagation clustering based on a sample network and feature selection with gene expression profiles. GO and KEGG pathway enrichment analyses and gene functional analysis were then carried out to explore the molecular mechanisms of specific disease phenotypes, and thus to decode the changes in individual behavioral and mental characteristics during neurodegenerative disease progression. RESULTS Extensive experiments were conducted to verify the performance of LPFS with Huntington's disease gene expression data. Experimental results showed that LPFS performs better than state-of-the-art methods. GO and KEGG enrichment analysis of key gene sets showed that the TGF-beta signaling pathway, cytokine-cytokine receptor interaction, immune response, and inflammatory response were gradually affected during Huntington's disease progression. In addition, we found that the expression of SLC4A11, ZFP474, AMBP, TOP2A, PBK, CCDC33, APSL, DLGAP5, and Al662270 changed markedly as the disease developed. CONCLUSIONS We designed a label propagation-based semi-supervised feature selection model to precisely select key genes of different disease phenotypes. We conducted experiments using the model with Huntington's disease mouse gene expression data to decode the mechanisms of the disease. We found that many cell types, including astrocytes, microglia, and GABAergic neurons, could be involved in the pathological process.
Affiliation(s)
- Xue Jiang
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Miao Chen
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Weichen Song
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Guan Ning Lin
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai, 200030 China
5
Jiang X, Pan W, Chen M, Wang W, Song W, Lin GN. Integrative enrichment analysis of gene expression based on an artificial neuron. BMC Med Genomics 2021; 14:173. [PMID: 34433483 PMCID: PMC8386081 DOI: 10.1186/s12920-021-00988-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Accepted: 05/18/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Huntington's disease is a chronic progressive neurodegenerative disease with complex pathogenic mechanisms. To date, the pathogenesis of Huntington's disease is still not fully understood, and there is no effective treatment. The rapid development of high-throughput sequencing technologies makes it possible to explore the molecular mechanisms at the transcriptome level. Our previous studies on Huntington's disease have shown that it is difficult to distinguish disease-associated genes from non-disease genes. Meanwhile, recent progress in biomedicine shows that the molecular origin of chronic complex diseases may not lie in the diseased tissue, and genes differentially expressed between different tissues may help to reveal the molecular origin of chronic diseases. Therefore, developing integrative computational analysis methods for multi-tissue gene expression data, and exploring the relationship between the disease and genes differentially expressed in different tissues, can greatly accelerate the molecular discovery process. METHODS To analyze intra- and inter-tissue differentially expressed genes, we designed an integrative enrichment analysis method based on an artificial neuron (IEAAN). First, we calculated the differential expression scores of each gene, which are treated as the features of that gene, using a fold-change approach with intra- and inter-tissue gene expression data. Then, we passed the weighted sum of all the differential expression scores through a sigmoid function to obtain a differential expression enrichment score. Finally, we ranked the genes according to the enrichment score. Top-ranking genes are taken to be potential disease-associated genes. RESULTS We conducted extensive experiments to analyze intra- and inter-tissue differentially expressed genes. Experimental results showed that genes differentially expressed between different tissues are more likely to be Huntington's disease-associated genes. Five disease-associated genes were selected in this study, two of which have been reported to be implicated in Huntington's disease. CONCLUSIONS We proposed a novel integrative enrichment analysis method based on an artificial neuron (IEAAN), which displays better precision in predicting disease-associated genes than state-of-the-art statistical methods. Our comprehensive evaluation suggests that genes differentially expressed between striatum and liver tissues of healthy individuals are more likely to be Huntington's disease-associated genes.
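The three METHODS steps (fold-change scores, a sigmoid over their weighted sum, ranking) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the use of the absolute log2 fold change, the equal weights, the zero bias, and all gene names and expression values are hypothetical.

```python
import math


def fold_change_score(mean_a, mean_b, eps=1e-9):
    """Absolute log2 fold change between two mean expression levels.

    Assumption: the abstract only says "fold-change approach"; the log2
    transform and the small eps guard against division by zero are ours.
    """
    return abs(math.log2((mean_a + eps) / (mean_b + eps)))


def enrichment_score(scores, weights, bias=0.0):
    """Single artificial neuron: sigmoid of the weighted sum of the scores."""
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank_genes(gene_scores, weights):
    """Return (gene, enrichment score) pairs sorted from highest to lowest."""
    enriched = {g: enrichment_score(s, weights) for g, s in gene_scores.items()}
    return sorted(enriched.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: each gene has one intra-tissue and one inter-tissue score.
gene_scores = {
    "geneA": [fold_change_score(8.0, 2.0), fold_change_score(4.0, 1.0)],
    "geneB": [fold_change_score(3.0, 3.0), fold_change_score(2.0, 1.0)],
    "geneC": [fold_change_score(1.0, 4.0), fold_change_score(5.0, 5.0)],
}
ranking = rank_genes(gene_scores, weights=[0.5, 0.5])  # equal weights are an assumption
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Because the sigmoid is monotonic, it does not change the ranking produced by the weighted sum; it only maps the scores into the interval (0, 1) so they can be compared on a common scale.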
Affiliation(s)
- Xue Jiang
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Weihao Pan
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Miao Chen
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Weidi Wang
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Weichen Song
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
- Guan Ning Lin
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030 China
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai, 200030 China
6
Cheng J, Liu HP, Lin WY, Tsai FJ. Identification of contributing genes of Huntington's disease by machine learning. BMC Med Genomics 2020; 13:176. [PMID: 33228685 PMCID: PMC7684976 DOI: 10.1186/s12920-020-00822-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/26/2020] [Accepted: 11/12/2020] [Indexed: 02/06/2023]
Abstract
Background Huntington's disease (HD) is an inherited disorder caused by polyglutamine (poly-Q) mutations of the HTT gene, resulting in neurodegeneration characterized by chorea, loss of coordination, and cognitive decline. However, HD pathogenesis is still elusive. Despite the availability of a wide range of biological data, a comprehensive understanding of HD's mechanism from machine learning has so far not been realized, largely due to the lack of the needed data density.
Methods To harness knowledge of HD pathogenesis from the expression profiles of postmortem prefrontal cortex samples of 157 HD patients and 157 controls, we used gene profiling ranking as the criterion to reduce the dimension to the order of magnitude of the sample size, followed by machine learning using a decision tree, rule induction, random forest, and a generalized linear model. Results These four machine learning models identified 66 potential HD-contributing genes, with cross-validated accuracies of 90.79 ± 4.57%, 89.49 ± 5.20%, 90.45 ± 4.24%, and 97.46 ± 3.26%, respectively. The identified genes were enriched in the gene ontology terms of transcriptional regulation, inflammatory response, neuron projection, and the cytoskeleton. Moreover, three genes in the cognitive, sensory, and perceptual systems were also identified. Conclusions The mutant HTT may interfere with both the expression and transport of these identified genes to promote HD pathogenesis.
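Accuracies reported as "mean ± spread" come from evaluating a model on several held-out folds. The sketch below shows how such a summary can be produced; the abstract does not state the number of folds or the spread statistic, so the choice of 10 folds, the sample standard deviation, and the per-fold accuracies are all assumptions made for illustration.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k disjoint, near-equal test folds.

    Assignment is round-robin for simplicity; in practice the samples would
    usually be shuffled (and stratified by class) first.
    """
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# 314 = 157 HD + 157 control samples, as stated in the abstract; k=10 is an assumption.
folds = k_fold_indices(n_samples=314, k=10)
print([len(f) for f in folds])

# Hypothetical per-fold accuracies, not results from the study.
mean_acc, sd_acc = summarize([0.90, 0.94, 0.88, 0.92, 0.91])
print(f"{mean_acc:.2%} ± {sd_acc:.2%}")
```

With only a few hundred samples, each test fold holds about 31 samples, so a single misclassification shifts fold accuracy by roughly three percentage points, which is consistent with spreads of several percent.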
Affiliation(s)
- Jack Cheng
  - Graduate Institute of Integrated Medicine, College of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan; Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan
- Hsin-Ping Liu
  - Graduate Institute of Acupuncture Science, College of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan
- Wei-Yong Lin
  - Graduate Institute of Integrated Medicine, College of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan; Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan; Brain Diseases Research Center, China Medical University, Taichung, 40402, Taiwan
- Fuu-Jen Tsai
  - Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan; School of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan; Department of Biotechnology, Asia University, Taichung, 41354, Taiwan; Children's Medical Center, China Medical University Hospital, Taichung, 40447, Taiwan
7
Zhang X, Zhang J, Yang J. Large-scale dynamic social data representation for structure feature learning. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-189010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The problems caused by the curse of dimensionality in networks and by computational complexity have become an important issue in social network research. Existing methods for network feature learning are mostly based on static and small-scale assumptions, and they are not adapted to the unique attributes of social networks. Therefore, existing learning methods cannot adapt to the dynamic, large-scale, or even super-large-scale nature of current social networks. This paper studies feature representation learning for the structure of large-scale dynamic social networks. Positive and negative damping sampling of network nodes in different classes is carried out, and a dynamic feature learning method for newly added nodes is constructed, which makes the model feasible for extracting structural features of large-scale social networks as they change dynamically. The resulting node feature representation has better dynamic robustness. In dynamic link prediction experiments on real datasets from three large-scale dynamic social networks, DNPS achieved a large improvement over the benchmark model in both prediction accuracy and time efficiency. The model performs best when the α value is around 0.7.
Affiliation(s)
- Xiaoxian Zhang
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
  - School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun, Jilin, China
- Jianpei Zhang
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
- Jing Yang
  - College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
8
Stevenson R, Samokhina E, Rossetti I, Morley JW, Buskila Y. Neuromodulation of Glial Function During Neurodegeneration. Front Cell Neurosci 2020; 14:278. [PMID: 32973460 PMCID: PMC7473408 DOI: 10.3389/fncel.2020.00278] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/24/2020] [Accepted: 08/05/2020] [Indexed: 12/12/2022]
Abstract
Glia, a non-excitable cell type once considered merely as the connective tissue between neurons, is nowadays acknowledged for its essential contribution to multiple physiological processes including learning, memory formation, excitability, synaptic plasticity, ion homeostasis, and energy metabolism. Moreover, as glia are key players in the brain immune system and provide structural and nutritional support for neurons, they are intimately involved in multiple neurological disorders. Recent advances have demonstrated that glial cells, specifically microglia and astroglia, are involved in several neurodegenerative diseases including Amyotrophic lateral sclerosis (ALS), Epilepsy, Parkinson's disease (PD), Alzheimer's disease (AD), and frontotemporal dementia (FTD). While there is compelling evidence for glial modulation of synaptic formation and regulation that affect neuronal signal processing and activity, in this manuscript we will review recent findings on neuronal activity that affect glial function, specifically during neurodegenerative disorders. We will discuss the nature of each glial malfunction, its specificity to each disorder, overall contribution to the disease progression and assess its potential as a future therapeutic target.
Affiliation(s)
- Rebecca Stevenson
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Evgeniia Samokhina
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Ilaria Rossetti
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- John W. Morley
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Yossi Buskila
  - School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
  - International Centre for Neuromorphic Systems, The MARCS Institute for Brain, Behaviour and Development, Penrith, NSW, Australia
9
Tobore I, Li J, Yuhang L, Al-Handarish Y, Kandwal A, Nie Z, Wang L. Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations. JMIR Mhealth Uhealth 2019; 7:e11966. [PMID: 31376272 PMCID: PMC6696854 DOI: 10.2196/11966] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 08/17/2018] [Revised: 04/14/2019] [Accepted: 06/12/2019] [Indexed: 01/10/2023]
Abstract
The use of deep learning (DL) for the analysis and diagnosis of biomedical and health care problems has received unprecedented attention in the last decade. The technique has recorded a number of achievements for unearthing meaningful features and accomplishing tasks that were hitherto difficult to solve by other methods and human experts. Currently, biological and medical devices, treatment, and applications are capable of generating large volumes of data in the form of images, sounds, text, graphs, and signals creating the concept of big data. The innovation of DL is a developing trend in the wake of big data for data representation and analysis. DL is a type of machine learning algorithm that has deeper (or more) hidden layers of similar function cascaded into the network and has the capability to make meaning from medical big data. Current transformation drivers to achieve personalized health care delivery will be possible with the use of mobile health (mHealth). DL can provide the analysis for the deluge of data generated from mHealth apps. This paper reviews the fundamentals of DL methods and presents a general view of the trends in DL by capturing literature from PubMed and the Institute of Electrical and Electronics Engineers database publications that implement different variants of DL. We highlight the implementation of DL in health care, which we categorize into biological system, electronic health record, medical image, and physiological signals. In addition, we discuss some inherent challenges of DL affecting biomedical and health domain, as well as prospective research directions that focus on improving health management by promoting the application of physiological signals and modern internet technology.
Affiliation(s)
- Igbe Tobore
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China; Graduate University, Chinese Academy of Sciences, Beijing, China
- Jingzhen Li
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
- Liu Yuhang
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
- Yousef Al-Handarish
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
- Abhishek Kandwal
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
- Zedong Nie
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
- Lei Wang
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, Shenzhen, China
10
Guo X, Jiang X, Xu J, Quan X, Wu M, Zhang H. Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington's Disease-Associated Genes. Genes (Basel) 2018; 9:350. [PMID: 30002337 PMCID: PMC6071299 DOI: 10.3390/genes9070350] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/30/2018] [Revised: 07/06/2018] [Accepted: 07/09/2018] [Indexed: 12/20/2022]
Abstract
Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. Since the random initialization of the feature selection matrix in CGUFS makes the final disease-associated gene set unstable, in this study we proposed an ensemble method based on CGUFS, namely ensemble consensus-guided unsupervised feature selection (ECGUFS), to further improve the accuracy of disease-associated gene prediction and the stability of feature gene sets. We also proposed a bagging integration strategy to integrate the results of CGUFS. Lastly, we conducted experiments with Huntington's disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, in which we detected 287 disease-associated genes. Enrichment analysis of these genes showed that the postsynaptic density, the postsynaptic membrane, the synapse, and the cell junction are all affected during the disease's progression. ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We classified the labeled samples using a linear support vector machine with 10-fold cross-validation. The average accuracy is 0.9, which suggests the effectiveness of the feature gene set.
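The idea of stabilizing a randomly initialized selector by integrating several runs can be illustrated with a simple vote over the gene sets returned by each run. This is only a sketch: the abstract does not specify how the bagging integration combines the CGUFS results, so the frequency threshold, the function name, and the gene identifiers below are hypothetical.

```python
from collections import Counter


def aggregate_selections(runs, min_fraction=0.5):
    """Keep features chosen in at least `min_fraction` of the selection runs.

    runs: list of feature lists, one per run of a (randomly initialized)
    feature selector. Each feature is counted at most once per run.
    """
    counts = Counter(feature for run in runs for feature in set(run))
    threshold = min_fraction * len(runs)
    return sorted(f for f, c in counts.items() if c >= threshold)


# Hypothetical outputs of three selector runs with different random seeds.
runs = [
    ["g1", "g2", "g3"],
    ["g2", "g3", "g4"],
    ["g2", "g5"],
]
stable_genes = aggregate_selections(runs, min_fraction=0.5)
print(stable_genes)
```

Genes that appear in only one run are treated as artifacts of a particular initialization and dropped, which is what makes the aggregated set more stable than any single run.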
Affiliation(s)
- Xia Guo
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China
- Xue Jiang
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China
- Jing Xu
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China
- Xiongwen Quan
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China
- Min Wu
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China
- Han Zhang
  - College of Computer and Control Engineering, Nankai University, Tianjin 300350, China