1
Al-Azani S, Alkhnbashi OS, Ramadan E, Alfarraj M. Gene Expression-Based Cancer Classification for Handling the Class Imbalance Problem and Curse of Dimensionality. Int J Mol Sci 2024; 25:2102. [PMID: 38396779 PMCID: PMC10889442 DOI: 10.3390/ijms25042102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/25/2023] [Accepted: 12/27/2023] [Indexed: 02/25/2024]
Abstract
Cancer is a leading cause of death globally. The majority of cancer cases are only diagnosed in the late stages of cancer due to the use of conventional methods. This reduces the chance of survival for cancer patients. Therefore, early detection consequently followed by early diagnoses are important tasks in cancer research. Gene expression microarray technology has been applied to detect and diagnose most types of cancers in their early stages and has gained encouraging results. In this paper, we address the problem of classifying cancer based on gene expression for handling the class imbalance problem and the curse of dimensionality. The oversampling technique is utilized to overcome this problem by adding synthetic samples. Another common issue related to the gene expression dataset addressed in this paper is the curse of dimensionality. This problem is addressed by applying chi-square and information gain feature selection techniques. After applying these techniques individually, we proposed a method to select the most significant genes by combining those two techniques (CHiS and IG). We investigated the effect of these techniques individually and in combination. Four benchmarking biomedical datasets (Leukemia-subtypes, Leukemia-ALLAML, Colon, and CuMiDa) were used. The experimental results reveal that the oversampling techniques improve the results in most cases. Additionally, the performance of the proposed feature selection technique outperforms individual techniques in nearly all cases. In addition, this study provides an empirical study for evaluating several oversampling techniques along with ensemble-based learning. The experimental results also reveal that SVM-SMOTE, along with the random forests classifier, achieved the highest results, with a reporting accuracy of 100%. The obtained results surpass the findings in the existing literature as well.
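The oversampling step mentioned in this abstract adds synthetic minority-class samples. A minimal sketch of the basic SMOTE interpolation idea follows; it is not the SVM-SMOTE variant the abstract reports as best, and the function name `smote_like`, the neighbour count `k`, and the toy data are assumptions made for illustration only.

```python
import random
from math import dist


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with two features (made-up values)
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 2.0], [1.2, 1.8]]
new_points = smote_like(minority, n_new=5)
```

Each synthetic point lies on a line segment between two real minority samples, so it stays inside the region the minority class already occupies.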
Affiliation(s)
- Sadam Al-Azani
  - SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
- Omer S. Alkhnbashi
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
- Emad Ramadan
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
- Motaz Alfarraj
  - SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
  - Electrical Engineering Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
2
Al-Rajab M, Lu J, Xu Q, Kentour M, Sawsa A, Shuweikeh E, Joy M, Arasaradnam R. A hybrid machine learning feature selection model-HMLFSM to enhance gene classification applied to multiple colon cancers dataset. PLoS One 2023; 18:e0286791. [PMID: 37917732 PMCID: PMC10621932 DOI: 10.1371/journal.pone.0286791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 05/20/2023] [Indexed: 11/04/2023]
Abstract
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM-Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance.
Affiliation(s)
- Murad Al-Rajab
  - College of Engineering, Abu Dhabi University, Abu Dhabi, United Arab Emirates
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mohamed Kentour
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Ahlam Sawsa
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
  - Bradford Teaching Hospitals NHS Foundation Trust, Bradford, United Kingdom
- Emad Shuweikeh
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mike Joy
  - University of Warwick, Coventry, United Kingdom
3
Khatun R, Akter M, Islam MM, Uddin MA, Talukder MA, Kamruzzaman J, Azad AKM, Paul BK, Almoyad MAA, Aryal S, Moni MA. Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data. Genes (Basel) 2023; 14:1802. [PMID: 37761941 PMCID: PMC10530870 DOI: 10.3390/genes14091802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/10/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.
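The two ideas in this abstract, aggregating feature rankings from several selectors and combining classifiers by weighted average voting, can be sketched as follows. The abstract does not state the aggregation rule or the weights, so mean-rank aggregation, the function names `aggregate_ranks` and `weighted_vote`, and all numbers below are assumptions for illustration.

```python
def aggregate_ranks(rankings, top_n):
    """Combine several best-first feature rankings by mean rank position.

    Mean-rank aggregation is an assumed rule; the paper's exact rule
    is not given in the abstract.
    """
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    # Sort by mean rank, breaking ties alphabetically for determinism
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top_n]


def weighted_vote(probas, weights):
    """Weighted average of per-classifier class probabilities.

    Returns the winning class index and the averaged probabilities.
    """
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__), avg


# Three hypothetical selectors ranking four genes, best first
rankings = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "c", "b", "d"]]
selected = aggregate_ranks(rankings, top_n=2)

# Three hypothetical classifiers (e.g. SVM, k-NN, decision tree) on one sample
label, avg = weighted_vote([[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]], [3, 1, 1])
```

In the toy vote, two of the three classifiers favour class 1, but the higher weight on the first classifier tips the averaged probability toward class 0, which is the behaviour that distinguishes weighted average voting from a simple majority vote.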
Affiliation(s)
- Rabea Khatun
  - Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh
- Maksuda Akter
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Manowarul Islam
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Md. Alamin Talukder
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Joarder Kamruzzaman
  - Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3842, Australia
- AKM Azad
  - Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11564, Saudi Arabia
- Bikash Kumar Paul
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
  - Department of Software Engineering, Daffodil International University (DIU), Dhaka 1342, Bangladesh
- Muhammad Ali Abdulllah Almoyad
  - Department of Basic Medical Sciences, College of Applied Medical Sciences in Khamis Mushyt King Khalid University, Abha 61412, Saudi Arabia
- Sunil Aryal
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Mohammad Ali Moni
  - Artificial Intelligence & Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
4
Qiu WR, Qi BB, Lin WZ, Zhang SH, Yu WK, Huang SF. Predicting the Lung Adenocarcinoma and Its Biomarkers by Integrating Gene Expression and DNA Methylation Data. Front Genet 2022; 13:926927. [PMID: 35846148 PMCID: PMC9280023 DOI: 10.3389/fgene.2022.926927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 06/13/2022] [Indexed: 11/17/2022]
Abstract
The early symptoms of lung adenocarcinoma are not apparent, and clinical diagnosis relies primarily on X-ray examination and pathological section examination, whereas the discovery of biomarkers, enabled by the development of bioinformatics technology, points to another direction for diagnosis. However, diagnosis is not accurate or trustworthy when it relies on omics data with high-dimension and low-sample-size (HDLSS) features or on biomarkers produced from only a single type of omics data. To address these problems, feature selection methods from biological analysis are used to reduce the dimension of gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996). In addition, the Cartesian product method is used to expand the sample set and integrate gene expression data and DNA methylation data. The classifier is built using a deep neural network and is evaluated with K-fold cross validation. Moreover, gene ontology analysis and literature retrieval are used to analyze the biological relevance of the selected genes, and the TCGA database is used for survival analysis of these potential genes through Kaplan-Meier estimates to discover the detailed molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma and are considered biomarkers of lung adenocarcinoma.
Affiliation(s)
- Wang-Ren Qiu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
  - *Correspondence: Wang-Ren Qiu; Shun-Fa Huang
- Bei-Bei Qi
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Wei-Zhong Lin
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Shou-Hua Zhang
  - Department of General Surgery, Jiangxi Provincial Children’s Hospital, Nanchang, China
- Wang-Ke Yu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Shun-Fa Huang
  - School of Information Engineering, Jingdezhen University, Jingdezhen, China
  - *Correspondence: Wang-Ren Qiu; Shun-Fa Huang
5
Al-Rajab M, Lu J, Xu Q. A framework model using multifilter feature selection to enhance colon cancer classification. PLoS One 2021; 16:e0249094. [PMID: 33861766 PMCID: PMC8691854 DOI: 10.1371/journal.pone.0249094] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/16/2020] [Accepted: 03/11/2021] [Indexed: 11/18/2022]
Abstract
Gene expression profiles can be utilized in the diagnosis of critical diseases such as cancer. The selection of biomarker genes from these profiles is crucial for cancer detection. This paper presents a framework proposing a two-stage multifilter hybrid model of feature selection for colon cancer classification. Colon cancer is now extremely common relative to other types of cancer. There is a need for a fast and accurate method to detect the affected tissues and to enhance the diagnostic process and drug discovery. This paper reports on a study whose objective has been to improve the diagnosis of colon cancer through a two-stage, multifilter model of feature selection. The first stage performs feature selection using a combination of Information Gain and a Genetic Algorithm. The next stage filters and ranks the genes identified through this method using the minimum Redundancy Maximum Relevance (mRMR) technique. The final phase further analyzes the data using correlated machine learning algorithms. This two-stage approach, which involves the selection of genes before classification techniques are used, improves success rates for the identification of cancer cells. Decision Tree, K-Nearest Neighbor, and Naïve Bayes classifiers showed promising accuracy using the developed hybrid framework model. The proposed method achieved higher accuracy than the existing methods reported in the literature. This study can serve as a starting point for enhancing treatment and drug discovery for colon cancer.
Affiliation(s)
- Murad Al-Rajab
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
6
Zhang J, Xu D, Hao K, Zhang Y, Chen W, Liu J, Gao R, Wu C, De Marinis Y. FS-GBDT: identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT. Brief Bioinform 2020; 22:5901960. [PMID: 34020547 DOI: 10.1093/bib/bbaa189] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/21/2020] [Revised: 07/03/2020] [Accepted: 07/21/2020] [Indexed: 11/14/2022]
Abstract
Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. The algorithm achieved the highest indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and this subset was classified into several functional modules. Functional modules can be used as markers of disease in place of single genes, which are difficult to find repeatedly in gene chip applications, and to study the core mechanisms of cancer.
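The Fisher score named in this abstract is a standard filter criterion: between-class scatter of a feature divided by its within-class scatter. A minimal sketch of that filter alone follows; how FS-GBDT fuses it with gradient boosting is not described in the abstract and is not attempted here. The function names and the toy expression values are assumptions for illustration.

```python
def fisher_score(values, labels):
    """Fisher score of one feature: sum_k n_k (mu_k - mu)^2 divided by
    sum_k n_k * var_k, with population variance per class."""
    n = len(values)
    overall_mean = sum(values) / n
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, l in zip(values, labels) if l == cls]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall_mean) ** 2
        within += sum((v - mean) ** 2 for v in group)  # equals n_k * var_k
    # A feature with zero within-class scatter separates classes perfectly
    return between / within if within > 0 else float("inf")


def rank_genes(expr, labels):
    """Order gene names from most to least discriminative."""
    return sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)


# Made-up expression values for six samples in two classes
labels = [0, 0, 0, 1, 1, 1]
expr = {"noise": [5, 1, 4, 2, 6, 3], "marker": [1, 2, 3, 7, 8, 9]}
order = rank_genes(expr, labels)
```

A gene whose class means are far apart relative to its spread inside each class receives a high score, which is why the hypothetical `marker` gene ranks ahead of `noise`.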
Affiliation(s)
- Jialin Zhang
  - School of Mathematics and Statistics at Shandong University, China
- Da Xu
  - School of Mathematics and Statistics at Shandong University, China
- Kaijing Hao
  - School of Mathematics and Statistics at Shandong University, China
- Yusen Zhang
  - academic leader of Computer Engineering in Shandong University, China
- Wei Chen
  - School of Mathematics and Statistics at Shandong University, China
- Jiaguo Liu
  - School of Mathematics and Statistics at Shandong University, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University
- Chuanyan Wu
  - School of Intelligent Engineering in Shandong Management University
7
Rathore S, Iftikhar MA, Chaddad A, Niazi T, Karasic T, Bilello M. Segmentation and Grade Prediction of Colon Cancer Digital Pathology Images Across Multiple Institutions. Cancers (Basel) 2019; 11:cancers11111700. [PMID: 31683818 PMCID: PMC6896042 DOI: 10.3390/cancers11111700] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/06/2019] [Revised: 10/03/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]
Abstract
Distinguishing benign from malignant disease is a primary challenge for colon histopathologists. Current clinical methods rely on qualitative visual analysis of features such as glandular architecture and size that exist on a continuum from benign to malignant. Consequently, discordance between histopathologists is common. To provide more reliable analysis of colon specimens, we propose an end-to-end computational pathology pipeline that encompasses gland segmentation, cancer detection, and then further breaking down the malignant samples into different cancer grades. We propose a multi-step gland segmentation method, which models tissue components as ellipsoids. For cancer detection/grading, we encode cellular morphology, spatial architectural patterns of glands, and texture by extracting multi-scale features: (i) Gland-based: extracted from individual glands, (ii) local-patch-based: computed from randomly-selected image patches, and (iii) image-based: extracted from images, and employ a hierarchical ensemble-classification method. Using two datasets (Rawalpindi Medical College (RMC), n = 174 and gland segmentation (GlaS), n = 165) with three cancer grades, our method reliably delineated gland regions (RMC = 87.5%, GlaS = 88.4%), detected the presence of malignancy (RMC = 97.6%, GlaS = 98.3%), and predicted tumor grade (RMC = 98.6%, GlaS = 98.6%). Training the model using one dataset and testing it on the other showed strong concordance in cancer detection (Train RMC – Test GlaS = 94.5%, Train GlaS – Test RMC = 93.7%) and grading (Train RMC – Test GlaS = 95%, Train GlaS – Test RMC = 95%) suggesting that the model will be applicable across institutions. With further prospective validation, the techniques demonstrated here may provide a reproducible and easily accessible method to standardize analysis of colon cancer specimens.
Affiliation(s)
- Saima Rathore
  - Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA.
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Muhammad Aksam Iftikhar
  - Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore 54000, Pakistan.
- Ahmad Chaddad
  - Division of Radiation Oncology, Department of Oncology, McGill University, Montreal, QC H3S 1Y9, Canada.
- Tamim Niazi
  - Division of Radiation Oncology, Department of Oncology, McGill University, Montreal, QC H3S 1Y9, Canada.
- Thomas Karasic
  - Department of Medicine, Division of Hematology/Oncology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Michel Bilello
  - Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA.
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
8
Kim MS, Kim D, Kim JR. Stage-Dependent Gene Expression Profiling in Colorectal Cancer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1685-1692. [PMID: 29994071 DOI: 10.1109/tcbb.2018.2814043] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Temporal gene expression profiles have been widely considered to uncover the mechanism of cancer development and progression. Gene expression patterns, however, have in many cases been analyzed for limited stages with small samples and without proper data pre-processing. With those approaches, it is difficult to unveil the mechanism of cancer development over time. In this study, we analyzed gene expression profiles of two independent colorectal cancer sample datasets, which contained 556 and 566 samples, respectively. To find specific gene expression changes according to cancer stage, we applied the linear mixed-effect regression model (LMER) that controls for other clinical variables. Based on this methodology, we found two types of gene expression patterns: genes that continuously increase and genes that continuously decrease as cancer develops. We found that continuously increasing genes are related to the nervous and developmental systems, whereas the others are related to the cell cycle and metabolic processes. We further analyzed connected sub-networks related to the two types of genes. From these results, we suggest that gene expression profile analysis can be used to understand the underlying mechanisms of cancer development such as cancer growth and metastasis. Furthermore, our approach can provide a good guideline for advancing our understanding of cancer developmental processes.
9
Mahfouz MA, Nepomuceno JA. Graph coloring for extracting discriminative genes in cancer data. Ann Hum Genet 2019; 83:141-159. [PMID: 30644085 DOI: 10.1111/ahg.12297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2018] [Revised: 10/12/2018] [Accepted: 11/15/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND AND OBJECTIVE: The major difficulty of the analysis of the input gene expression data in a microarray-based approach for an automated diagnosis of cancer is the large number of genes (high dimensionality) with many irrelevant genes (noise) compared to the very small number of samples. This research study tackles the dimensionality reduction challenge in this area.
METHODS: This research study introduces a dimension-reduction technique termed graph coloring approach (GCA) for microarray data-based cancer classification based on analyzing the absolute correlation between gene-gene pairs and partitioning genes into several hubs using graph coloring. GCA starts by a gene-selection step in which top relevant genes are selected using a biserial correlation. Each time, a gene from an ordered list of top relevant genes is selected as the hub gene (representative) and redundant genes are added to its group; the process is repeated recursively for the remaining genes. A gene is considered redundant if its absolute correlation with the hub gene is greater than a controlling threshold. A suitable range for the threshold is estimated by computing a percentage graph for the absolute correlation between gene-gene pairs. Each value in the estimated range for the threshold can efficiently produce a new feature subset.
RESULTS: GCA achieved significant improvement over several existing techniques in terms of higher accuracy and a smaller number of features. Also, genes selected by this method are relevant genes according to the information stored in scientific repositories.
CONCLUSIONS: The proposed dimension-reduction technique can help biologists accurately predict cancer in several areas of the body.
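The hub-grouping procedure in the METHODS part of this abstract can be sketched as a greedy loop: rank genes by relevance to the class label, take the best remaining gene as a hub, and attach to it every remaining gene whose absolute correlation with the hub exceeds the threshold. The sketch below is an assumption-laden illustration: it uses the Pearson coefficient against a 0/1 label as the point-biserial relevance measure, iterates instead of recursing, and the names `pearson`, `select_hub_genes`, `top_k`, and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 label as y this is the point-biserial
    coefficient. Returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_hub_genes(expr, labels, top_k, threshold):
    """Greedy hub grouping: returns {hub gene: [redundant genes]}.

    expr maps gene name -> expression values (one per sample);
    labels holds the 0/1 class of each sample.
    """
    # Gene-selection step: keep the top_k genes most related to the label
    ranked = sorted(
        expr, key=lambda g: abs(pearson(expr[g], labels)), reverse=True
    )[:top_k]
    groups = {}
    remaining = list(ranked)
    while remaining:
        hub = remaining.pop(0)  # best remaining gene becomes the next hub
        redundant = [
            g for g in remaining
            if abs(pearson(expr[hub], expr[g])) > threshold
        ]
        groups[hub] = redundant
        remaining = [g for g in remaining if g not in redundant]
    return groups


# Made-up data: g2 closely tracks g1, g4 is moderately informative, g3 is noise
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1, 1, 1, 9, 9, 9],
    "g2": [2, 2, 3, 9, 8, 9],
    "g3": [5, 1, 4, 2, 6, 3],
    "g4": [1, 5, 3, 8, 4, 9],
}
groups = select_hub_genes(expr, labels, top_k=3, threshold=0.9)
```

The hub genes (the dictionary keys) form the reduced feature subset; rerunning with a different `threshold` yields a different subset, which matches the abstract's remark that each threshold value in the estimated range produces a new feature subset.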
Affiliation(s)
- Mohamed A Mahfouz
  - Department of Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Alexandria, Egypt
- Juan A Nepomuceno
  - Departamento de Lenguajes y Sistemas Informáticos, Higher Technical School of Computer Engineering, University of Seville, Seville, Spain
10
Li J, Dong W, Meng D. Grouped Gene Selection of Cancer via Adaptive Sparse Group Lasso Based on Conditional Mutual Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:2028-2038. [PMID: 29028206 DOI: 10.1109/tcbb.2017.2761871] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/07/2023]
Abstract
This paper deals with the problems of cancer classification and grouped gene selection. The weighted gene co-expression network on cancer microarray data is employed to identify modules corresponding to biological pathways, based on which a strategy of dividing genes into groups is presented. Using the conditional mutual information within each divided group, an integrated criterion is proposed and the data-driven weights are constructed. They are shown with the ability to evaluate both the individual gene significance and the influence to improve correlation of all the other pairwise genes in each group. Furthermore, an adaptive sparse group lasso is proposed, by which an improved blockwise descent algorithm is developed. The results on four cancer data sets demonstrate that the proposed adaptive sparse group lasso can effectively perform classification and grouped gene selection.
11
Wang A, An N, Chen G, Liu L, Alterovitz G. Subtype dependent biomarker identification and tumor classification from gene expression profiles. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.01.025] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
12
Saroja B, SelwinMich Priyadharson A. Adaptive pillar K-means clustering-based colon cancer detection from biopsy samples with outliers. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2017. [DOI: 10.1080/21681163.2017.1350603] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
Affiliation(s)
- B. Saroja
  - School of Electrical and Computing, Vel Tech University, Avadi, Chennai
13
Alam MGR, Abedin SF, Al Ameen M, Hong CS. Web of Objects Based Ambient Assisted Living Framework for Emergency Psychiatric State Prediction. SENSORS 2016; 16:s16091431. [PMID: 27608023 PMCID: PMC5038709 DOI: 10.3390/s16091431] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/23/2016] [Revised: 08/24/2016] [Accepted: 08/30/2016] [Indexed: 02/05/2023]
Abstract
Ambient assisted living can facilitate optimum health and wellness by aiding physical, mental and social well-being. In this paper, patients’ psychiatric symptoms are collected through lightweight biosensors and web-based psychiatric screening scales in a smart home environment and then analyzed through machine learning algorithms to provide ambient intelligence in a psychiatric emergency. The psychiatric states are modeled through a Hidden Markov Model (HMM), and the model parameters are estimated using a Viterbi path counting and scalable Stochastic Variational Inference (SVI)-based training algorithm. The most likely psychiatric state sequence of the corresponding observation sequence is determined, and an emergency psychiatric state is predicted through the proposed algorithm. Moreover, to enable personalized psychiatric emergency care, a web of objects-based framework is proposed for a smart-home environment. In this framework, the biosensor observations and the psychiatric rating scales are objectified and virtualized in the web space. Then, the web of objects of sensor observations and psychiatric rating scores is used to assess the dweller’s mental health status and to predict an emergency psychiatric state. The proposed psychiatric state prediction algorithm reported 83.03 percent prediction accuracy in an empirical performance study.
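Finding the most likely hidden state sequence for an observation sequence in an HMM is the job of Viterbi decoding. A minimal sketch follows; it covers decoding only, not the Viterbi path counting or SVI-based training the abstract mentions. The state names, observation symbols, and all probabilities are made-up values for illustration and do not come from the paper.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for obs and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model with three discretised sensor readings
states = ["stable", "emergency"]
start_p = {"stable": 0.6, "emergency": 0.4}
trans_p = {
    "stable": {"stable": 0.7, "emergency": 0.3},
    "emergency": {"stable": 0.4, "emergency": 0.6},
}
emit_p = {
    "stable": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "emergency": {"low": 0.1, "mid": 0.3, "high": 0.6},
}
path, prob = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
```

For long observation sequences the products underflow, so a practical version would sum log-probabilities instead of multiplying raw probabilities.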
Affiliation(s)
- Md Golam Rabiul Alam
  - Computer Science and Engineering, Kyung Hee University, 1732 Deokyoungdaero, Gilheung-gu, Yongin-si 446-701, Korea.
- Sarder Fakhrul Abedin
  - Computer Science and Engineering, Kyung Hee University, 1732 Deokyoungdaero, Gilheung-gu, Yongin-si 446-701, Korea.
- Moshaddique Al Ameen
  - Computer Science and Engineering, Kyung Hee University, 1732 Deokyoungdaero, Gilheung-gu, Yongin-si 446-701, Korea.
- Choong Seon Hong
  - Computer Science and Engineering, Kyung Hee University, 1732 Deokyoungdaero, Gilheung-gu, Yongin-si 446-701, Korea.