1
Lee H, Kim J. A Gene Selection Method Considering Measurement Errors. J Comput Biol 2024; 31:71-82. [PMID: 38010511 DOI: 10.1089/cmb.2023.0041] [Indexed: 11/29/2023]
Abstract
The analysis of gene expression data has made significant contributions to understanding disease mechanisms and developing new drugs and therapies. In such analysis, gene selection is often required to identify informative and relevant genes and to remove redundant and irrelevant ones. However, this is not an easy task, as gene expression data pose inherent challenges such as ultra-high dimensionality, biological noise, and measurement errors. This study focuses on measurement errors in gene selection problems. Typically, high-throughput experiments have their own intrinsic measurement errors, which can result in an increase in falsely discovered genes. To alleviate this problem, this study proposes a gene selection method that takes measurement errors into account using generalized linear measurement error models. The method iterates filtering and selection steps until convergence, leading to fewer false positives and providing stable results under measurement errors. The performance of the proposed method is demonstrated through simulation studies, and the method is applied to a lung cancer data set.
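As background for why measurement errors matter, the following sketch illustrates one well-known effect in the simplest setting: under classical additive error in a covariate, a naive least-squares slope is attenuated toward zero by the factor var(x) / (var(x) + var(u)). This is not the authors' method. The function names, the linear model, and all parameter values are assumptions chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=0):
    """Return (slope using the true covariate, slope using the noisy covariate).

    Hypothetical setup: x is the true expression level, w = x + u is what
    the instrument records, and y = beta * x + e is the response.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]         # covariate observed with error
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]  # response depends on the true x
    return ols_slope(x, y), ols_slope(w, y)


slope_true, slope_noisy = simulate()
# With sd_x = sd_u = 1 the attenuation factor is 1 / (1 + 1) = 0.5,
# so the naive slope should be close to beta * 0.5 rather than beta.
print(round(slope_true, 2), round(slope_noisy, 2))
```

Methods that model the error explicitly, such as the measurement error models named in the abstract, aim to correct for this kind of distortion instead of ignoring it.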
Affiliation(s)
- Hajoung Lee
- Department of Statistics, Sungkyunkwan University, Seoul, South Korea
- Jaejik Kim
- Department of Statistics, Sungkyunkwan University, Seoul, South Korea
2
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Mohammed M, Greeff J, Elbashir MK. A novel feature selection algorithm for identifying hub genes in lung cancer. Sci Rep 2023; 13:21671. [PMID: 38066059 PMCID: PMC10709567 DOI: 10.1038/s41598-023-48953-1] [Received: 07/06/2023] [Accepted: 12/01/2023] [Indexed: 12/18/2023]
Abstract
Lung cancer, a life-threatening disease primarily affecting lung tissue, remains a significant contributor to mortality in both developed and developing nations. Accurate biomarker identification is imperative for effective cancer diagnosis and therapeutic strategies. This study introduces the Voting-Based Enhanced Binary Ebola Optimization Search Algorithm (VBEOSA), an ensemble-based approach combining binary optimization and the Ebola optimization search algorithm. VBEOSA combines state-of-the-art classification models through soft voting. We apply VBEOSA to an extensive lung cancer gene expression dataset obtained from TCGA, following essential preprocessing steps including outlier detection and removal, data normalization, and filtration. VBEOSA aids in feature selection, leading to the discovery of key hub genes closely associated with lung cancer, validated through comprehensive protein-protein interaction analysis. Notably, our investigation reveals ten significant hub genes (ADRB2, ACTB, ARRB2, GNGT2, ADRB1, ACTG1, ACACA, ATP5A1, ADCY9, and ADRA1B), each demonstrating substantial involvement in lung cancer. Furthermore, our pathway analysis highlights the prominence of pathways such as salivary secretion and the calcium signaling pathway, providing insights into the molecular mechanisms underpinning lung cancer. We also use the weighted gene co-expression network analysis (WGCNA) method to identify gene modules exhibiting strong correlations with clinical attributes associated with lung cancer. Our findings underscore the efficacy of VBEOSA in feature selection and offer insights into the multifaceted molecular landscape of lung cancer.
Finally, we are confident that this research has the potential to improve diagnostic capabilities and further enrich our understanding of the disease, thus setting the stage for future advancements in the clinical management of lung cancer. The VBEOSA source code is publicly available at https://github.com/TEHNAN/VBEOSA-A-Novel-Feature-Selection-Algorithm-for-Identifying-hub-Genes-in-Lung-Cancer .
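The soft voting mentioned in the abstract can be illustrated with a minimal sketch: each classifier outputs class probabilities, the probabilities are averaged (optionally with weights), and the class with the highest average wins. This is a generic illustration of soft voting, not the VBEOSA implementation. The function name `soft_vote` and the example probabilities are hypothetical.

```python
from typing import List, Optional, Sequence


def soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Combine classifiers by averaging their predicted class probabilities.

    probas[m][i][c] is the probability that model m assigns to class c
    for sample i. Returns the index of the winning class for each sample.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    labels = []
    for i in range(n_samples):
        avg = [
            sum(w * probas[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


# Hypothetical outputs of three classifiers for two samples and two classes.
model_a = [[0.90, 0.10], [0.40, 0.60]]
model_b = [[0.40, 0.60], [0.30, 0.70]]
model_c = [[0.45, 0.55], [0.60, 0.40]]

predictions = soft_vote([model_a, model_b, model_c])
print(predictions)
```

For the first sample, two of the three models lean toward class 1, so a majority (hard) vote would pick class 1. Soft voting picks class 0 because model_a is far more confident than the other two.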
Affiliation(s)
- Tehnan I A Mohamed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Department of Computer Science, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
- Absalom E Ezugwu
- Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
- Jean Vincent Fonou-Dombeu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Mohanad Mohammed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Japie Greeff
- School of Computer Science and Information Systems, Faculty of Natural and Agricultural Sciences, North-West University, Vanderbijlpark, South Africa
- Murtada K Elbashir
- Department of Information Systems, College of Computer and Information Sciences, Jouf University, 72388, Sakaka, Saudi Arabia
3
Kapila R, Saleti S. Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data. Comput Biol Chem 2023; 107:107973. [PMID: 37926049 DOI: 10.1016/j.compbiolchem.2023.107973] [Received: 06/25/2023] [Revised: 09/12/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023]
Abstract
Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine whether the fetus is pathologic, obstetricians frequently use it to evaluate a child's physical health during pregnancy. In the past, obstetricians have analyzed CTG data manually, which is time-consuming and inaccurate. Developing a fetal health categorization model is therefore crucial, as it may help to speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. To select and extract the most significant attributes from the dataset, Feature Selection (FS) techniques such as ANOVA and Chi-square, as well as Feature Extraction (FE) strategies such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. Because the dataset is unbalanced, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. To forecast the illness, the top 5 models are selected and used in ensemble methods such as voting and stacking classifiers. The Stacking Classifiers (SC) use Adaboost and Random Forest (RF) as meta-classifiers for disease detection. The proposed SC with RF as the meta-classifier, which combines Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score, respectively.
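The five reported metrics can all be derived from a confusion matrix. The sketch below shows the standard definitions for the binary case. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study. Treating the task as binary is also an assumption, since a multi-class task would typically average these quantities per class.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)    # share of predicted positives that are correct
    recall = _ratio(tp, tp + fn)       # share of actual positives that are found
    specificity = _ratio(tn, tn + fp)  # share of actual negatives that are found
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Specificity is worth reporting alongside recall on unbalanced data, because accuracy alone can look high when the majority class dominates.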
Affiliation(s)
- Ramdas Kapila
- Data Science Laboratory, Computer Science and Engineering, SRM University - AP, India.
- Sumalatha Saleti
- Data Science Laboratory, Computer Science and Engineering, SRM University - AP, India.