1
Meng Q, Liu D, Huang J, Yang X, Li H, Yang Z, Wang J, Gao W, Li Y, Liu R, Yang L, Wei J. RGIE: A Gene Selection Method Related to Radiotherapy Resistance in Head and Neck Squamous Cell Carcinoma. Curr Radiopharm 2024:CRP-EPUB-139379. [PMID: 38532606 DOI: 10.2174/0118744710282465240315053136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 02/12/2024] [Accepted: 02/21/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND: Head and Neck Squamous Cell Carcinoma (HNSCC) is a malignant tumor with a high degree of malignancy, invasiveness, and metastasis rate. Radiotherapy, as an important adjuvant therapy for HNSCC, can reduce the postoperative recurrence rate and improve the survival rate. Identifying the genes related to HNSCC radiotherapy resistance (HNSCC-RR) is helpful in the search for potential therapeutic targets. However, identifying radiotherapy resistance-related genes from tens of thousands of genes is a challenging task. While interactions between genes are important for elucidating complex biological processes, the large number of genes makes the computation of gene interactions infeasible.
METHODS: We propose a gene selection algorithm, RGIE, which is based on ReliefF, Gene Network Inference with Ensemble of Trees (GENIE3) and Feature Elimination. ReliefF was used to select a feature subset that is discriminative for HNSCC-RR, GENIE3 constructed a gene regulatory network based on this subset to analyze the regulatory relationship among genes, and feature elimination was used to remove redundant and noisy features.
RESULTS: Nine genes (SPAG1, FIGN, NUBPL, CHMP5, TCF7L2, COQ10B, BSDC1, ZFPM1, GRPEL1) were identified and used to identify HNSCC-RR, which achieved performances of 0.9730, 0.9679, 0.9767, and 0.9885 in terms of accuracy, precision, recall, and AUC, respectively. Finally, qRT-PCR validated the differential expression of the nine signature genes in cell lines (SCC9, SCC9-RR).
CONCLUSION: RGIE is effective in screening genes related to HNSCC-RR. This approach may help guide clinical treatment modalities for patients and develop potential treatments.
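As a rough illustration of the ReliefF ranking step named in this abstract, here is a minimal sketch of a two-class ReliefF-style scorer in plain Python. It is not the RGIE implementation: the function name `relieff_scores`, the neighbor count `k`, and the toy data are assumptions made for illustration, and the GENIE3 and feature-elimination stages are not shown.

```python
import random

def relieff_scores(X, y, k=3):
    """Score features with a simplified two-class ReliefF (k nearest hits and misses)."""
    n, d = len(X), len(X[0])
    # Per-feature value ranges, so each feature difference is scaled to [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        # All other samples, nearest first.
        others = sorted((t for t in range(n) if t != i),
                        key=lambda t: dist(X[i], X[t]))
        hits = [t for t in others if y[t] == y[i]][:k]      # same class
        misses = [t for t in others if y[t] != y[i]][:k]    # other class
        for j in range(d):
            # Penalize features that differ among same-class neighbors,
            # reward features that differ among other-class neighbors.
            if hits:
                w[j] -= sum(diff(X[i], X[t], j) for t in hits) / (len(hits) * n)
            if misses:
                w[j] += sum(diff(X[i], X[t], j) for t in misses) / (len(misses) * n)
    return w

# Hypothetical toy data: feature 0 tracks the label, features 1-3 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + rng.gauss(0, 0.1)] + [rng.random() for _ in range(3)] for label in y]

scores = relieff_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

In a pipeline like the one described, only the top-ranked features would be passed on to the more expensive network-inference step, which is what keeps the gene-interaction computation tractable.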
Affiliation(s)
- Qingzhe Meng
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
  - School of Stomatology, Heilongjiang Key Lab of Oral Biomedicine Materials and Clinical Application & Experimental Center for Stomatology Engineering, Jiamusi University, Jiamusi, China
- Dunhui Liu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Junhong Huang
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Xinjie Yang
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Huan Li
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Zihui Yang
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Jun Wang
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Wanpeng Gao
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Yahui Li
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Rong Liu
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
- Liying Yang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Jianhua Wei
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Fourth Military Medical University, Xi'an, China
2
Al-Haddad LA, Alawee WH, Basem A. Advancing task recognition towards artificial limbs control with ReliefF-based deep neural network extreme learning. Comput Biol Med 2024; 169:107894. [PMID: 38154161 DOI: 10.1016/j.compbiomed.2023.107894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 12/04/2023] [Accepted: 12/21/2023] [Indexed: 12/30/2023]
Abstract
In the rapidly advancing field of biomedical engineering, effective real-time control of artificial limbs is a pressing research concern. Addressing this, the current study introduces a pioneering method for augmenting task recognition in prosthetic control systems, combining a ReliefF-based Deep Neural Networks (DNNs) approach. This paper has leveraged the MILimbEEG dataset, a comprehensive rich source collection of EEG signals, to calculate statistical features of Arithmetic Mean (AM), Standard Deviation (SD), and Skewness (S) across various motor activities. Supreme Feature Selection (SFS), of the adopted time-domain features, was performed using the ReliefF algorithm. The highest scored DNN-ReliefF developed model demonstrated remarkable performance, achieving accuracy, precision, and recall rates of 97.4 %, 97.3 %, and 97.4 %, respectively. In contrast, a traditional DNN model yielded accuracy, precision, and recall rates of 50.8 %, 51.1 %, and 50.8 %, highlighting the significant improvements made possible by incorporating SFS. This stark contrast underscores the transformative potential of incorporating ReliefF, situating the DNN-ReliefF model as a robust platform for forthcoming advancements in real-time prosthetic control systems.
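The three time-domain features named in this abstract (arithmetic mean, standard deviation, skewness) can be sketched as follows. This is an illustrative sketch only: the function name, the use of the population standard deviation, and the sample window are assumptions, since the abstract does not specify how the features were computed.

```python
import math

def time_domain_features(signal):
    """Return (arithmetic mean, standard deviation, skewness) of one signal window."""
    n = len(signal)
    mean = sum(signal) / n
    # Population standard deviation (assumed; the abstract does not say which variant).
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    # Skewness as the third standardized moment; defined as 0 for a constant signal.
    skew = 0.0 if sd == 0 else sum((v - mean) ** 3 for v in signal) / (n * sd ** 3)
    return mean, sd, skew

# Hypothetical window of samples from a single channel.
window = [1.0, 2.0, 3.0, 4.0, 10.0]
am, sd, s = time_domain_features(window)
print(am, sd, s)
```

Each channel and window would contribute one such triple, and a selector such as ReliefF would then rank the resulting feature columns before they reach the classifier.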
Affiliation(s)
- Luttfi A Al-Haddad
  - Training and Workshops Center, University of Technology-Iraq, Baghdad, Iraq
- Wissam H Alawee
  - Training and Workshops Center, University of Technology-Iraq, Baghdad, Iraq; Control and Systems Engineering Department, University of Technology-Iraq, Baghdad, Iraq
- Ali Basem
  - Air Conditioning Engineering Department, Faculty of Engineering, Warith Al-Anbiyaa University, Iraq
3
Mohanty S, Shivanna DB, Rao RS, Astekar M, Chandrashekar C, Radhakrishnan R, Sanjeevareddygari S, Kotrashetti V, Kumar P. Building Automation Pipeline for Diagnostic Classification of Sporadic Odontogenic Keratocysts and Non-Keratocysts Using Whole-Slide Images. Diagnostics (Basel) 2023; 13:3384. [PMID: 37958281 PMCID: PMC10648794 DOI: 10.3390/diagnostics13213384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/13/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] Open
Abstract
The microscopic diagnostic differentiation of odontogenic cysts from other cysts is intricate and may cause perplexity for both clinicians and pathologists. Of particular interest is the odontogenic keratocyst (OKC), a developmental cyst with unique histopathological and clinical characteristics. Nevertheless, what distinguishes this cyst is its aggressive nature and high tendency for recurrence. Clinicians encounter challenges in dealing with this frequently encountered jaw lesion, as there is no consensus on surgical treatment. Therefore, the accurate and early diagnosis of such cysts will benefit clinicians in terms of treatment management and spare subjects from the mental agony of suffering from aggressive OKCs, which impact their quality of life. The objective of this research is to develop an automated OKC diagnostic system that can function as a decision support tool for pathologists, whether they are working locally or remotely. This system will provide them with additional data and insights to enhance their decision-making abilities. This research aims to provide an automation pipeline to classify whole-slide images of OKCs and non-keratocysts (non-KCs: dentigerous and radicular cysts). OKC diagnosis and prognosis using the histopathological analysis of tissues using whole-slide images (WSIs) with a deep-learning approach is an emerging research area. WSIs have the unique advantage of magnifying tissues with high resolution without losing information. The contribution of this research is a novel, deep-learning-based, and efficient algorithm that reduces the trainable parameters and, in turn, the memory footprint. This is achieved using principal component analysis (PCA) and the ReliefF feature selection algorithm (ReliefF) in a convolutional neural network (CNN) named P-C-ReliefF. The proposed model reduces the trainable parameters compared to standard CNN, achieving 97% classification accuracy.
Affiliation(s)
- Samahit Mohanty
  - Department of Computer Science and Engineering, M S Ramaiah University of Applied Sciences, Bengaluru 560054, India
- Divya B. Shivanna
  - Department of Computer Science and Engineering, M S Ramaiah University of Applied Sciences, Bengaluru 560054, India
- Roopa S. Rao
  - Department of Oral Pathology and Microbiology, Faculty of Dental Sciences, M S Ramaiah University of Applied Sciences, Bengaluru 560054, India
- Madhusudan Astekar
  - Department of Oral Pathology, Institute of Dental Sciences, Bareilly 243006, India
- Chetana Chandrashekar
  - Department of Oral & Maxillofacial Pathology & Microbiology, Manipal College of Dental Sciences, Manipal 576104, India
- Raghu Radhakrishnan
  - Department of Oral & Maxillofacial Pathology & Microbiology, Manipal College of Dental Sciences, Manipal 576104, India
- Vijayalakshmi Kotrashetti
  - Department of Oral & Maxillofacial Pathology & Microbiology, Maratha Mandal’s Nathajirao G Halgekar, Institute of Dental Science & Research Centre, Belgaum 590010, India
- Prashant Kumar
  - Department of Oral & Maxillofacial Pathology, Nijalingappa Institute of Dental Science & Research, Gulbarga 585105, India
4
Li Y, Yu ND, Ye XL, Jiang MC, Chen XQ. Construction of lung cancer serum markers based on ReliefF feature selection. Comput Methods Biomech Biomed Engin 2023:1-9. [PMID: 37489703 DOI: 10.1080/10255842.2023.2235045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023]
Abstract
Serum miRNAs are available clinical samples for cancer screening. Identifying early serum markers in lung cancer (LC) is essential for patients' early diagnosis and clinical treatment. Expression data of serum miRNAs of lung adenocarcinoma (LUAD) patients and healthy individuals were downloaded from the Gene Expression Omnibus (GEO). These data were normalized and subjected to differential expression analysis to obtain differentially expressed miRNAs (DEmiRNAs). The DEmiRNAs were subsequently subjected to ReliefF feature selection, and subsets closely related to cancer were screened as candidate feature miRNAs. Thereafter, a Gaussian Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) classifier were constructed based on these candidate feature miRNAs. Then the best diagnostic signature was constructed through NB combined with incremental feature selection (IFS). Thereafter, these samples were subjected to principal component analysis (PCA) based on miRNAs with optimal predictive performance. Finally, the peripheral serum miRNAs of 64 LUAD patients and 59 normal individuals were extracted for qRT-PCR analysis to validate the performance of the diagnostic model in respect of clinical detection. Finally, according to area under the curve (AUC) and accuracy values, the NB classifier composed of miR-5100 and miR-663a manifested the most outstanding diagnostic performance. The PCA results also revealed that the 2-miRNA diagnostic signature could effectively distinguish cancer patients from healthy individuals. Finally, qRT-PCR results of clinical serum samples revealed that miR-5100 and miR-663a expression in tumor samples was remarkably higher than that in normal samples. The AUC of the 2-miRNA diagnostic signature was 0.968. In summary, we identified markers (miR-5100 and miR-663a) in serum for early LUAD screening, providing ideas for developing early LUAD diagnostic models.
Affiliation(s)
- Yong Li
  - Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Nan-Ding Yu
  - Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Xiang-Li Ye
  - Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Mei-Chen Jiang
  - Department of Pathology, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
- Xiang-Qi Chen
  - Department of Respiration Medicine, Fujian Medical University Union Hospital, Fuzhou, Fujian, China
5
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of interest within the field of feature selection, which is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem, which aims to simultaneously optimize the two objectives of classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications, however, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals in guiding evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overwhelms the defects with less information in late evolution. Moreover, an improved elite selection mechanism with Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with 9 famous algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of high-dimension cancer microarray datasets.
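The two-objective framing in this abstract (maximize classification accuracy, minimize gene-subset size) can be made concrete with a small Pareto-front filter. This is a generic sketch of non-dominated filtering, not the proposed marine predator algorithm; the function name, dictionary keys, and candidate values are hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates that are not dominated on (higher accuracy, fewer genes)."""
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one of them.
        return (a["acc"] >= b["acc"] and a["genes"] <= b["genes"]
                and (a["acc"] > b["acc"] or a["genes"] < b["genes"]))
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical candidate gene subsets: accuracy and subset size.
candidates = [
    {"acc": 0.90, "genes": 30},
    {"acc": 0.92, "genes": 12},
    {"acc": 0.88, "genes": 5},
    {"acc": 0.85, "genes": 8},
]
front = pareto_front(candidates)
print(front)
```

A multi-objective optimizer keeps refining such a front; the elite or leader selection discussed in the abstract concerns which of these non-dominated solutions guide the next generation.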
Affiliation(s)
- Qiyong Fu
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
6
Yacob YM, Alquran H, Mustafa WA, Alsalatie M, Sakim HAM, Lola MS. H. pylori Related Atrophic Gastritis Detection Using Enhanced Convolution Neural Network (CNN) Learner. Diagnostics (Basel) 2023; 13:diagnostics13030336. [PMID: 36766441 PMCID: PMC9914156 DOI: 10.3390/diagnostics13030336] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/21/2022] [Revised: 01/09/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
Atrophic gastritis (AG) is commonly caused by the infection of the Helicobacter pylori (H. pylori) bacteria. If untreated, AG may develop into a chronic condition leading to gastric cancer, which is deemed to be the third primary cause of cancer-related deaths worldwide. Precursory detection of AG is crucial to avoid such cases. This work focuses on H. pylori-associated infection located at the gastric antrum, where the classification is of binary classes of normal versus atrophic gastritis. Existing work developed the Deep Convolution Neural Network (DCNN) of GoogLeNet with 22 layers of the pre-trained model. Another study employed GoogLeNet based on the Inception Module, fast and robust fuzzy C-means (FRFCM), and simple linear iterative clustering (SLIC) superpixel algorithms to identify gastric disease. GoogLeNet with Caffe framework and ResNet-50 are machine learners that detect H. pylori infection. Nonetheless, the accuracy may become abundant as the network depth increases. An upgrade to the current standards method is highly anticipated to avoid untreated and inaccurate diagnoses that may lead to chronic AG. The proposed work incorporates improved techniques revolving within DCNN with pooling as pre-trained models and channel shuffle to assist streams of information across feature channels to ease the training of networks for deeper CNN. In addition, Canonical Correlation Analysis (CCA) feature fusion method and ReliefF feature selection approaches are intended to revamp the combined techniques. CCA models the relationship between the two data sets of significant features generated by pre-trained ShuffleNet. ReliefF reduces and selects essential features from CCA and is classified using the Generalized Additive Model (GAM). It is believed the extended work is justified with a 98.2% testing accuracy reading, thus providing an accurate diagnosis of normal versus atrophic gastritis.
Affiliation(s)
- Yasmin Mohd Yacob
  - Faculty of Electronic Engineering & Technology, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
  - Centre of Excellence for Advanced Computing, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
- Hiam Alquran
  - Department of Biomedical Systems and Informatics Engineering, Yarmouk University, Irbid 21163, Jordan
  - Department of Biomedical Engineering, Jordan University of Science and Technology, Irbid 22110, Jordan
- Wan Azani Mustafa (corresponding author)
  - Centre of Excellence for Advanced Computing, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
  - Faculty of Electrical Engineering & Technology, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
- Mohammed Alsalatie
  - King Hussein Medical Center, Royal Jordanian Medical Service, The Institute of Biomedical Technology, Amman 11855, Jordan
- Harsa Amylia Mat Sakim
  - School of Electrical and Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 11800, Penang, Malaysia
- Muhamad Safiih Lola
  - Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Terengganu 21030, Terengganu, Malaysia
7
Souza A, Rojas MZ, Yang Y, Lee L, Hoagland L. Classifying cadmium contaminated leafy vegetables using hyperspectral imaging and machine learning. Heliyon 2022; 8:e12256. [PMID: 36590539 PMCID: PMC9800301 DOI: 10.1016/j.heliyon.2022.e12256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 10/18/2022] [Accepted: 12/02/2022] [Indexed: 12/15/2022] Open
Abstract
Cadmium (Cd) is a toxic element that can accumulate in edible plant tissues and negatively impact human health. Traditional Cd quantification methods are time-consuming, expensive, and generate a lot of toxic waste, slowing development of methods to reduce uptake. The objective of this study was to determine whether hyperspectral imaging (HSI) and machine learning (ML) can be used to predict Cd concentrations in plants using kale (Brassica oleracea) and basil (Ocimum basilicum) as model crops. The experiments were conducted in an automated phenotyping facility where all environmental conditions except soil Cd concentration were kept constant. Cd concentrations were determined at harvest using traditional methods and used to train the ML models with data collected from the imaging sensor. Visible/near infrared (VNIR) images were also collected at harvest and processed to calculate reflectance at 473 bands between 400 and 998 nm. All reflectance spectra were subject to the feature selection algorithm ReliefF and Principal Component Analysis (PCA) to generate data and provide input to evaluate three ML classification models: artificial neural network (ANN), ensemble learning (EL), and support vector machine (SVM). Plants were categorized according to Cd concentrations higher or lower than the safety threshold of 0.2 mg kg⁻¹ Cd. Wavelengths with the highest ranks for Cd detection were between 519 and 574 nm and between 692 and 732 nm, indicating that Cd content likely altered the plants' chlorophyll content and leaf internal structure. All models were able to sort the plants into groups, though the model with the best F1 score was the ANN for the validation subset that utilized reflectance from all wavelengths. This study demonstrates that HSI and ML are promising technologies for the fast and precise diagnosis of Cd in leafy green plants, though additional studies are needed to adapt this approach for more complex field environments.
Affiliation(s)
- Augusto Souza
  - Institute for Plant Sciences, Purdue University, West Lafayette, IN, USA
- Maria Zea Rojas
  - Horticulture and Landscape Architecture Department, Purdue University, West Lafayette, IN, USA
- Yang Yang
  - Institute for Plant Sciences, Purdue University, West Lafayette, IN, USA
- Linda Lee
  - Agronomy Department, Purdue University, West Lafayette, IN, USA
- Lori Hoagland (corresponding author)
  - Horticulture and Landscape Architecture Department, Purdue University, West Lafayette, IN, USA
8
Ali MU, Kallu KD, Masood H, Hussain SJ, Ullah S, Byun JH, Zafar A, Kim KS. A Robust Computer-Aided Automated Brain Tumor Diagnosis Approach Using PSO-ReliefF Optimized Gaussian and Non-Linear Feature Space. Life (Basel) 2022; 12:life12122036. [PMID: 36556401 PMCID: PMC9782364 DOI: 10.3390/life12122036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 11/22/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
Brain tumors are among the deadliest diseases in the modern world. This study proposes an optimized machine-learning approach for the detection and identification of the type of brain tumor (glioma, meningioma, or pituitary tumor) in brain images recorded using magnetic resonance imaging (MRI). The Gaussian features of the image are extracted using speed-up robust features (SURF), whereas its non-linear features are obtained using KAZE, owing to their high performance against rotation, scaling, and noise problems. To retrieve local-level information, all brain MRI images are segmented into an 8 × 8 pixel grid. To enhance the accuracy and reduce the computational time, the variance-based k-means clustering and PSO-ReliefF algorithms are employed to eliminate the redundant features of the brain MRI images. Finally, the performance of the proposed hybrid optimized feature vector is evaluated using various machine learning classifiers. An accuracy of 96.30% is obtained with 169 features using a support vector machine (SVM). Furthermore, the computational time is also reduced to 1 min compared to the non-optimized features used for training of the SVM. The findings are also compared with previous research, demonstrating that the suggested approach might assist physicians and doctors in the timely detection of brain tumors.
Affiliation(s)
- Muhammad Umair Ali
  - Department of Unmanned Vehicle Engineering, Sejong University, Seoul 05006, Republic of Korea
- Karam Dad Kallu
  - Department of Robotics & Artificial Intelligence (R&AI), School of Mechanical and Manufacturing Engineering (SMME), National University of Sciences and Technology (NUST) H-12, Islamabad 44000, Pakistan
- Haris Masood
  - Electrical Engineering Department, Wah Engineering College, University of Wah, Wah Cantt 47040, Pakistan
- Shaik Javeed Hussain
  - Department of Electrical and Electronics, Global College of Engineering and Technology, Muscat 112, Oman
- Safee Ullah
  - Department of Electrical Engineering, HITEC University, Taxila 47080, Pakistan
- Jong Hyuk Byun
  - Department of Mathematics, College of Natural Sciences, Pusan National University, Busan 46241, Republic of Korea
- Amad Zafar (corresponding author)
  - Department of Intelligent Mechatronics Engineering, Sejong University, Seoul 05006, Republic of Korea
- Kawang Su Kim (corresponding author)
  - Department of Scientific computing, Pukyong National University, Busan 48513, Republic of Korea
  - Interdisciplinary Biology Laboratory (iBLab), Division of Biological Science, Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
9
Gamage HN, Chetty M, Shatte A, Hallinan J. Filter feature selection based Boolean Modelling for Genetic Network Inference. Biosystems 2022; 221:104757. [PMID: 36007675 DOI: 10.1016/j.biosystems.2022.104757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 08/04/2022] [Accepted: 08/04/2022] [Indexed: 11/02/2022]
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.
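The mutual-information quantity behind the min-redundancy max-relevance criterion mentioned in this abstract can be sketched for discretized expression data as follows. This shows only the mutual-information building block under assumed names and toy sequences; it is not the authors' combined ReliefF and mRMR procedure.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed value pairs of p(a, b) * log2(p(a, b) / (p(a) * p(b))).
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# Hypothetical discretized (on/off) expression profiles across 8 time points.
target = [0, 0, 1, 1, 0, 0, 1, 1]
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]   # mirrors the target exactly
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]   # statistically independent of the target

relevance_a = mutual_information(gene_a, target)
relevance_b = mutual_information(gene_b, target)
print(relevance_a, relevance_b)
```

In an mRMR-style selection, a candidate regulator would be favored when its mutual information with the target gene is high (relevance) and its mutual information with already selected regulators is low (redundancy).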
Affiliation(s)
- Madhu Chetty
  - Health Innovation and Transformation Centre, Federation University, Victoria, Australia
- Adrian Shatte
  - Health Innovation and Transformation Centre, Federation University, Victoria, Australia
10
Lapajne J, Knapič M, Žibrat U. Comparison of Selected Dimensionality Reduction Methods for Detection of Root-Knot Nematode Infestations in Potato Tubers Using Hyperspectral Imaging. Sensors (Basel) 2022; 22:s22010367. [PMID: 35009907 PMCID: PMC8749520 DOI: 10.3390/s22010367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 12/23/2021] [Accepted: 12/27/2021] [Indexed: 11/29/2022]
Abstract
Hyperspectral imaging is a popular tool used for non-invasive plant disease detection. Data acquired with it usually consist of many correlated features; hence most of the acquired information is redundant. Dimensionality reduction methods are used to transform the data sets from high-dimensional, to low-dimensional (in this study to one or a few features). We have chosen six dimensionality reduction methods (partial least squares, linear discriminant analysis, principal component analysis, RandomForest, ReliefF, and Extreme gradient boosting) and tested their efficacy on a hyperspectral data set of potato tubers. The extracted or selected features were pipelined to support vector machine classifier and evaluated. Tubers were divided into two groups, healthy and infested with Meloidogyne luci. The results show that all dimensionality reduction methods enabled successful identification of inoculated tubers. The best and most consistent results were obtained using linear discriminant analysis, with 100% accuracy in both potato tuber inside and outside images. Classification success was generally higher in the outside data set, than in the inside. Nevertheless, accuracy was in all cases above 0.6.
11
Song C, Zhao W, Jiang H, Liu X, Duan Y, Yu X, Yu X, Zhang J, Kui J, Liu C, Tang Y. Stability Evaluation of Brain Changes in Parkinson's Disease Based on Machine Learning. Front Comput Neurosci 2021; 15:735991. [PMID: 34795570 PMCID: PMC8594429 DOI: 10.3389/fncom.2021.735991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/04/2021] [Accepted: 09/24/2021] [Indexed: 02/05/2023] Open
Abstract
Structural MRI (sMRI) has been widely used to examine the cerebral changes that occur in Parkinson's disease (PD). However, previous studies have examined brain changes at the group level rather than at the individual level. Additionally, previous studies have been inconsistent regarding the changes they identified, so it is difficult to determine which brain regions are the true biomarkers of PD. To overcome these two issues, we employed four different feature selection methods [ReliefF, graph-theory, recursive feature elimination (RFE), and stability selection] to obtain a minimal set of relevant and nonredundant features from gray matter (GM) and white matter (WM). Then, a support vector machine (SVM) was used to learn decision models from the selected features. Using machine learning, this study not only extended group-level statistical analysis, which identifies group differences, to the individual level by distinguishing patients with PD from healthy controls (HCs), but also identified the most informative brain regions through feature selection. Furthermore, we conducted horizontal and vertical analyses to investigate the stability of the identified brain regions. On the one hand, we compared the brain changes found by the different feature selection methods and considered the regions found in common by these methods to be potential biomarkers related to PD. On the other hand, we compared these brain changes with previous findings reported by conventional statistical analysis to evaluate their stability. Our experiments demonstrated that the proposed machine learning techniques achieve satisfactory and robust classification performance. The highest classification performance was 92.24% (specificity), 92.42% (sensitivity), 89.58% (accuracy), and 89.77% (AUC) for GM and 71.93% (specificity), 74.87% (sensitivity), 71.18% (accuracy), and 71.82% (AUC) for WM. Moreover, most brain regions identified by machine learning were consistent with previous findings, which means that these brain regions are related to the pathological brain changes characteristic of PD and can be regarded as potential biomarkers of PD. We also found abnormalities in the superior frontal gyrus (dorsolateral, SFGdor) and lingual gyrus (LING), which have been confirmed in other studies of PD. This further demonstrates that machine learning models are beneficial for clinicians as a decision support system in diagnosing PD.
Affiliation(s)
- Chenggang Song
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- MOE Key Lab for Neuroinformation, The Clinical Hospital of Chengdu Brain Science Institute, Chengdu, China
- College of Computer, Chengdu University, Chengdu, China
- Weidong Zhao
- College of Computer, Chengdu University, Chengdu, China
- Hong Jiang
- Department of Neurosurgery, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiaoju Liu
- Department of Abdominal Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Yumei Duan
- Department of Computer and Software, Chengdu Jincheng College, Chengdu, China
- Xiaodong Yu
- College of Computer, Chengdu University, Chengdu, China
- Xi Yu
- College of Computer, Chengdu University, Chengdu, China
- Jian Zhang
- School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, China
- Jingyue Kui
- Department of Urology, Tonghai County People's Hospital, Yuxi, China
- Chang Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- MOE Key Lab for Neuroinformation, The Clinical Hospital of Chengdu Brain Science Institute, Chengdu, China
- College of Computer, Chengdu University, Chengdu, China
- Yiqian Tang
- College of Computer, Chengdu University, Chengdu, China
|
12
|
Abstract
The COVID-19 epidemic, from which millions of people have suffered, affected the whole world in a short time. The virus, which has a high rate of transmission, directly affects the respiratory system. While symptoms such as difficulty in breathing, cough, and fever are common, hospitalization and fatal consequences can occur as the disease progresses. For this reason, the most important issue in combating the epidemic is to detect COVID-19(+) cases early and isolate them from other people. In addition to the RT-PCR test, COVID-19(+) cases can be detected with imaging methods. This study aimed to detect COVID-19(+) patients from acoustic data of cough, which is one of the important symptoms. From these data, features were obtained with traditional feature extraction methods using empirical mode decomposition (EMD) and discrete wavelet transform (DWT). Deep features were also obtained using pre-trained ResNet50 and pre-trained MobileNet models. Feature selection was applied to all obtained features with the ReliefF algorithm. The highest values, 98.4% accuracy and 98.6% F1-score, were obtained by selecting the EMD + DWT features using ReliefF. In a separate analysis using deep features, the features were obtained from ResNet50 and MobileNet applied to scalogram images. For the deep features selected using the ReliefF algorithm, the highest performance was obtained with the cubic support vector machine: 97.8% accuracy and 98.0% F1-score. The features obtained by traditional feature extraction approaches thus showed higher performance than the deep features. Among the chaotic measurements, approximate entropy was found to be the most distinguishing feature. According to the results, high performance can be achieved with cough acoustic data that can easily be obtained from mobile and computer-based applications. We anticipate that this study will be useful as a decision support system in this epidemic period, when it is important to correctly identify even one person.
Affiliation(s)
- Yunus Emre Erdoğan
- Zonguldak Bulent Ecevit University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Zonguldak, Turkey; Eregli Iron and Steel Works Co., Electronics Automation Department, Zonguldak, Turkey.
- Ali Narin
- Zonguldak Bulent Ecevit University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Zonguldak, Turkey.
|
13
|
Javed F, Gilani SO, Latif S, Waris A, Jamil M, Waqas A. Predicting Risk of Antenatal Depression and Anxiety Using Multi-Layer Perceptrons and Support Vector Machines. J Pers Med 2021; 11:jpm11030199. [PMID: 33809177 PMCID: PMC8000443 DOI: 10.3390/jpm11030199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Revised: 03/02/2021] [Accepted: 03/08/2021] [Indexed: 01/20/2023]
Abstract
Perinatal depression and anxiety are defined as the mental health problems a woman faces during pregnancy, around childbirth, and after delivery. Although they occur often and affect all family members, including the infant, they can easily go undetected and underdiagnosed. The prevalence rates of antenatal depression and anxiety worldwide, especially in low-income countries, are extremely high. The vast majority of affected women suffer from mild to moderate depression, which risks impairing the child–mother relationship and infant health; a few women end up taking their own lives. Owing to high costs and the non-availability of resources, it is almost impossible to evaluate every pregnant woman for depression/anxiety, whereas under-detection can have a lasting impact on the health of mother and child. This work proposes a multi-layer perceptron based neural network (MLP-NN) classifier to predict the risk of depression and anxiety in pregnant women. We trained and evaluated our proposed system on a Pakistani dataset of 500 women in their antenatal period. ReliefF was used for feature selection before classifier training. Accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve were used to evaluate the performance of the trained model. The multilayer perceptron and the support vector classifier achieved an area under the receiver operating characteristic curve of 88% and 80% for antenatal depression and 85% and 77% for antenatal anxiety, respectively. The system can be used as a facilitator for screening women during their routine visits to the hospital's gynecology and obstetrics departments.
Affiliation(s)
- Fajar Javed
- Department of Biomedical Engineering, SMME, National University of Sciences & Technology (NUST), Islamabad 44000, Pakistan; (F.J.); (S.O.G.); (A.W.); (M.J.)
- Syed Omer Gilani
- Department of Biomedical Engineering, SMME, National University of Sciences & Technology (NUST), Islamabad 44000, Pakistan; (F.J.); (S.O.G.); (A.W.); (M.J.)
- Seemab Latif
- Department of Computing, SEECS, National University of Sciences & Technology (NUST), Islamabad 44000, Pakistan;
- Asim Waris
- Department of Biomedical Engineering, SMME, National University of Sciences & Technology (NUST), Islamabad 44000, Pakistan; (F.J.); (S.O.G.); (A.W.); (M.J.)
- Mohsin Jamil
- Department of Biomedical Engineering, SMME, National University of Sciences & Technology (NUST), Islamabad 44000, Pakistan; (F.J.); (S.O.G.); (A.W.); (M.J.)
- Department of Electrical and Computer Engineering, Faculty of Engineering and Applied Sciences, Memorial University of Newfoundland, St Johns, NL A1B 3X5, Canada
- Ahmed Waqas
- Institute of Population Health Sciences, University of Liverpool, Liverpool L69 3BX, UK
- Correspondence: ; Tel.: +44-07947673943
|
14
|
Baliarsingh SK, Vipsita S, Gandomi AH, Panda A, Bakshi S, Ramasubbareddy S. Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network. Comput Methods Programs Biomed 2020; 195:105625. [PMID: 32650089 DOI: 10.1016/j.cmpb.2020.105625] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/22/2019] [Accepted: 06/19/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND The size of genomics data has been growing rapidly over the last decade. However, conventional data analysis techniques are incapable of processing this huge amount of data. For the efficient processing of high dimensional datasets, it is essential to develop new parallel methods. METHODS In this work, a novel distributed method is presented using a Map-Reduce (MR)-based approach. The proposed algorithm consists of MR-based Fisher score (mrFScore), MR-based ReliefF (mrReliefF), and MR-based probabilistic neural network (mrPNN) using a weighted chaotic grey wolf optimization technique (WCGWO). Here, the mrFScore and mrReliefF methods are introduced for feature selection (FS), and mrPNN is implemented as an effective method for microarray classification. The choice of smoothing parameter (σ) plays a major role in the prediction ability of the PNN; this is addressed using a novel technique, WCGWO, which is used to select the optimal value of σ in PNN. RESULTS These algorithms have been successfully implemented using the Hadoop framework. The proposed model is tested on three large microarray datasets and one small one, and a comparative analysis is carried out with existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform other methods for high dimensional microarray classification. CONCLUSION The effectiveness of the proposed methods is compared with that of other existing schemes. Experimental results reveal that the proposed scheme is accurate and robust. Hence, the suggested scheme is considered to be a reliable framework for microarray data analysis. SIGNIFICANCE Such a method promotes the application of parallel programming using a Hadoop cluster for the analysis of large-scale genomics data, particularly when the dataset is of high dimension.
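To illustrate why the smoothing parameter σ matters in a probabilistic neural network, the following is a minimal single-machine sketch of the standard PNN (Parzen-window) decision rule with a Gaussian kernel and equal class priors. It is not the paper's MapReduce implementation and does not include WCGWO; the function name `pnn_predict` and the toy data are assumptions made for illustration only.

```python
import math

def pnn_predict(train_X, train_y, x, sigma):
    """Classify x with a Gaussian-kernel PNN; returns (label, class densities).

    Assumes equal class priors; sigma is the smoothing parameter.
    """
    sums, counts = {}, {}
    for xi, label in zip(train_X, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Average kernel response per class approximates the class density at x.
    densities = {c: sums[c] / counts[c] for c in sums}
    return max(densities, key=densities.get), densities

# Hypothetical two-class toy data (not from the paper).
train_X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
train_y = ["A", "A", "B", "B"]
label, densities = pnn_predict(train_X, train_y, [1.0, 1.0], sigma=1.0)
```

A very small σ makes each training sample act almost like a nearest-neighbour vote, while a very large σ flattens the class densities toward each other, which is why σ is usually tuned rather than fixed.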
Affiliation(s)
- Swati Vipsita
- Department of Computer Science and Engineering, International Institute of Information Technology, Bhubaneswar, India.
- Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia.
- Abhijeet Panda
- International Institute of Information Technology, Hyderabad 500032, India.
- Sambit Bakshi
- Department of Computer Science and Engineering, National Institute of Technology, Rourkela, India.
|
15
|
Mahato S, Goyal N, Ram D, Paul S. Detection of Depression and Scaling of Severity Using Six Channel EEG Data. J Med Syst 2020; 44:118. [PMID: 32435986 DOI: 10.1007/s10916-020-01573-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/23/2020] [Accepted: 03/31/2020] [Indexed: 01/13/2023]
Abstract
Depression is a psychiatric problem that affects a person's growth, including how the person thinks, feels and behaves. The major reason for misdiagnosis of depression is the absence of any laboratory test for detecting depression or scaling its severity. Any degradation in the working of the brain can be identified through changes in the electroencephalogram (EEG) signal. Thus, detection as well as severity scaling of depression is done in this study using the EEG signal. Features are extracted from the temporal region of the brain using six channels (FT7, FT8, T7, T8, TP7, TP8). The linear features used are delta, theta, alpha, beta, gamma1 and gamma2 band power and their corresponding asymmetry as well as paired asymmetry. The non-linear features used are Sample Entropy (SampEn) and Detrended Fluctuation Analysis (DFA). The classifiers used are Bagging and Support Vector Machine (SVM) with three different kernel functions (Polynomial, Gaussian and Sigmoidal). The feature selection technique used is ReliefF. The highest classification accuracies, 96.02% for detection and 79.19% for severity scaling of depression, were achieved using SVM (Gaussian kernel function) with ReliefF feature selection. The analysis showed that depression affects the temporal region of the brain (temporo-parietal region). It also showed that depression affects the higher frequency band features more and affects each hemisphere differently. Among the SVM kernels, the Gaussian kernel performed better than the others. Of all the features, the combination of all paired asymmetry and asymmetry features showed high classification accuracy (90.26% for detection of depression and 75.31% for severity scaling).
|
16
|
Huang W, Guo B, Shen Y, Tang X, Zhang T, Li D, Jiang Z. Sleep staging algorithm based on multichannel data adding and multifeature screening. Comput Methods Programs Biomed 2020; 187:105253. [PMID: 31812884 DOI: 10.1016/j.cmpb.2019.105253] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/02/2019] [Revised: 11/27/2019] [Accepted: 11/29/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Sleep staging is an important basis of sleep research, which is closely related to both normal sleep physiology and sleep disorders. Many studies have reported various sleep staging algorithms of which the framework generally consists of three parts: signal preprocessing, feature extraction and classification. However, there are few studies on the superposition of signals and feature screening for sleep staging. OBJECTIVE The objectives were to (1) Analyze the effective signal enhancement based on the superposition of homologous and heterogeneous signals, (2) Find a better way to use multichannel signals, (3) Study a systematic method of feature screening for sleep staging, and (4) Improve the performance of automatic sleep staging. METHODS In this paper, a novel method of signal preprocessing and feature screening was proposed. In the signal preprocessing, multi-channel signal superposition was applied to improve the effective information contained in the original signal. In the feature screening, 62 features were initially selected including the time-domain features, frequency-domain features and nonlinear features, and a ReliefF algorithm was employed to select 14 features highly correlated to sleep stages from the former 62 features. Then, Pearson correlation coefficients were used to remove 2 redundant features from the 14 features to eventually obtain 12 features. Next, with the aforementioned signal preprocessing method, the 12 selected features and a support vector machine (SVM) classifier were used for sleep staging based on thirty recordings. RESULTS Comparing the performance of sleep staging using different single-channel signals and different multi-channel superposition signals, we found that the best performance was obtained while using the superposition of two electroencephalogram (EEG) signals. 
The overall accuracies of sleep staging with 2-6 classes obtained by superposing the two EEG signals reach 98.28%, 95.50%, 94.28%, 93.08% and 92.34%, respectively, and the kappa coefficient of sleep staging with 6 classes reaches 84.07%. CONCLUSIONS Among the proposed sleep staging methods of using single-channel signal and multi-channel signal superposition, the best performance and consistency were obtained while using the superposition of two electroencephalogram (EEG) signals. The multichannel signal superposition method pointed out a valuable direction for improving the performance of automatic sleep staging in both theoretical research and engineering applications, and the proposed systematical feature screening method opened up a reasonable pathway for better selecting type and number of features for sleep staging.
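The second screening stage described in this abstract, removing redundant features by Pearson correlation after a relevance ranking, can be sketched as follows. The abstract does not state the correlation threshold or the tie-breaking rule, so the 0.9 threshold, the greedy "keep the higher-ranked feature" rule, the function names, and the toy feature columns are all assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_redundant(columns, ranked_names, threshold=0.9):
    """Walk features from most to least relevant; keep a feature only if its
    absolute correlation with every already-kept feature is below threshold."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns; "delta_power_db" is almost a linear
# rescaling of "delta_power" and should therefore be treated as redundant.
columns = {
    "delta_power": [1.0, 2.0, 3.0, 4.0, 5.0],
    "delta_power_db": [2.1, 4.0, 6.2, 8.1, 9.9],
    "sample_entropy": [0.9, 0.1, 0.7, 0.2, 0.6],
}
# Order assumed to come from a relevance ranking such as ReliefF weights.
ranked = ["delta_power", "sample_entropy", "delta_power_db"]
selected = drop_redundant(columns, ranked)
```

Walking the list in relevance order means that, of two highly correlated features, the one with the higher relevance score is the one retained.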
Affiliation(s)
- Wu Huang
- Sichuan University, Chengdu, SC, China
- Bing Guo
- Sichuan University, Chengdu, SC, China.
- Yan Shen
- Chengdu University of Information Technology, Chengdu, SC, China
- Xiangdong Tang
- Sleep Medicine Center, West China Hospital, Sichuan University, Chengdu, SC, China
- Tao Zhang
- Chengdu Techman Software Co., Ltd, Chengdu, SC, China
- Dan Li
- Chengdu Techman Software Co., Ltd, Chengdu, SC, China
|
17
|
Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: Introduction and review. J Biomed Inform 2018; 85:189-203. [PMID: 30031057 PMCID: PMC6299836 DOI: 10.1016/j.jbi.2018.07.014] [Citation(s) in RCA: 298] [Impact Index Per Article: 49.7] [Received: 01/15/2018] [Revised: 06/29/2018] [Accepted: 07/14/2018] [Indexed: 01/25/2023]
Abstract
Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
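The intuition behind the original Relief algorithm mentioned in this abstract can be shown with a short sketch: for a sampled instance, a feature's weight goes down by its difference to the nearest same-class neighbour (the "hit") and up by its difference to the nearest other-class neighbour (the "miss"). The sketch below assumes two classes, numeric features already scaled to [0, 1], and Manhattan distance; the function name `relief_weights` and the toy data are hypothetical and are not taken from the paper or from any library.

```python
import random

def relief_weights(X, y, n_iterations=None, seed=0):
    """Sketch of original two-class Relief feature weighting.

    Assumes numeric features scaled to [0, 1], so the per-feature
    difference between two instances is simply |a - b|.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = n_iterations or n
    weights = [0.0] * p

    def distance(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))

    for _ in range(m):
        i = rng.randrange(n)  # sample a target instance at random
        target = X[i]
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        near_hit = min(hits, key=lambda r: distance(target, r))
        near_miss = min(misses, key=lambda r: distance(target, r))
        for f in range(p):
            # Penalise features that differ within a class,
            # reward features that differ between classes.
            weights[f] -= abs(target[f] - near_hit[f]) / m
            weights[f] += abs(target[f] - near_miss[f]) / m
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [0.8, 0.3], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
```

Because neighbours are found using all features at once, a feature that only matters in combination with others can still receive a high weight, without any explicit enumeration of feature combinations. ReliefF extends this idea with k nearest neighbours, multiple classes, and missing data handling, none of which is shown here.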
Affiliation(s)
- Ryan J Urbanowicz
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- William La Cava
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Randal S Olson
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
18
|
Urbanowicz RJ, Olson RS, Schmitt P, Meeker M, Moore JH. Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 2018; 85:168-88. [PMID: 30030120 DOI: 10.1016/j.jbi.2018.07.015] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 01/15/2018] [Revised: 06/30/2018] [Accepted: 07/14/2018] [Indexed: 11/23/2022]
Abstract
Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) are computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF∗ performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.
|
19
|
Jafari M, Ghavami B, Sattari V. A hybrid framework for reverse engineering of robust Gene Regulatory Networks. Artif Intell Med 2017; 79:15-27. [PMID: 28602483 DOI: 10.1016/j.artmed.2017.05.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/12/2016] [Revised: 03/06/2017] [Accepted: 05/08/2017] [Indexed: 12/29/2022]
Abstract
The inference of Gene Regulatory Networks (GRNs) using gene expression data in order to detect the basic cellular processes is a key issue in biological systems. Inferring GRN correctly requires inferring predictor set accurately. In this paper, a fast and accurate predictor set inference framework which linearly combines some inference methods is proposed. The purpose of the combination of various methods is to increase the accuracy of inferred GRN. The proposed framework offers a linear weighted combination of Pearson Correlation Coefficient (PCC) and two different feature selection approaches, namely: Information Gain (IG) and ReliefF. In order to set the appropriate weights, Genetic Algorithm (GA) is used. Similarity measure is considered as fitness function to guide GA. At the end, based on the obtained weights, the best predictor set of GRN using three aforementioned inference methods is selected and the network topology is formed. Due to the huge volume of gene expression data, GRN inference algorithms should infer GRN at a reasonable runtime. Hence, a novel criterion is provided to evaluate GRNs based on runtime and accuracy. The simulation results using biological data indicate that the proposed framework is fast and more reliable compared to other recent methods [1-7].
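The linear weighted combination of scoring methods described in this abstract can be sketched as below. The abstract does not say how the three scores are put on a common scale, so the min-max normalisation is an assumption, as are the function names, the example scores, and the fixed weights, which stand in for values that the paper tunes with a genetic algorithm (not implemented here).

```python
def min_max(scores):
    """Rescale a {gene: score} mapping to [0, 1] so methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span else 0.0 for g, s in scores.items()}

def combine(score_tables, weights):
    """Weighted sum of several normalised score tables over the same genes."""
    normalised = [min_max(t) for t in score_tables]
    genes = normalised[0].keys()
    return {g: sum(w * t[g] for w, t in zip(weights, normalised)) for g in genes}

# Hypothetical scores of three candidate regulators for one target gene.
pcc = {"g1": 0.80, "g2": 0.10, "g3": 0.55}        # |Pearson correlation|
info_gain = {"g1": 0.30, "g2": 0.05, "g3": 0.40}  # information gain
relieff = {"g1": 0.12, "g2": -0.02, "g3": 0.09}   # ReliefF weight

# Placeholder weights; the paper sets them with a genetic algorithm.
weights = (0.5, 0.2, 0.3)
combined = combine([pcc, info_gain, relieff], weights)

# Keep the two highest-scoring candidates as the predictor set.
predictors = sorted(combined, key=combined.get, reverse=True)[:2]
```

In a full pipeline, a search procedure would evaluate candidate weight vectors against a fitness function and keep the best one; here the weights are simply fixed to show the combination step.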
Affiliation(s)
- Mina Jafari
- Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran.
- Behnam Ghavami
- Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran.
- Vahid Sattari
- Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran.
|
20
|
Zhang J, Chen M, Zhao S, Hu S, Shi Z, Cao Y. ReliefF-Based EEG Sensor Selection Methods for Emotion Recognition. Sensors (Basel) 2016; 16:s16101558. [PMID: 27669247 PMCID: PMC5087347 DOI: 10.3390/s16101558] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 04/29/2016] [Revised: 09/13/2016] [Accepted: 09/14/2016] [Indexed: 11/24/2022]
Abstract
Electroencephalogram (EEG) signals recorded from sensor electrodes on the scalp can directly detect the brain dynamics in response to different emotional states. Emotion recognition from EEG signals has attracted broad attention, partly due to the rapid development of wearable computing and the needs of a more immersive human-computer interface (HCI) environment. To improve the recognition performance, multi-channel EEG signals are usually used. A large set of EEG sensor channels will add to the computational complexity and cause users inconvenience. ReliefF-based channel selection methods were systematically investigated for EEG-based emotion recognition on a database for emotion analysis using physiological signals (DEAP). Three strategies were employed to select the best channels in classifying four emotional states (joy, fear, sadness and relaxation). Furthermore, support vector machine (SVM) was used as a classifier to validate the performance of the channel selection results. The experimental results showed the effectiveness of our methods and the comparison with the similar strategies, based on the F-score, was given. Strategies to evaluate a channel as a unity gave better performance in channel reduction with an acceptable loss of accuracy. In the third strategy, after adjusting channels’ weights according to their contribution to the classification accuracy, the number of channels was reduced to eight with a slight loss of accuracy (58.51% ± 10.05% versus the best classification accuracy 59.13% ± 11.00% using 19 channels). In addition, the study of selecting subject-independent channels, related to emotion processing, was also implemented. The sensors, selected subject-independently from frontal, parietal lobes, have been identified to provide more discriminative information associated with emotion processing, and are distributed symmetrically over the scalp, which is consistent with the existing literature. 
The results will make a contribution to the realization of a practical EEG-based emotion recognition system.
Affiliation(s)
- Jianhai Zhang
- College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China.
- Ming Chen
- College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China.
- Shaokai Zhao
- College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China.
- Sanqing Hu
- College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China.
- Zhiguo Shi
- Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310012, China.
- Yu Cao
- Department of Computer Science, The University of Massachusetts Lowell, Lowell, MA 01854, USA.
|
21
|
Kumari P, Nath A, Chaube R. Identification of human drug targets using machine-learning algorithms. Comput Biol Med 2014; 56:175-81. [PMID: 25437231 DOI: 10.1016/j.compbiomed.2014.11.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 05/06/2014] [Revised: 11/01/2014] [Accepted: 11/06/2014] [Indexed: 01/29/2023]
Abstract
Identification of potential drug targets is a crucial task in the drug-discovery pipeline. Successful identification of candidate drug targets in entire genomes is very useful, and computational prediction methods can speed up this process. In the current work we have developed a sequence-based prediction method for the successful identification and discrimination of human drug target proteins, from human non-drug target proteins. The training features include sequence-based features, such as amino acid composition, amino acid property group composition, and dipeptide composition for generating predictive models. The classification of human drug target proteins presents a classic example of class imbalance. We have addressed this issue by using SMOTE (Synthetic Minority Over-sampling Technique) as a preprocessing step, for balancing the training data with a ratio of 1:1 between drug targets (minority samples) and non-drug targets (majority samples). Using ensemble classification learning method-Rotation Forest and ReliefF feature-selection technique for selecting the optimal subset of salient features, the best model with selected features can achieve 87.1% sensitivity, 83.6% specificity, and 85.3% accuracy, with 0.71 Matthews correlation coefficient (mcc) on a tenfold stratified cross-validation test. The subset of identified optimal features may help in assessing the compositional patterns in human drug targets. For further validation, using a rigorous leave-one-out cross-validation test, the model achieved 88.1% sensitivity, 83.0% specificity, 85.5% accuracy, and 0.712 mcc. The proposed method was tested on a second dataset, for which the current pipeline gave promising results. We suggest that the present approach can be applied successfully as a complementary tool to existing methods for novel drug target prediction.
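The metrics reported in this abstract can be related to one another with a short worked example. Assuming a test set balanced 1:1 (as the SMOTE step suggests, although the abstract does not give the class counts) and a hypothetical 1000 samples per class, the reported sensitivity and specificity imply an accuracy and a Matthews correlation coefficient close to the reported values. The counts below are illustrative and are not taken from the paper.

```python
import math

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Assumed illustration: 1000 drug targets and 1000 non-targets (1:1 ratio),
# with counts chosen to match the reported sensitivity and specificity.
tp, fn = 871, 129   # sensitivity = 871 / 1000 = 87.1%
tn, fp = 836, 164   # specificity = 836 / 1000 = 83.6%

accuracy = (tp + tn) / (tp + fn + tn + fp)   # about 0.85
score = mcc(tp, fn, tn, fp)                  # about 0.71
```

Under these assumptions the accuracy is the plain average of sensitivity and specificity, and the MCC rounds to the 0.71 given in the abstract, so the reported figures are mutually consistent for a balanced evaluation set.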
Affiliation(s)
- Priyanka Kumari
- Bioinformatics Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India
- Abhigyan Nath
- Bioinformatics Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India
- Radha Chaube
- Zoology/Bioinformatic Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India.
|
22
|
Scott IM, Lin W, Liakata M, Wood JE, Vermeer CP, Allaway D, Ward JL, Draper J, Beale MH, Corol DI, Baker JM, King RD. Merits of random forests emerge in evaluation of chemometric classifiers by external validation. Anal Chim Acta 2013; 801:22-33. [PMID: 24139571 DOI: 10.1016/j.aca.2013.09.027] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/22/2013] [Revised: 09/06/2013] [Accepted: 09/14/2013] [Indexed: 10/26/2022]
Abstract
Real-world applications will inevitably entail divergence between samples on which chemometric classifiers are trained and the unknowns requiring classification. This has long been recognized, but there is a shortage of empirical studies on which classifiers perform best in 'external validation' (EV), where the unknown samples are subject to sources of variation relative to the population used to train the classifier. Survey of 286 classification studies in analytical chemistry found only 6.6% that stated elements of variance between training and test samples. Instead, most tested classifiers using hold-outs or resampling (usually cross-validation) from the same population used in training. The present study evaluated a wide range of classifiers on NMR and mass spectra of plant and food materials, from four projects with different data properties (e.g., different numbers and prevalence of classes) and classification objectives. Use of cross-validation was found to be optimistic relative to EV on samples of different provenance to the training set (e.g., different genotypes, different growth conditions, different seasons of crop harvest). For classifier evaluations across the diverse tasks, we used ranks-based non-parametric comparisons, and permutation-based significance tests. Although latent variable methods (e.g., PLSDA) were used in 64% of the surveyed papers, they were among the less successful classifiers in EV, and orthogonal signal correction was counterproductive. Instead, the best EV performances were obtained with machine learning schemes that coped with the high dimensionality (914-1898 features). Random forests confirmed their resilience to high dimensionality, as best overall performers on the full data, despite being used in only 4.5% of the surveyed papers. Most other machine learning classifiers were improved by a feature selection filter (ReliefF), but still did not out-perform random forests.
Affiliation(s)
- I M Scott
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY23 3FG, UK.
|