101
|
Mochammad S, Noh Y, Kang YJ, Park S, Lee J, Chin S. Multi-Filter Clustering Fusion for Feature Selection in Rotating Machinery Fault Classification. SENSORS 2022; 22:s22062192. [PMID: 35336363 PMCID: PMC8950067 DOI: 10.3390/s22062192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 03/08/2022] [Accepted: 03/09/2022] [Indexed: 02/04/2023]
Abstract
In the fault classification process, filter methods that sequentially remove unnecessary features have long been studied. However, the existing filter methods do not have guidelines on which, and how many, features are needed. This study developed a multi-filter clustering fusion (MFCF) technique, to effectively and efficiently select features. In the MFCF process, a multi-filter method combining existing filter methods is first applied for feature clustering; then, key features are automatically selected. The union of key features is utilized to find all potentially important features, and an exhaustive search is used to obtain the best combination of selected features to maximize the accuracy of the classification model. In the rotating machinery examples, fault classification models using MFCF were generated to classify normal and abnormal conditions of rotational machinery. The obtained results demonstrated that classification models using MFCF provide good accuracy, efficiency, and robustness in the fault classification of rotational machinery.
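The selection flow described in this abstract (several filters each nominate key features, the nominations are pooled by union, and an exhaustive search picks the best-scoring combination) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper selects key features by clustering, whereas the hypothetical `top_k` helper here simply keeps the k highest-scoring features per filter, and `evaluate` is a placeholder for whatever accuracy estimate of the classification model is used.

```python
from itertools import combinations


def top_k(scores, k):
    """Indices of the k highest-scoring features (ties broken by lower index).

    Assumption: a plain top-k cut stands in for the clustering-based
    key-feature selection described in the abstract.
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def union_of_key_features(filter_scores, k):
    """Pool the key features nominated by every filter into one sorted list."""
    selected = set()
    for scores in filter_scores:
        selected.update(top_k(scores, k))
    return sorted(selected)


def exhaustive_search(candidates, evaluate):
    """Score every non-empty subset of the candidates and return the best one.

    `evaluate` maps a tuple of feature indices to a number (higher is better).
    Subsets are visited from small to large, so ties go to the smaller subset.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical scores from two filters over four features.
filter_scores = [
    [0.9, 0.1, 0.5, 0.2],
    [0.2, 0.8, 0.6, 0.1],
]
candidates = union_of_key_features(filter_scores, k=2)


def toy_accuracy(subset):
    """Stand-in for model accuracy: rewards features 0 and 2, penalizes size."""
    return len(set(subset) & {0, 2}) - 0.1 * len(subset)


best_subset, best_score = exhaustive_search(candidates, toy_accuracy)
print(candidates, best_subset)
```

The exhaustive step visits 2^n - 1 subsets for n pooled features, so it is only practical because the union of key features is expected to be small.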
Affiliation(s)
- Solichin Mochammad
- School of Mechanical Engineering, Pusan National University, Busan 46241, Korea;
- Department of Mechanical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
| | - Yoojeong Noh
- School of Mechanical Engineering, Pusan National University, Busan 46241, Korea;
- Correspondence:
| | - Young-Jin Kang
- Research Institute of Mechanical Technology, Pusan National University, Busan 46241, Korea;
| | - Sunhwa Park
- H&A Research Center, LG Electronics, Changwon 51554, Korea; (S.P.); (J.L.); (S.C.)
| | - Jangwoo Lee
- H&A Research Center, LG Electronics, Changwon 51554, Korea; (S.P.); (J.L.); (S.C.)
| | - Simon Chin
- H&A Research Center, LG Electronics, Changwon 51554, Korea; (S.P.); (J.L.); (S.C.)
| |
|
102
|
Kundu R, Chattopadhyay S, Cuevas E, Sarkar R. AltWOA: Altruistic Whale Optimization Algorithm for feature selection on microarray datasets. Comput Biol Med 2022; 144:105349. [PMID: 35303580 DOI: 10.1016/j.compbiomed.2022.105349] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/02/2021] [Revised: 02/22/2022] [Accepted: 02/22/2022] [Indexed: 12/15/2022]
Abstract
The data-driven modern era has enabled the collection of large amounts of biomedical and clinical data. DNA microarray gene expression datasets have gained significant attention from the research community owing to their ability to identify diseases through "bio-markers", that is, specific alterations in the gene sequence that represent a particular disease (for example, different types of cancer). However, gene expression datasets are very high-dimensional, and only a few of the genes are bio-markers. Meta-heuristic-based feature selection efficiently filters the relevant genes out of a large set of attributes, reducing data storage and computation requirements. To this end, we propose an Altruistic Whale Optimization Algorithm (AltWOA) for the feature selection problem in high-dimensional microarray data. AltWOA is an improvement on the basic Whale Optimization Algorithm. We embed the concept of altruism in the whale population to help propagate candidate solutions that can reach the global optima over the iterations. Evaluation of the proposed method on eight high-dimensional microarray datasets shows that AltWOA outperforms popular and classical techniques from the literature on the same datasets, both in accuracy and in the final number of features selected. The relevant codes for the proposed approach are available publicly at https://github.com/Rohit-Kundu/AltWOA.
Affiliation(s)
- Rohit Kundu
- Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India.
| | - Soham Chattopadhyay
- Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India.
| | - Erik Cuevas
- Departamento de Electrónica, Universidad de Guadalajara, CUCEI, Av. Revolución 1500, Guadalajara, Jal, Mexico.
| | - Ram Sarkar
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, 700032, India.
| |
|
103
|
Eckermann HA, Ou Y, Lahti L, Weerth C. Can gut microbiota throughout the first 10 years of life predict executive functioning in childhood? Dev Psychobiol 2022; 64:e22226. [DOI: 10.1002/dev.22226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Revised: 11/12/2021] [Accepted: 11/15/2021] [Indexed: 11/08/2022]
Affiliation(s)
- Henrik Andreas Eckermann
- Department of Cognitive Neuroscience Cognition and Behavior Radboud University Medical Center Donders Institute for Brain Nijmegen The Netherlands
| | - Yangwenshan Ou
- Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
| | - Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
| | - Carolina Weerth
- Department of Cognitive Neuroscience Cognition and Behavior Radboud University Medical Center Donders Institute for Brain Nijmegen The Netherlands
| |
|
104
|
Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Comput Stat 2022. [DOI: 10.1007/s00180-022-01207-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/03/2022]
Abstract
Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect of data analysis. A common problem is high cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequent ML applications. We focus on the impact of these techniques on a subsequent algorithm's predictive performance and, where possible, derive best practices on when to use which technique. We conducted a large-scale benchmark experiment in which we compared different encoding strategies together with five ML algorithms (lasso, random forest, gradient boosting, k-nearest neighbors, support vector machine) using datasets from regression, binary classification, and multiclass classification settings. In our study, regularized versions of target encoding (i.e. using target predictions based on the feature levels in the training set as a new numerical feature) consistently provided the best results. Traditional, widely used encodings that make unreasonable assumptions to map levels to integers (e.g. integer encoding) or to reduce the number of levels (possibly based on target information, e.g. leaf encoding) before creating binary indicator variables (one-hot or dummy encoding) were not as effective in comparison.
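To make the idea of target encoding concrete, the sketch below replaces each categorical level with a shrunken mean of the target computed on the training set. The shrinkage toward the global mean (controlled by the hypothetical `smoothing` parameter) is one simple form of regularization chosen here for illustration; it is an assumption and not necessarily the regularization scheme benchmarked in the paper.

```python
from collections import defaultdict


def fit_target_encoding(levels, targets, smoothing=10.0):
    """Learn a level -> number mapping from training data only.

    Each level's target mean is pulled toward the global mean; levels with
    few observations are pulled harder. `smoothing` acts like a count of
    pseudo-observations placed at the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, targets):
        sums[level] += y
        counts[level] += 1
    encoding = {
        level: (sums[level] + smoothing * global_mean) / (counts[level] + smoothing)
        for level in counts
    }
    return encoding, global_mean


def transform(levels, encoding, global_mean):
    """Encode levels; levels unseen during fitting fall back to the global mean."""
    return [encoding.get(level, global_mean) for level in levels]


# Toy training data: a categorical feature and a binary target.
train_levels = ["a", "a", "b", "c"]
train_targets = [1, 1, 0, 0]
encoding, global_mean = fit_target_encoding(train_levels, train_targets, smoothing=2.0)
encoded = transform(["a", "b", "z"], encoding, global_mean)
print(encoded)
```

Fitting on the training split alone and reusing the mapping on held-out data keeps target information from leaking into the evaluation.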
|
105
|
Sen R, Goswami S, Mandal AK, Chakraborty B. An effective feature subset selection approach based on Jeffries-Matusita distance for multiclass problems. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-202796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Jeffries-Matusita (JM) distance, a transformation of the Bhattacharyya distance, is a widely used measure of the spectral separability between two class density functions and is generally used as a class separability measure. It therefore has good potential for evaluating how effectively a feature discriminates between two classes. The capability of JM distance as a ranking-based feature selection technique for binary classification problems has been verified in several research works as well as in our earlier work. Our simulation experiments with benchmark data sets found that JM distance works as well as other popular feature ranking methods based on mutual information, information gain, or Relief. Extensions of the JM distance measure for feature ranking in multiclass problems have also been reported in the literature, but all of them are rank-based approaches that deliver a ranking of the features and do not automatically produce the final optimal feature subset. In this work, a novel heuristic approach for finding the optimum feature subset from JM-distance-based ranked feature lists for multiclass problems has been developed without explicitly using any specific search technique. The proposed approach integrates the extension of the JM measure for multiclass problems and the selection of the final optimal feature subset in a unified process. The performance of the proposed algorithm has been evaluated by simulation experiments with benchmark data sets in comparison with two previously developed multiclass JM distance measures (weighted average JM distance and another multiclass extension equivalent to the Bhattacharyya bound) and some other popular filter-based feature ranking algorithms. The proposed algorithm is found to perform better in terms of classification accuracy, F-measure, and AUC, with a reduced feature set and reduced computational cost.
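As a concrete illustration of JM distance as a two-class feature ranking score, the sketch below computes it per feature under the assumption that each class is univariate Gaussian, using the convention JM = 2(1 - exp(-B)) with B the Bhattacharyya distance (some authors take the square root of this quantity). Both the Gaussian assumption and the function names are choices made for this example; the multiclass heuristic proposed in the paper is not reproduced here.

```python
import math
from statistics import fmean, pvariance


def jm_distance(a, b, eps=1e-12):
    """JM distance between two samples of one feature, assuming Gaussian classes.

    Returns a value in [0, 2]: 0 for identical distributions, approaching 2
    for well-separated ones. `eps` guards against zero variance.
    """
    m1, m2 = fmean(a), fmean(b)
    v1, v2 = pvariance(a) + eps, pvariance(b) + eps
    # Bhattacharyya distance for two univariate Gaussians.
    bhattacharyya = (
        0.25 * (m1 - m2) ** 2 / (v1 + v2)
        + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    )
    return max(0.0, 2.0 * (1.0 - math.exp(-bhattacharyya)))


def rank_features(class_a_rows, class_b_rows):
    """Rank feature indices by decreasing JM distance between the two classes."""
    columns_a = list(zip(*class_a_rows))
    columns_b = list(zip(*class_b_rows))
    scores = [jm_distance(ca, cb) for ca, cb in zip(columns_a, columns_b)]
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return order, scores


# Toy data: feature 0 overlaps between classes, feature 1 separates them.
class_a = [[1.0, 0.0], [2.0, 0.1], [3.0, -0.1]]
class_b = [[1.5, 5.0], [2.5, 5.1], [2.0, 4.9]]
order, scores = rank_features(class_a, class_b)
print(order)
```

A ranking like this still leaves open how many top features to keep, which is the gap the abstract's subset-selection heuristic addresses.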
Affiliation(s)
- Rikta Sen
- Graduate School of Software and Information Science, Iwate Prefectural University, Iwate, Japan
| | - Saptarsi Goswami
- Bangabasi Morning College, University of Calcutta, Kolkata, India
| | - Ashis Kumar Mandal
- Graduate School of Software and Information Science, Iwate Prefectural University, Iwate, Japan
| | - Basabi Chakraborty
- Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
| |
|
106
|
Combination of Reduction Detection Using TOPSIS for Gene Expression Data Analysis. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6010024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
In high-dimensional data analysis, Feature Selection (FS) is one of the most fundamental issues in machine learning and requires the attention of researchers. These datasets are characterized by a huge feature space, out of which only a few features are significant for analysis. Thus, extracting the significant features is crucial. Among the various techniques available for feature selection, filter techniques are significant in this community, as they can be used with any type of learning algorithm, drastically lower the running time of optimization algorithms, and improve the performance of the model. However, the suitability of a filter approach depends on the characteristics of the dataset as well as on the machine learning model. To avoid these issues, this research considers a combination of feature reduction (CFR), designing a pipeline of filter approaches for high-dimensional microarray data classification. Considering four filter approaches, sixteen combinations of pipelines are generated. The feature subset is reduced at different levels, and ultimately, the significant feature set is evaluated. The pipelined filter techniques are Correlation-Based Feature Selection (CBFS), Chi-Square Test (CST), Information Gain (InG), and Relief Feature Selection (RFS), and the classification techniques are Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and k-Nearest Neighbor (k-NN). The performance of CFR depends highly on the datasets as well as on the classifiers. Thereafter, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method is used to rank all reduction combinations and identify the superior filter combination.
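The ranking step can be illustrated with the standard TOPSIS procedure: normalize each criterion column, apply weights, find the ideal and anti-ideal points, and score each alternative by its relative closeness to the ideal. The sketch below is a generic version under assumed inputs; the criteria, weights, and numbers are hypothetical and are not taken from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti_ideal = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        total = d_best + d_worst
        # If all alternatives are identical, no ranking information exists.
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical filter pipelines scored on accuracy (benefit) and runtime (cost).
matrix = [
    [0.90, 10.0],
    [0.80, 20.0],
    [0.70, 30.0],
]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

An alternative that is best on every criterion coincides with the ideal point and scores 1, while one that is worst on every criterion scores 0.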
|
107
|
Zhou L, Wang H. A Combined Feature Screening Approach of Random Forest and Filter-based Methods for Ultra-high Dimensional Data. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220221120618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
Various feature (variable) screening approaches have been proposed in the past decade to mitigate the impact of ultra-high dimensionality in classification and regression problems, including filter-based methods such as sure independence screening and wrapper-based methods such as random forest. However, the former rely heavily on strong modelling assumptions, while the latter require an adequate sample size to make the data speak for themselves. These requirements can seldom be met in biochemical studies, where we often have access only to ultra-high dimensional data with a complex structure and a small number of observations.
Objective:
In this research, we investigate the possibility of combining filter-based screening methods and random-forest-based screening methods in the regression context.
Method:
We have combined four state-of-the-art filter approaches from the statistical community, namely sure independence screening (SIS), robust rank correlation based screening (RRCS), high dimensional ordinary least squares projection (HOLP), and a model-free sure independence screening procedure based on the distance correlation (DCSIS), with a random-forest-based Boruta screening method from the machine learning community for regression problems.
Result:
Among all combined methods, RF-DCSIS performs better than the other methods in terms of screening accuracy and prediction capability on the simulated scenarios and real benchmark datasets.
Conclusion:
Through empirical study on both extensive simulations and real data, we have shown that filter-based screening and random-forest-based screening each have their pros and cons, while a combination of both may lead to better feature screening results and prediction capability.
Keywords:
feature screening, filter-based method, ultra-high dimensional data, variable selection, random forest, RF-DCSIS
Affiliation(s)
- Lifeng Zhou
- School of Economics and Management, Changsha University, China
| | - Hong Wang
- School of Mathematics and Statistics, Central South University, China
| |
|
108
|
Very High-Resolution Imagery and Machine Learning for Detailed Mapping of Riparian Vegetation and Substrate Types. REMOTE SENSING 2022. [DOI: 10.3390/rs14040954] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Riparian zones fulfill diverse ecological and economic functions. Sustainable management requires detailed spatial information about vegetation and hydromorphological properties. In this study, we propose a machine learning classification workflow to map classes of the thematic levels Basic surface types (BA), Vegetation units (VE), Dominant stands (DO) and Substrate types (SU) based on multispectral imagery from an unmanned aerial system (UAS). A case study was carried out in Emmericher Ward on the river Rhine, Germany. The results showed that: (I) In terms of overall accuracy, classification results decreased with increasing detail of classes from BA (88.9%) and VE (88.4%) to DO (74.8%) or SU (62%), respectively. (II) The use of Support Vector Machines and Extreme Gradient Boost algorithms did not increase classification performance in comparison to Random Forest. (III) Based on probability maps, classification performance was lower in areas of shaded vegetation and in the transition zones. (IV) In order to cover larger areas, a gyrocopter can be used applying the same workflow and achieving comparable results as by UAS for thematic levels BA, VE and homogeneous classes covering larger areas. The generated classification maps are a valuable tool for ecologically integrated water management.
|
109
|
Jaddi NS, Saniee Abadeh M. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. INFORM SYST 2022. [DOI: 10.1016/j.is.2021.101906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
|
110
|
Siegismund D, Fassler M, Heyse S, Steigele S. Benchmarking feature selection methods for compressing image information in high-content screening. SLAS Technol 2022; 27:85-93. [DOI: 10.1016/j.slast.2021.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
111
|
Li L, Liu ZP. A connected network-regularized logistic regression model for feature selection. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02877-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
|
112
|
Abstract
Conventional approaches to modelling driver risk have incorporated measures such as driver gender, age, place of residence, vehicle model, and annual miles driven. However, in the last decade, research has shown that assessing a driver's crash risk based on these variables does not go far enough, especially as advanced technology changes today's vehicles as well as the role and behavior of the driver. There is growing recognition that actual driver usage patterns and driving behavior, when they can be properly captured in risk modelling, offer higher accuracy and more individually tailored projections. However, several challenges make this difficult. These include accessing the right types of data, dealing with high-dimensional data, and identifying the underlying structure of the variance in driving behavior. There is also the challenge of how to identify key variables for detecting and predicting risk, and how to combine them in predictive algorithms. This paper proposes a systematic feature extraction and selection framework for building Comprehensive Driver Profiles that serve as a foundation for driver behavior analysis and for building whole driver profiles. Features are extracted from raw data using statistical feature extraction techniques, and a hybrid feature selection algorithm is used to select the best driver profile feature set based on outcomes of interest such as crash risk. The framework can support individualized detection and prediction of risk, and can also be used to identify types of drivers who exhibit similar patterns of driving and vehicle/technology usage. The developed framework is applied to a naturalistic driving dataset, NEST, derived from the larger SHRP2 naturalistic driving study, to illustrate the types of information about driver behavior that can be harnessed as well as some of the important applications that can be derived from it.
|
113
|
Abstract
OBJECTIVES: A critical problem in radiomic studies is the high dimensionality of the datasets, which stems from small sample sizes and many generic features extracted from the volume of interest. Therefore, feature selection methods are used, which aim to remove redundant as well as irrelevant features. Because there are many feature selection algorithms, it is key to understand their performance in the context of radiomics.
MATERIALS AND METHODS: A total of 29 feature selection algorithms and 10 classifiers were evaluated on 10 publicly available radiomic datasets. Feature selection methods were compared for training times, for the stability of the selected features, and for ranking, which measures the pairwise similarity of the methods. In addition, the predictive performance of the algorithms was measured by utilizing the area under the receiver operating characteristic curve of the best-performing classifier.
RESULTS: Feature selections differed largely in training times as well as stability and similarity. No single method was able to outperform another one consistently in predictive performance.
CONCLUSION: Our results indicated that simpler methods are more stable than complex ones and do not perform worse in terms of area under the receiver operating characteristic curve. Analysis of variance, least absolute shrinkage and selection operator, and minimum redundancy, maximum relevance ensemble appear to be good choices for radiomic studies in terms of predictive performance, as they outperformed most other feature selection methods.
|
114
|
Benchmarking Eliminative Radiomic Feature Selection for Head and Neck Lymph Node Classification. Cancers (Basel) 2022; 14:cancers14030477. [PMID: 35158745 PMCID: PMC8833684 DOI: 10.3390/cancers14030477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2021] [Revised: 01/13/2022] [Accepted: 01/16/2022] [Indexed: 12/12/2022] Open
Abstract
Simple Summary
Pathologic cervical lymph nodes (LN) in head and neck squamous cell carcinoma (HNSCC) deteriorate prognosis. Current radiologic criteria for LN-classification are primarily shape-based. Radiomics is an emerging data-driven technique that aids in extracting, processing, and analyzing features and is potentially capable of LN-classification. Currently available sets of features are too complex for clinical applicability. We identified the combination of sparse discriminant analysis and genetic algorithms as a potentially useful algorithm for eliminative feature selection. In this retrospective cohort study of 252 LNs with over 30,000 extracted features, this algorithm retained a classification accuracy of up to 90% with only 10% of the original number of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified algorithm and features need further exploration of their potential as prospective classifiers for LNs in HNSCC.
Abstract
In head and neck squamous cell carcinoma (HNSCC), pathologic cervical lymph nodes (LN) remain important negative predictors. Current criteria for LN-classification in contrast-enhanced computed-tomography scans (contrast-CT) are shape-based; contrast-CT imagery allows extraction of additional quantitative data ("features"). The data-driven technique to extract, process, and analyze features from contrast-CTs is termed "radiomics". Extracted features from contrast-CTs at various levels are typically redundant and correlated. Current sets of features for LN-classification are too complex for clinical application. Effective eliminative feature selection (EFS) is a crucial preprocessing step to reduce the complexity of the sets identified. We aimed to explore EFS-algorithms for their potential to identify sets of features that were as small as feasible and yet retained as much accuracy as possible for LN-classification. In this retrospective cohort study, which adhered to the STROBE guidelines, a total of 252 LNs were classified as "non-pathologic" (n = 70), "pathologic" (n = 182) or "pathologic with extracapsular spread" (n = 52) by two experienced head-and-neck radiologists based on established criteria, which served as a reference. The combination of sparse discriminant analysis and genetic optimization retained up to 90% of the classification accuracy with only 10% of the original number of features. From a clinical perspective, the selected features appeared plausible and potentially capable of correctly classifying LNs. Both the identified EFS-algorithm and the identified features need further exploration to assess their potential to prospectively classify LNs in HNSCC.
|
115
|
Jiao Z, Chen S, Shi H, Xu J. Multi-Modal Feature Selection with Feature Correlation and Feature Structure Fusion for MCI and AD Classification. Brain Sci 2022; 12:80. [PMID: 35053823 PMCID: PMC8773824 DOI: 10.3390/brainsci12010080] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/14/2021] [Revised: 12/24/2021] [Accepted: 12/29/2021] [Indexed: 11/16/2022] Open
Abstract
Feature selection for multiple types of data has been widely applied in mild cognitive impairment (MCI) and Alzheimer's disease (AD) classification research. Combining multi-modal data for classification can better exploit the complementarity of valuable information. To improve the classification performance of feature selection on multi-modal data, we propose a multi-modal feature selection algorithm using feature correlation and feature structure fusion (FC2FS). First, we construct feature correlation regularization by fusing a similarity matrix between multi-modal feature nodes. Then, based on manifold learning, we employ feature matrix fusion to construct feature structure regularization and learn the local geometric structure of the feature nodes. Finally, the two regularizations are embedded in a multi-task learning model that introduces a low-rank constraint, the multi-modal features are selected, and the final features are linearly fused and input into a support vector machine (SVM) for classification. Different controlled experiments were set up to verify the validity of the proposed method, which was applied to MCI and AD classification. The accuracies for normal controls versus Alzheimer's disease, normal controls versus late mild cognitive impairment, normal controls versus early mild cognitive impairment, and early mild cognitive impairment versus late mild cognitive impairment were 91.85 ± 1.42%, 85.33 ± 2.22%, 78.29 ± 2.20%, and 77.67 ± 1.65%, respectively. This method makes up for the shortcomings of traditional subject-based multi-modal feature selection and fully considers the relationship between feature nodes and the local geometric structure of the feature space. Our study not only enhances the interpretability of feature selection but also improves the classification performance, which offers a useful reference for the identification of MCI and AD.
Affiliation(s)
- Zhuqing Jiao
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China; (Z.J.); (S.C.)
| | - Siwei Chen
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China; (Z.J.); (S.C.)
| | - Haifeng Shi
- Department of Radiology, Changzhou Second People’s Hospital, Nanjing Medical University, Changzhou 213003, China
- School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213164, China
| | - Jia Xu
- School of Medicine, Ningbo University, Ningbo 315211, China
| |
|
116
|
How can dense results be differentiated in comprehensive evaluations? A hybrid information filtering model. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
117
|
Evaluation of Feature Selection Methods on Psychosocial Education Data Using Additive Ratio Assessment. ELECTRONICS 2021. [DOI: 10.3390/electronics11010114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Artificial intelligence, particularly machine learning, is the fastest-growing research trend in educational fields. Machine learning shows impressive performance in many prediction models, including those for psychosocial education. The capability of machine learning to discover hidden patterns in large datasets encourages researchers to produce data with high-dimensional features. However, not all features are needed by machine learning, and in many cases, high-dimensional features decrease its performance. Feature selection is one appropriate approach to reducing the features so that machine learning works efficiently. Various selection methods have been proposed, but research to determine the essential feature subset in psychosocial education has not been established thus far. This research investigated and proposed methods to determine the best feature selection method in the domain of psychosocial education. We used a multi-criteria decision-making (MCDM) approach with Additive Ratio Assessment (ARAS) to rank seven feature selection methods. The proposed model evaluated the best feature selection method using nine criteria from the performance metrics provided by machine learning. The experimental results showed that ARAS is promising for evaluating and recommending the best feature selection method for psychosocial education data using the teacher's psychosocial risk levels dataset.
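The ARAS ranking idea can be sketched as follows: an optimal reference alternative is added to the decision matrix, each criterion is sum-normalized (cost criteria are inverted first), weighted values are summed per alternative, and each alternative's utility is its sum relative to the optimal one. This is a generic sketch of the ARAS procedure under assumed inputs; the criteria, weights, and values are hypothetical and do not come from the paper.

```python
def aras(matrix, weights, benefit):
    """Return the ARAS utility degree (0 to 1) of each alternative (row).

    matrix  : rows are alternatives, columns are criteria (all values > 0)
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Optimal reference alternative: best observed value per criterion.
    optimal = [
        max(row[j] for row in matrix) if benefit[j] else min(row[j] for row in matrix)
        for j in range(n_crit)
    ]
    extended = [optimal] + [list(row) for row in matrix]
    # Invert cost criteria so that larger is always better.
    oriented = [
        [row[j] if benefit[j] else 1.0 / row[j] for j in range(n_crit)]
        for row in extended
    ]
    column_sums = [sum(row[j] for row in oriented) for j in range(n_crit)]
    # Weighted, sum-normalized optimality value of each alternative.
    optimality = [
        sum(weights[j] * row[j] / column_sums[j] for j in range(n_crit))
        for row in oriented
    ]
    # Utility degree: each alternative relative to the optimal reference.
    return [value / optimality[0] for value in optimality[1:]]


# Hypothetical feature selection methods scored on accuracy (benefit)
# and training time (cost).
matrix = [
    [0.9, 10.0],
    [0.8, 20.0],
]
utility = aras(matrix, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(utility)), key=lambda i: -utility[i])
print(ranking)
```

With nine criteria and seven methods, as in the abstract, the same function would take a 7-by-9 matrix; the choice of weights is left to the analyst.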
|
118
|
Pes B, Lai G. Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study. PeerJ Comput Sci 2021; 7:e832. [PMID: 35036539 PMCID: PMC8725666 DOI: 10.7717/peerj-cs.832] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/15/2021] [Accepted: 12/06/2021] [Indexed: 05/28/2023]
Abstract
High dimensionality and class imbalance have been widely recognized as important issues in machine learning. A vast amount of literature has investigated suitable approaches to address the multiple challenges that arise when dealing with high-dimensional feature spaces (where each problem instance is described by a large number of features). Likewise, several learning strategies have been devised to cope with the adverse effects of imbalanced class distributions, which may severely impact the generalization ability of the induced models. Nevertheless, although both issues have been studied extensively for several years, they have mostly been addressed separately, and their combined effects are yet to be fully understood. Indeed, little research has so far been conducted to investigate which approaches might be best suited to deal with datasets that are, at the same time, high-dimensional and class-imbalanced. To make a contribution in this direction, our work presents a comparative study of different learning strategies that leverage both feature selection, to cope with high dimensionality, and cost-sensitive learning methods, to cope with class imbalance. Specifically, different ways of incorporating misclassification costs into the learning process have been explored. Different feature selection heuristics, both univariate and multivariate, have also been considered to comparatively evaluate their effectiveness on imbalanced data. The experiments have been conducted on three challenging benchmarks from the genomic domain, gaining interesting insight into the beneficial impact of combining feature selection and cost-sensitive learning, especially in the presence of highly skewed data distributions.
Affiliation(s)
- Barbara Pes
- Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy
| | - Giuseppina Lai
- Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy
| |
|
119
|
Bhattacharjee S, Ikromjanov K, Carole KS, Madusanka N, Cho NH, Hwang YB, Sumon RI, Kim HC, Choi HK. Cluster Analysis of Cell Nuclei in H&E-Stained Histological Sections of Prostate Cancer and Classification Based on Traditional and Modern Artificial Intelligence Techniques. Diagnostics (Basel) 2021; 12:diagnostics12010015. [PMID: 35054182 PMCID: PMC8774423 DOI: 10.3390/diagnostics12010015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] Open
Abstract
Biomarker identification is very important to differentiate the grade groups in the histopathological sections of prostate cancer (PCa). Assessing the cluster of cell nuclei is essential for pathological investigation. In this study, we present a computer-based method for cluster analyses of cell nuclei and apply traditional (i.e., unsupervised method) and modern (i.e., supervised method) artificial intelligence (AI) techniques for distinguishing the grade groups of PCa. Two datasets on PCa were collected to carry out this research. Histopathology samples were obtained from whole slides stained with hematoxylin and eosin (H&E). In this research, state-of-the-art approaches were proposed for color normalization, cell nuclei segmentation, feature selection, and classification. A traditional minimum spanning tree (MST) algorithm was employed to identify the clusters and better capture the proliferation and community structure of cell nuclei. K-medoids clustering and stacked ensemble machine learning (ML) approaches were used to perform traditional and modern AI-based classification. Binary and multiclass classifications were derived to compare the model quality and results between the grades of PCa. Furthermore, a comparative analysis was carried out between traditional and modern AI techniques using different performance metrics (i.e., statistical parameters). Cluster features of the cell nuclei can be useful information for cancer grading. However, further validation of cluster analysis is required to accomplish astounding classification results.
Affiliation(s)
| | - Kobiljon Ikromjanov
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
| | - Kouayep Sonia Carole
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
| | - Nuwan Madusanka
- School of Computing & IT, Sri Lanka Technological Campus, Paduka 10500, Sri Lanka;
| | - Nam-Hoon Cho
- Department of Pathology, Yonsei University Hospital, Seoul 03722, Korea;
| | - Yeong-Byn Hwang
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
| | - Rashadul Islam Sumon
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
| | - Hee-Cheol Kim
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
| | - Heung-Kook Choi
- Department of Computer Engineering, u-AHRC, Inje University, Gimhae 50834, Korea;
- Correspondence: ; Tel.: +82-10-6733-3437
| |
|
120
|
Syed FH, Tahir MA, Rafi M, Shahab MD. Feature selection for semi-supervised multi-target regression using genetic algorithm. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02291-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/27/2023]
|
121
|
Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021; 22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open
Abstract
Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.
Affiliation(s)
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia.
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia.
- Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia.
| | - Hao Huang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
| | - Chunlei Liu
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
| |
|
122
|
López-Dorado A, Pérez J, Rodrigo M, Miguel-Jiménez J, Ortiz M, de Santiago L, López-Guillén E, Blanco R, Cavalliere C, Morla EMS, Boquete L, Garcia-Martin E. Diagnosis of multiple sclerosis using multifocal ERG data feature fusion. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2021; 76:157-167. [PMID: 34867127 PMCID: PMC8475498 DOI: 10.1016/j.inffus.2021.05.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/29/2020] [Revised: 11/15/2020] [Accepted: 05/17/2021] [Indexed: 05/16/2023]
Abstract
The purpose of this paper is to implement a computer-aided diagnosis (CAD) system for multiple sclerosis (MS) based on analysing the outer retina as assessed by multifocal electroretinograms (mfERGs). MfERG recordings taken with the RETI-port/scan 21 (Roland Consult) device from 15 eyes of patients diagnosed with incipient relapsing-remitting MS and without prior optic neuritis, and from 6 eyes of control subjects, are selected. The mfERG recordings are grouped (whole macular visual field, five rings, and four quadrants). For each group, the correlation with a normative database of adaptively filtered signals, based on empirical mode decomposition (EMD), and three features from the continuous wavelet transform (CWT) domain are obtained. Of the initial 40 features, the 4 most relevant are selected in two stages: a) using a filter method and b) using a wrapper-feature selection method. The Support Vector Machine (SVM) is used as a classifier. With the optimal CAD configuration, a Matthews correlation coefficient value of 0.89 (accuracy = 0.95, specificity = 1.0 and sensitivity = 0.93) is obtained. This study identified an outer retina dysfunction in patients with recent MS by analysing the outer retina responses in the mfERG and employing an SVM as a classifier. In conclusion, a promising new electrophysiological-biomarker method based on feature fusion for MS diagnosis was identified.
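The reported metrics can be related to one another through a 2x2 confusion matrix. The sketch below is a worked check under an assumption that the abstract does not state: that 14 of the 15 MS eyes and all 6 control eyes were classified correctly. The function name and the counts are illustrative only.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and MCC for binary counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Assumed counts (not given in the abstract): 15 MS eyes, 6 control eyes.
sens, spec, acc, mcc = confusion_metrics(tp=14, fn=1, tn=6, fp=0)
print(round(sens, 2), round(spec, 2), round(acc, 2), round(mcc, 2))
```

Under that assumption the four values round to 0.93, 1.0, 0.95, and 0.89, which matches the figures quoted in the abstract.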
Affiliation(s)
- A. López-Dorado
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
| | - J. Pérez
- Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
- Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
| | - M.J. Rodrigo
- Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
- Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
- RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
| | - J.M. Miguel-Jiménez
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
| | - M. Ortiz
- School of Physics, University of Melbourne, VIC 3010, Australia
| | - L. de Santiago
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
| | - E. López-Guillén
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
| | - R. Blanco
- Department of Surgery, Medical and Social Sciences, University of Alcalá, Alcalá de Henares, Spain
- RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
| | - C. Cavalliere
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
| | - E. Mª Sánchez Morla
- Department of Psychiatry, Hospital 12 de Octubre Research Institute (i+12), 28041 Madrid, Spain
- Faculty of Medicine, Complutense University of Madrid, 28040 Madrid, Spain
- CIBERSAM: Biomedical Research Networking Centre in Mental Health, 28029 Madrid, Spain
| | - L. Boquete
- Biomedical Engineering Group, Department of Electronics, University of Alcalá, Alcalá de Henares, Spain
- RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
| | - E. Garcia-Martin
- Department of Ophthalmology, Miguel Servet University Hospital, Zaragoza, Spain
- Aragon Institute for Health Research (IIS Aragon). Miguel Servet Ophthalmology Innovation and Research Group (GIMSO), University of Zaragoza, Spain
- RETICS: Thematic Networks for Co-operative Research in Health for Ocular Diseases, Spain
| |
|
123
|
Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques? REMOTE SENSING 2021. [DOI: 10.3390/rs13234832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
This study analyzed highly correlated, feature-rich datasets from hyperspectral remote sensing data using multiple statistical and machine-learning methods. The effect of filter-based feature selection methods on predictive performance was compared. In addition, the effect of multiple expert-based and data-driven feature sets, derived from the reflectance data, was investigated. Defoliation of trees (%), derived from in situ measurements from fall 2016, was modeled as a function of reflectance. Variable importance was assessed using permutation-based feature importance. Overall, the support vector machine (SVM) outperformed other algorithms, such as random forest (RF), extreme gradient boosting (XGBoost), and lasso (L1) and ridge (L2) regressions by at least three percentage points. The combination of certain feature sets showed small increases in predictive performance, while no substantial differences between individual feature sets were observed. For some combinations of learners and feature sets, filter methods achieved better predictive performances than using no feature selection. Ensemble filters did not have a substantial impact on performance. The most important features were located around the red edge. Additional features in the near-infrared region (800–1000 nm) were also essential to achieve the overall best performances. Filter methods have the potential to be helpful in high-dimensional situations and are able to improve the interpretation of feature effects in fitted models, which is an essential constraint in environmental modeling studies. Nevertheless, more training data and replication in similar benchmarking studies are needed to be able to generalize the results.
|
124
|
Mahendran N, P M DRV. A deep learning framework with an embedded-based feature selection approach for the early detection of the Alzheimer's disease. Comput Biol Med 2021; 141:105056. [PMID: 34839903 DOI: 10.1016/j.compbiomed.2021.105056] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/14/2021] [Revised: 11/20/2021] [Accepted: 11/20/2021] [Indexed: 12/29/2022]
Abstract
Ageing is associated with various ailments including Alzheimer's disease (AD), which is a progressive form of dementia. AD symptoms develop over a period of years and, unfortunately, there is no cure. Existing AD treatments can only slow down the progression of symptoms and thus it is critical to diagnose the disease at an early stage. To help improve the early diagnosis of AD, a deep learning-based classification model with an embedded feature selection approach was used to classify AD patients. An AD DNA methylation data set (64 records with 34 cases and 34 controls) from the GEO omnibus database was used for the analysis. Before selecting the relevant features, the data were preprocessed by performing quality control, normalization and downstream analysis. As the number of associated CpG sites was huge, four embedded-based feature selection models were compared and the best method was used for the proposed classification model. An Enhanced Deep Recurrent Neural Network (EDRNN) was implemented and compared to other existing classification models, including a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Deep Recurrent Neural Network (DRNN). The results showed a significant improvement in the classification accuracy of the proposed model as compared to the other methods.
Affiliation(s)
- Nivedhitha Mahendran
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India.
| | - Durai Raj Vincent P M
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India.
| |
|
125
|
Siddhartha M, Kumar V, Nath R. Early-stage diagnosis of chronic kidney disease using majority vote – Grey Wolf optimization (MV-GWO). HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
126
|
Chaddad A, Li J, Lu Q, Li Y, Okuwobi IP, Tanougast C, Desrosiers C, Niazi T. Can Autism Be Diagnosed with Artificial Intelligence? A Narrative Review. Diagnostics (Basel) 2021; 11:2032. [PMID: 34829379 PMCID: PMC8618159 DOI: 10.3390/diagnostics11112032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/04/2021] [Revised: 10/31/2021] [Accepted: 10/31/2021] [Indexed: 11/16/2022] Open
Abstract
Radiomics with deep learning models have become popular in computer-aided diagnosis and have outperformed human experts on many clinical tasks. Specifically, radiomic models based on artificial intelligence (AI) are using medical data (i.e., images, molecular data, clinical variables, etc.) for predicting clinical tasks such as autism spectrum disorder (ASD). In this review, we summarized and discussed the radiomic techniques used for ASD analysis. Currently, the limited radiomic work of ASD is related to the variation of morphological features of brain thickness that is different from texture analysis. These techniques are based on imaging shape features that can be used with predictive models for predicting ASD. This review explores the progress of ASD-based radiomics with a brief description of ASD and the current non-invasive technique used to classify between ASD and healthy control (HC) subjects. With AI, new radiomic models using the deep learning techniques will be also described. To consider the texture analysis with deep CNNs, more investigations are suggested to be integrated with additional validation steps on various MRI sites.
Affiliation(s)
- Ahmad Chaddad
- School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China; (J.L.); (Q.L.); (Y.L.); (I.P.O.)
- The Laboratory for Imagery, Vision and Artificial Intelligence, École de Technologie Supérieure (ETS), Montreal, QC H3C 1K3, Canada;
| | - Jiali Li
- School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China; (J.L.); (Q.L.); (Y.L.); (I.P.O.)
| | - Qizong Lu
- School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China; (J.L.); (Q.L.); (Y.L.); (I.P.O.)
| | - Yujie Li
- School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China; (J.L.); (Q.L.); (Y.L.); (I.P.O.)
| | - Idowu Paul Okuwobi
- School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China; (J.L.); (Q.L.); (Y.L.); (I.P.O.)
| | - Camel Tanougast
- Laboratoire de Conception, Optimisation et Modélisation des Systèmes, University of Lorraine, 57070 Metz, France;
| | - Christian Desrosiers
- The Laboratory for Imagery, Vision and Artificial Intelligence, École de Technologie Supérieure (ETS), Montreal, QC H3C 1K3, Canada;
| | - Tamim Niazi
- Lady Davis Institute for Medical Research, McGill University, Montreal, QC H3T 1E2, Canada;
| |
|
127
|
A highly predictive autoantibody-based biomarker panel for prognosis in early-stage NSCLC with potential therapeutic implications. Br J Cancer 2021; 126:238-246. [PMID: 34728792 PMCID: PMC8770460 DOI: 10.1038/s41416-021-01572-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/19/2021] [Revised: 09/12/2021] [Accepted: 09/30/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer-related death worldwide. Surgical resection remains the definitive curative treatment for early-stage disease offering an overall 5-year survival rate of 62%. Despite careful case selection, a significant proportion of early-stage cancers relapse aggressively within the first year post-operatively. Identification of these patients is key to accurate prognostication and understanding the biology that drives early relapse might open up potential novel adjuvant therapies. METHODS We performed an unsupervised interrogation of >1600 serum-based autoantibody biomarkers using an iterative machine-learning algorithm. RESULTS We identified a 13 biomarker signature that was highly predictive for survivorship in post-operative early-stage lung cancer; this outperforms currently used autoantibody biomarkers in solid cancers. Our results demonstrate significantly poor survivorship in high expressers of this biomarker signature with an overall 5-year survival rate of 7.6%. CONCLUSIONS We anticipate that the data will lead to the development of an off-the-shelf prognostic panel and further that the oncogenic relevance of the proteins recognised in the panel may be a starting point for a new adjuvant therapy.
|
128
|
Ouchani M, Gharibzadeh S, Jamshidi M, Amini M. A Review of Methods of Diagnosis and Complexity Analysis of Alzheimer's Disease Using EEG Signals. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5425569. [PMID: 34746303 PMCID: PMC8566072 DOI: 10.1155/2021/5425569] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/21/2021] [Revised: 06/20/2021] [Accepted: 10/18/2021] [Indexed: 01/27/2023]
Abstract
This study concentrates on recent research on EEG signals for Alzheimer's diagnosis, identifying and comparing key steps of EEG-based Alzheimer's disease (AD) detection, such as EEG signal acquisition, preprocessing, feature extraction, and classification methods. Furthermore, it highlights general approaches, variations, and agreement in the use of EEG, and identifies shortcomings and guidelines for multiple experimental stages, ranging from demographic characteristics to outcomes monitoring, for future research. Two main targets have been defined based on the article's purpose: (1) discriminative (or detection), i.e., look for differences in EEG-based features across groups, such as MCI, moderate Alzheimer's disease, extreme Alzheimer's disease, other forms of dementia, and stable normal elderly controls; and (2) progression determination, i.e., look for correlations between EEG-based features and clinical markers linked to MCI-to-AD conversion and Alzheimer's disease intensity progression. Limitations mentioned in the reviewed papers were also gathered and explored in this study, with the goal of gaining a better understanding of the problems that need to be addressed in order to advance the use of EEG in Alzheimer's disease science.
Affiliation(s)
- Mahshad Ouchani
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
| | - Shahriar Gharibzadeh
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
| | - Mahdieh Jamshidi
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
| | - Morteza Amini
- Shahid Beheshti University, Tehran, Iran
- Institute for Cognitive Science Studies (ICSS), Tehran, Iran
| |
|
129
|
Li Y, Li G, Guo L. Feature Selection for Regression Based on Gamma Test Nested Monte Carlo Tree Search. ENTROPY 2021; 23:e23101331. [PMID: 34682055 PMCID: PMC8535147 DOI: 10.3390/e23101331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 10/06/2021] [Accepted: 10/07/2021] [Indexed: 12/03/2022]
Abstract
This paper investigates the nested Monte Carlo tree search (NMCTS) for feature selection on regression tasks. NMCTS starts out with an empty subset and uses search results of lower nesting level simulation. Level 0 is based on random moves until the path reaches the leaf node. In order to accomplish feature selection on the regression task, the Gamma test is introduced to play the role of the reward function at the end of the simulation. The concept Vratio of the Gamma test is also combined with the original UCT-tuned1 and the design of stopping conditions in the selection and simulation phases. The proposed GNMCTS method was tested on seven numeric datasets and compared with six other feature selection methods. It shows better performance than the vanilla MCTS framework and maintains the relevant information in the original feature space. The experimental results demonstrate that GNMCTS is a robust and effective tool for feature selection. It can accomplish the task well in a reasonable computation budget.
Affiliation(s)
- Ying Li
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
| | - Guohe Li
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
- Correspondence:
| | - Lingun Guo
- Beijing Key Lab of Petroleum Data Mining, Department of Geophysics, China University of Petroleum, Beijing 102249, China; (Y.L.); (L.G.)
- College of Software, Henan Normal University, Xinxiang 453007, China
| |
|
130
|
Jiang Z, Zhang Y, Wang J. A multi-surrogate-assisted dual-layer ensemble feature selection algorithm. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
131
|
Bommert A, Welchowski T, Schmid M, Rahnenführer J. Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Brief Bioinform 2021; 23:6366322. [PMID: 34498681 PMCID: PMC8769710 DOI: 10.1093/bib/bbab354] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/18/2021] [Revised: 08/05/2021] [Accepted: 08/10/2021] [Indexed: 11/30/2022] Open
Abstract
Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practitioners. We analyze the accuracy of predictive models that employ the features selected by the filter methods. Also, we consider the run time, the number of selected features for fitting models with high predictive accuracy as well as the feature selection stability. We conclude that the simple variance filter outperforms all other considered filter methods. This filter selects the features with the largest variance and does not take into account the survival outcome. Also, we identify the correlation-adjusted regression scores filter as a more elaborate alternative that allows fitting models with similar predictive accuracy. Additionally, we investigate the filter methods based on feature rankings, finding groups of similar filters.
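The variance filter described above can be sketched in a few lines: rank features by their variance across samples and keep the top k, without looking at the outcome. This is a minimal illustration on made-up numbers, not the benchmark's implementation; the function name, the feature names, and the data are hypothetical.

```python
from statistics import pvariance


def variance_filter(rows, feature_names, k):
    """Return the names of the k features with the largest variance.

    rows is a list of samples, each a list of feature values in the
    same order as feature_names. The outcome variable is not used.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    scored = [(pvariance(col), name) for col, name in zip(columns, feature_names)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical expression values: four samples, three genes.
rows = [
    [1.0, 10.0, 5.0],
    [1.1, 20.0, 5.0],
    [0.9, 30.0, 5.0],
    [1.0, 40.0, 5.0],
]
selected = variance_filter(rows, ["g1", "g2", "g3"], k=2)
print(selected)
```

Here `g2` varies most and the constant `g3` is dropped. Because variance depends on the measurement scale, a filter like this is only meaningful when the features are on comparable scales.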
Affiliation(s)
- Andrea Bommert
- Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227, Dortmund, Germany
| | - Thomas Welchowski
- Institute of Medical Biometry, Informatics and Epidemiology (IMBIE), Medical Faculty, University of Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
| | - Matthias Schmid
- Institute of Medical Biometry, Informatics and Epidemiology (IMBIE), Medical Faculty, University of Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
| | - Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227, Dortmund, Germany
| |
|
132
|
Sundaram S, Zeid A. Smart Prognostics and Health Management (SPHM) in Smart Manufacturing: An Interoperable Framework. SENSORS (BASEL, SWITZERLAND) 2021; 21:5994. [PMID: 34577203 PMCID: PMC8472989 DOI: 10.3390/s21185994] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/22/2021] [Revised: 08/28/2021] [Accepted: 09/01/2021] [Indexed: 11/18/2022]
Abstract
Advances in the manufacturing industry have led to modern approaches such as Industry 4.0, Cyber-Physical Systems, Smart Manufacturing (SM) and Digital Twins. The traditional manufacturing architecture that consisted of hierarchical layers has evolved into a hierarchy-free network in which all the areas of a manufacturing enterprise are interconnected. The field devices on the shop floor generate large amounts of data that can be useful for maintenance planning. Prognostics and Health Management (PHM) approaches use this data and help us in fault detection and Remaining Useful Life (RUL) estimation. Although there is a significant amount of research primarily focused on tool wear prediction and Condition-Based Monitoring (CBM), there is not much importance given to the multiple facets of PHM. This paper conducts a review of PHM approaches, the current research trends and proposes a three-phased interoperable framework to implement Smart Prognostics and Health Management (SPHM). The uniqueness of SPHM lies in its framework, which makes it applicable to any manufacturing operation across the industry. The framework consists of three phases: Phase 1 consists of the shopfloor setup and data acquisition steps, Phase 2 describes steps to prepare and analyze the data and Phase 3 consists of modeling, predictions and deployment. The first two phases of SPHM are addressed in detail and an overview is provided for the third phase, which is a part of ongoing research. As a use-case, the first two phases of the SPHM framework are applied to data from a milling machine operation.
Affiliation(s)
| | - Abe Zeid
- College of Engineering, Northeastern University, Boston, MA 02135, USA;
| |
|
133
|
Barone S, Cannella R, Comelli A, Pellegrino A, Salvaggio G, Stefano A, Vernuccio F. Hybrid descriptive‐inferential method for key feature selection in prostate cancer radiomics. APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY 2021; 37:961-972. [DOI: 10.1002/asmb.2642] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 03/15/2021] [Accepted: 07/18/2021] [Indexed: 01/03/2025]
Abstract
In healthcare industry 4.0, a big role is played by radiomics. Radiomics concerns the extraction and analysis of quantitative information not visible to the naked eye, even by expert operators, from biomedical images. Radiomics involves the management of digital images as data matrices, with the aim of extracting a number of morphological and predictive variables, named features, using automatic or semi‐automatic methods. Multidisciplinary methods such as machine learning and deep learning are fully involved in this field. However, the large number of features requires efficient and effective core methods for their selection, in order to avoid bias or misinterpretation problems. In this work, the authors propose a novel method for feature selection in radiomics. The proposed method is based on an original combination of descriptive and inferential statistics. Its validity is illustrated through a case study on prostate cancer analysis, conducted at the university hospital of Palermo, Italy.
Affiliation(s)
- Stefano Barone
- Dipartimento di Scienze Agrarie, Alimentari e Forestali Università degli Studi di Palermo Palermo Italy
| | - Roberto Cannella
- Dipartimento di Biomedicina, Neuroscienze e Diagnostica Avanzata Università degli Studi di Palermo Palermo Italy
| | - Albert Comelli
- Fondazione Ri.MED Palermo Italy
- Istituto di Bioimmagini e Fisiologia Molecolare, Consiglio Nazionale delle Ricerche (IBFM‐CNR) Cefalù Italy
| | - Arianna Pellegrino
- Dipartimento di Ingegneria Meccanica e Aerospaziale Politecnico di Torino Turin Italy
| | - Giuseppe Salvaggio
- Dipartimento di Biomedicina, Neuroscienze e Diagnostica Avanzata Università degli Studi di Palermo Palermo Italy
| | - Alessandro Stefano
- Istituto di Bioimmagini e Fisiologia Molecolare, Consiglio Nazionale delle Ricerche (IBFM‐CNR) Cefalù Italy
| | - Federica Vernuccio
- Dipartimento di Biomedicina, Neuroscienze e Diagnostica Avanzata Università degli Studi di Palermo Palermo Italy
| |
|
134
|
Hamid TMTA, Sallehuddin R, Yunos ZM, Ali A. Ensemble Based Filter Feature Selection with Harmonize Particle Swarm Optimization and Support Vector Machine for Optimal Cancer Classification. MACHINE LEARNING WITH APPLICATIONS 2021. [DOI: 10.1016/j.mlwa.2021.100054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022] Open
|
135
|
Topolski M. Application of Feature Extraction Methods for Chemical Risk Classification in the Pharmaceutical Industry. SENSORS 2021; 21:s21175753. [PMID: 34502644 PMCID: PMC8434006 DOI: 10.3390/s21175753] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/20/2021] [Revised: 08/20/2021] [Accepted: 08/21/2021] [Indexed: 11/25/2022]
Abstract
The features that are used in the classification process are acquired from sensor data on the production site (associated with toxic, physicochemical properties) and also from a dataset associated with cybersecurity that may affect the above-mentioned risk. These are large datasets, so it is important to reduce them. The author’s motivation was to develop a method of assessing the dimensionality of features based on correlation measures and the discriminant power of features, allowing for a more accurate reduction of their dimensions compared to the classical Kaiser criterion and assessment of the scree plot. The method proved to be promising. The results obtained in the experiments demonstrate that the quality of classification after extraction is better than when using classical criteria for estimating the number of components and features. Experiments were carried out for various extraction methods, demonstrating that the rotation of factors according to centroids of a class in this classification task gives the best risk assessment of chemical threats. The classification quality increased by about 7% compared to a model where feature extraction was not used and improved by 4% compared to the classical PCA method with the Kaiser criterion and an evaluation of the scree plot. Furthermore, it has been shown that there is a certain subspace of cybersecurity features which, complemented with the features of the concentration of volatile substances, affects the risk assessment of chemical hazards. The identified cybersecurity factors are the number of packets lost, incorrect logins, incorrect sensor responses, increased email spam, and excessive traffic in the computer network. To visualize the speed of classification in real time, simulations were carried out for various systems used in Industry 4.0.
Affiliation(s)
- Mariusz Topolski
- Department of Systems and Computer Networks, Faculty of Electronics, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
| |
|
136
|
Khaleghi MK, Savizi ISP, Lewis NE, Shojaosadati SA. Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 2021; 16:e2100212. [PMID: 34390201 DOI: 10.1002/biot.202100212] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/23/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/06/2022]
Abstract
Recent noteworthy advances in the development of high-performing microbial and mammalian strains have enabled the sustainable production of bio-economically valuable substances such as bio-compounds, biofuels, and biopharmaceuticals. However, to obtain an industrially viable mass-production scheme, much time and effort are required. The robust and rational design of fermentation processes requires analysis and optimization of different extracellular conditions and medium components, which have a massive effect on growth and productivity. In this regard, knowledge- and data-driven modeling methods have received much attention. Constraint-based modeling (CBM) is a knowledge-driven mathematical approach that has been widely used in fermentation analysis and optimization due to its capability of predicting the cellular phenotype from genotype through high-throughput means. On the other hand, machine learning (ML) is a data-driven statistical method that identifies the data patterns within sophisticated biological systems and processes, where there is inadequate knowledge to represent underlying mechanisms. Furthermore, ML models are becoming a viable complement to constraint-based models in a reciprocal manner when one is used as a pre-step of another. As a result, a more predictable model is produced. This review highlights the applications of CBM and ML independently and the combination of these two approaches for analyzing and optimizing fermentation parameters.
Affiliation(s)
- Mohammad Karim Khaleghi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
| | - Iman Shahidi Pour Savizi
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
| | - Nathan E Lewis
- Department of Bioengineering, University of California, San Diego, USA; Department of Pediatrics, University of California, San Diego, USA
| | - Seyed Abbas Shojaosadati
- Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
| |
|
137
|
Degeest A, Frénay B, Verleysen M. Reading grid for feature selection relevance criteria in regression. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.04.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
138
|
Usmani S, Saboor A, Haris M, Khan MA, Park H. Latest Research Trends in Fall Detection and Prevention Using Machine Learning: A Systematic Review. SENSORS 2021; 21:s21155134. [PMID: 34372371 PMCID: PMC8347190 DOI: 10.3390/s21155134] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 06/24/2021] [Revised: 07/16/2021] [Accepted: 07/24/2021] [Indexed: 12/15/2022]
Abstract
Falls are unusual actions that cause a significant health risk among older people. The growing percentage of people of old age requires urgent development of fall detection and prevention systems. The emerging technology focuses on developing such systems to improve quality of life, especially for the elderly. A fall prevention system tries to predict and reduce the risk of falls. In contrast, a fall detection system observes the fall and generates a help notification to minimize the consequences of falls. A plethora of technical and review papers exist in the literature with a primary focus on fall detection. Similarly, several studies are relatively old, with a focus on wearables only, and use statistical and threshold-based approaches with a high false alarm rate. Therefore, this paper presents the latest research trends in fall detection and prevention systems using Machine Learning (ML) algorithms. It uses recent studies and analyzes datasets, age groups, ML algorithms, sensors, and location. Additionally, it provides a detailed discussion of the current trends of fall detection and prevention systems with possible future directions. This overview can help researchers understand the current systems and propose new methodologies by improving the highlighted issues.
Affiliation(s)
- Sara Usmani
- School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan; (S.U.); (M.H.)
| | - Abdul Saboor
- Department of Electrical Engineering (ESAT), Katholieke Universiteit (KU) Leuven, 3000 Leuven, Belgium;
| | - Muhammad Haris
- School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan; (S.U.); (M.H.)
| | - Muneeb A. Khan
- Department of Software, Sangmyung University, Cheonan 31066, Korea;
| | - Heemin Park
- Department of Software, Sangmyung University, Cheonan 31066, Korea;
- Correspondence:
| |
|
139
|
Biological knowledge-slanted random forest approach for the classification of calcified aortic valve stenosis. BioData Min 2021; 14:35. [PMID: 34301292 PMCID: PMC8305490 DOI: 10.1186/s13040-021-00269-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/02/2021] [Accepted: 07/18/2021] [Indexed: 11/29/2022]
Abstract
Background: Calcific aortic valve stenosis (CAVS) is a fatal disease and there is no pharmacological treatment to prevent the progression of CAVS. This study aims to identify genes potentially implicated in CAVS in patients with congenital bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV) in comparison with patients having normal valves, using a knowledge-slanted random forest (RF). Results: This study implemented a knowledge-slanted RF that uses information extracted from a protein-protein interaction network to rank genes and modify their probability of being drawn as candidate split-variables. A total of 15,191 genes were assessed in 19 valves with CAVS (BAV, n = 10; TAV, n = 9) and 8 normal valves. The performance of the model was evaluated using accuracy, sensitivity, and specificity to discriminate cases with CAVS. A comparison with conventional RF was also performed. The proposed approach showed improved accuracy in comparison with conventional RF when classifying cases with BAV and TAV separately (Slanted RF: 59.3% versus 40.7%). When patients with BAV and TAV were grouped against patients with normal valves, the addition of prior biological information was not relevant, with an accuracy of 92.6%. Conclusion: The knowledge-slanted RF approach reflected prior biological knowledge, leading to better precision in distinguishing between cases with BAV, TAV, and normal valves. The results of this study suggest that the integration of biological knowledge can be useful during difficult classification tasks. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00269-4.
|
140
|
Outlier Detection Based Feature Selection Exploiting Bio-Inspired Optimization Algorithms. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11156769] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
The curse of dimensionality problem occurs when the data are high-dimensional. It affects the learning process and reduces the accuracy. Feature selection is one of the dimensionality reduction approaches that mainly contribute to solving the curse of the dimensionality problem by selecting the relevant features. Irrelevant features are the dependent and redundant features that cause noise in the data and then reduce its quality. The main well-known feature-selection methods are wrapper and filter techniques. However, wrapper feature selection techniques are computationally expensive, whereas filter feature selection methods suffer from multicollinearity. In this research study, four new feature selection methods based on outlier detection using the Projection Pursuit method are proposed. Outlier detection involves identifying abnormal data (irrelevant features of the transpose matrix obtained from the original dataset matrix). The concept of outlier detection using projection pursuit has proved its efficiency in many applications but has not yet been used as a feature selection approach. To the author’s knowledge, this study is the first of its kind. Experimental results on nineteen real datasets using three classifiers (k-NN, SVM, and Random Forest) indicated that the suggested methods enhanced the classification accuracy rate by an average of 6.64% when compared to the classification accuracy without applying feature selection. It also outperformed the state-of-the-art methods on most of the used datasets with an improvement rate ranging between 0.76% and 30.64%. Statistical analysis showed that the results of the proposed methods are statistically significant.
|
141
|
Abstract
Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has been thus far conducted on which approaches might be best suited to deal with datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work attempts to give a contribution to this challenging research area by studying the effectiveness of hybrid learning strategies that involve the integration of feature selection techniques, to reduce the data dimensionality, with proper methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach, when compared to using only feature selection or imbalance learning methods alone.
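The cost-sensitive side of such a hybrid strategy can be illustrated with a small sketch. The abstract does not state which weighting scheme was used, so the inverse-frequency heuristic below, the function name `balanced_class_weights`, and the toy labels are all assumptions made for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is a common inverse-frequency heuristic (an assumption here,
    not the scheme reported in the abstract): rare classes get larger
    weights so that their misclassification costs more.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label vector: 90 negatives, 10 positives.
labels = ["neg"] * 90 + ["pos"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind would typically be passed to a classifier after the feature selection step has reduced the dimensionality, which is one way to combine the two ingredients the abstract describes.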
|
142
|
Particle Swarm Optimization and Multiple Stacked Generalizations to Detect Nitrogen and Organic-Matter in Organic-Fertilizer Using Vis-NIR. SENSORS 2021; 21:s21144882. [PMID: 34300620 PMCID: PMC8309747 DOI: 10.3390/s21144882] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/20/2021] [Revised: 07/13/2021] [Accepted: 07/16/2021] [Indexed: 11/29/2022]
Abstract
Organic fertilizer is a key component of agricultural sustainability and significantly contributes to the improvement of soil fertility. Nutrients such as organic matter and nitrogen in organic fertilizers positively affect plant growth but cause environmental problems when used in large amounts, hence the importance of fast detection of nitrogen (N) and organic matter (OM). This paper examines the feasibility of a framework that combines particle swarm optimization (PSO) and two multiple stacked generalizations to determine the amount of nitrogen and organic matter in organic fertilizer using visible near-infrared spectroscopy (Vis-NIR). The first multiple stacked generalization for classification, coupled with PSO (FSGC-PSO), was used for feature selection, while the second stacked generalization for regression (SSGR) improved the detection of nitrogen and organic matter. The root mean square error (RMSE) and the coefficient of determination (R2) for the calibration and prediction sets were used to gauge the different models. The obtained FSGC-PSO subset combined with SSGR achieved significantly better prediction results than conventional methods such as Ridge, support vector machine (SVM), and partial least square (PLS) for both nitrogen (R2p = 0.9989, root mean square error of prediction (RMSEP) = 0.031 and limit of detection (LOD) = 2.97) and organic matter (R2p = 0.9972, RMSEP = 0.051 and LOD = 2.97). Therefore, the proposed approach is a promising way to monitor and evaluate the amount of N and OM in organic fertilizer.
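The two evaluation metrics named in this abstract, RMSE and the coefficient of determination, have standard definitions that can be sketched as follows. The function names and the sample values are illustrative assumptions; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up reference values and model predictions (not from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

error = rmse(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"RMSE = {error:.4f}, R2 = {r2:.4f}")
```

Applied to a held-out prediction set, the same two functions would give the quantities the abstract labels RMSEP and R2p.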
|
143
|
Sheikhi G, Altınçay H. A novel dissimilarity metric based on feature‐to‐feature scatter frequencies for clustering‐based feature selection in biomedical data. Comput Intell 2021. [DOI: 10.1111/coin.12470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Ghazaal Sheikhi
- Department of Computer Engineering Final International University Kyrenia North Cyprus Turkey
| | - Hakan Altınçay
- Department of Computer Engineering Eastern Mediterranean University Famagusta North Cyprus Turkey
| |
|
144
|
Krarti M, Aldubyan M. Review analysis of COVID-19 impact on electricity demand for residential buildings. RENEWABLE & SUSTAINABLE ENERGY REVIEWS 2021; 143:110888. [PMID: 36310544 PMCID: PMC9586839 DOI: 10.1016/j.rser.2021.110888] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 06/21/2020] [Revised: 02/18/2021] [Accepted: 02/25/2021] [Indexed: 05/02/2023]
Abstract
In this paper, a systematic review analysis of fully enforced stay-at-home orders and government lockdowns is presented. The main goal of the analysis is to identify the impacts of stay-at-home living patterns on the energy consumption of residential buildings. Specifically, metered data collected from various reported sources are reviewed and analyzed to assess the changes in overall electricity demand for various countries and US states. Weather-adjusted time series data of electricity demand before and after COVID-19 lockdowns are used to determine the magnitude of changes in electricity demand and residential energy use patterns. The analysis results indicate that while overall electricity demand is lower because of lockdowns that impact commercial buildings and manufacturing sectors, the energy consumption for the housing sector has increased by as much as 30% during the full 2020 lockdown period. Analysis of reported end-use data indicates that most of the increase in household energy demand is due to higher occupancy patterns during daytime hours, resulting in increased use of energy-intensive systems such as heating, air conditioning, lighting, and appliances. Several energy efficiency and renewable energy solutions are presented to cost-effectively mitigate the increase in energy demands due to extended stay-at-home living patterns.
Affiliation(s)
- Moncef Krarti
- University of Colorado Boulder, CO, USA
- KAPSARC, Riyadh, Saudi Arabia
| | | |
|
145
|
Robust variable selection for model-based learning in presence of adulteration. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
146
|
Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021; 627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]
Abstract
This paper introduces a new hybrid approach (DBH) for solving the gene selection problem that incorporates the strengths of two existing metaheuristics: the binary dragonfly algorithm (BDF) and the binary black hole algorithm (BBHA). This hybridization aims to identify a limited and stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods have encountered challenges in extracting disease-related information from a vast number of redundant genes. The proposed approach first applies the minimum redundancy maximum relevancy (MRMR) filter method to reduce the dimensionality of the feature space and then utilizes the suggested hybrid DBH algorithm to determine a smaller set of significant genes. The proposed approach was evaluated on eight benchmark gene expression datasets and then compared against the latest state-of-the-art techniques to demonstrate algorithm efficiency. The comparative study shows that the proposed approach achieves a significant improvement over existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the suggested method was examined on real RNA-Seq coronavirus-related gene expression data of asthmatic patients for selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to distinguish infected patients from uninfected ones in order to identify subgroups at risk for COVID-19. The results indicate that the suggested MRMR-DBH approach represents a very promising framework for finding a new combination of the most discriminative genes with high classification accuracy.
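The MRMR pre-filtering step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the common greedy "relevance minus mean redundancy" form on already-discretized features, not the authors' implementation; the function names, the toy feature names `g1` to `g3`, and the data are hypothetical, and real expression values would first need discretizing.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr(features, target, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected) / len(selected)
                if selected
                else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data: g2 duplicates g1, g3 is weaker but carries different information.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],
    "g3": [0, 1, 0, 0, 1, 1, 0, 1],
}
chosen = mrmr(features, target, k=2)
print(chosen)
```

In this toy case the duplicate `g2` is skipped in favor of `g3`, because its redundancy with the already selected `g1` outweighs its relevance. A wrapper search such as the DBH algorithm described above would then operate on the reduced set.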
Affiliation(s)
- Elnaz Pashaei
- Department of Software Engineering, Istanbul Aydin University, Istanbul, Turkey.
| | - Elham Pashaei
- Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
| |
|
147
|
Novel Prediction Model for Steel Mechanical Properties with MSVR Based on MIC and Complex Network Clustering. METALS 2021. [DOI: 10.3390/met11050747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Traditional mechanical properties prediction models are mostly based on experience and mechanism, which neglect the linear and nonlinear relationships between process parameters. Aiming at the high-dimensional data collected in the complex industrial process of steel production, a new prediction model is proposed. The multidimensional support vector regression (MSVR)-based model is combined with the feature selection method, which involves maximum information coefficient (MIC) correlation characterization and complex network clustering. Firstly, MIC is used to measure the correlation between process parameters and mechanical properties, based on which a complex network is constructed and hierarchical clustering is performed. Secondly, we evaluate all parameters and select a representative one for each partition as the input of the subsequent model based on the centrality and influence indicators. Finally, an actual steel production case is used to train the MSVR prediction model. The prediction results show that our proposed framework can capture effective features from the full parameters in terms of higher prediction accuracy and is less time-consuming compared with the Pearson-based subset, full-parameter subset, and empirical subset input. The feature selection method based on MIC can dig out some nonlinear relationships which cannot be found by Pearson coefficient.
|
148
|
P A, G SS, Srivastava G, Maddikunta PKR, Gadekallu TR. A Two-stage Text Feature Selection Algorithm for Improving Text Classification. ACM T ASIAN LOW-RESO 2021. [DOI: 10.1145/3425781] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
Abstract
As the number of digital text documents increases on a daily basis, the classification of text is becoming a challenging task. Each text document consists of a large number of words (or features) that drive down the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm designed to reduce a large number of features to improve the accuracy of the text classification algorithm. The proposed algorithm uses noun-based filtering, a word ranking that enhances the performance of the text classification algorithm. Experiments are carried out on three benchmark datasets, and the results show that the proposed classification algorithm has achieved the maximum accuracy when compared to the existing algorithms. The proposed algorithm is compared to Term Frequency-Inverse Document Frequency, Balanced Accuracy Measure, GINI Index, Information Gain, and Chi-Square. The experimental results clearly show the strength of the proposed algorithm.
Affiliation(s)
- Ashokkumar P
- Sri Ramachandra College of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu
| | - Siva Shankar G
- Sri Ramachandra College of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu
| | - Gautam Srivastava
- Department of Mathematics and Computer Science, Brandon University Research Center for Interneural Computing, China Medical University, Taichung, Taiwan, Republic of China
| | | | | |
|
149
|
Computational methods for integrative evaluation of confidence, accuracy, and reaction time in facial affect recognition in schizophrenia. SCHIZOPHRENIA RESEARCH-COGNITION 2021; 25:100196. [PMID: 33996517 PMCID: PMC8093458 DOI: 10.1016/j.scog.2021.100196] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/14/2020] [Revised: 03/06/2021] [Accepted: 03/10/2021] [Indexed: 11/21/2022]
Abstract
People with schizophrenia (SZ) process emotions less accurately than do healthy comparators (HC), and studies of emotion recognition have expanded beyond accuracy to performance variables like reaction time (RT) and confidence. These domains are typically evaluated independently, but complex inter-relationships can be evaluated through machine learning at an item-by-item level. Using a mix of ranking and machine learning tools, we investigated item-by-item discrimination of facial affect with two emotion recognition tests (BLERT and ER-40) between SZ and HC. The best performing multi-domain model for ER-40 had a large effect size in differentiating SZ and HC (d = 1.24) compared to a standard comparison of accuracy alone (d = 0.48); smaller increments in effect sizes were evident for the BLERT (d = 0.87 vs. d = 0.58). Almost half of the selected items were confidence ratings. Within SZ, machine learning models with ER-40 (generally accuracy and reaction time) items predicted severity of depression and overconfidence in social cognitive ability, but not psychotic symptoms. Pending independent replication, the results support machine learning, and the inclusion of confidence ratings, in characterizing the social cognitive deficits in SZ. This moderate-sized study (n = 372) included subjects with schizophrenia (SZ, n = 218) and healthy controls (HC, n = 154). This paper explores the value of integrative evaluation of confidence, accuracy, and reaction time by way of machine learning in understanding the unique aspects of facial affect recognition in schizophrenia. Machine learning models separated schizophrenia from healthy comparators better than standard statistical comparison, and confidence ratings contributed to this separation in a disproportionate manner. Machine learning approaches provide a novel way to analyze item-by-item associations with social cognition measures, or potentially other tests, where multiple overlapping dimensions exist.
Aberrant confidence ratings interact with performance variables in complex ways to contribute to social cognitive deficits in schizophrenia.
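The effect sizes quoted in this abstract (d = 1.24, d = 0.48, and so on) are standardized mean differences. A minimal sketch of the usual pooled-standard-deviation form of Cohen's d is given below; the abstract does not say which variant was computed, so this form, the function name, and the toy scores are assumptions.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical test scores for two groups (not data from the study).
hc_scores = [8, 9, 10, 11, 12]
sz_scores = [6, 7, 8, 9, 10]
d = cohens_d(hc_scores, sz_scores)
print(f"d = {d:.3f}")
```

By the usual rule of thumb, values around 0.5 are read as medium and values of 0.8 or more as large, which is the sense in which the abstract calls d = 1.24 a large effect.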
|
150
|
Kim YJ, Jeon JS, Cho SE, Kim KG, Kang SG. Prediction Models for Obstructive Sleep Apnea in Korean Adults Using Machine Learning Techniques. Diagnostics (Basel) 2021; 11:diagnostics11040612. [PMID: 33808100 PMCID: PMC8066462 DOI: 10.3390/diagnostics11040612] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/11/2021] [Revised: 03/24/2021] [Accepted: 03/26/2021] [Indexed: 12/01/2022]
Abstract
This study aimed to investigate the applicability of machine learning to predict obstructive sleep apnea (OSA) among individuals with suspected OSA in South Korea. A total of 92 clinical variables for OSA were collected from 279 South Koreans (OSA, n = 213; no OSA, n = 66), from which seven major clinical indices were selected. The data were randomly divided into training data (OSA, n = 149; no OSA, n = 46) and test data (OSA, n = 64; no OSA, n = 20). Using the seven clinical indices, the OSA prediction models were trained using four types of machine learning models—logistic regression, support vector machine (SVM), random forest, and XGBoost (XGB)—and each model was validated using the test data. In the validation, the SVM showed the best OSA prediction result with a sensitivity, specificity, and area under curve (AUC) of 80.33%, 86.96%, and 0.87, respectively, while the XGB showed the lowest OSA prediction performance with a sensitivity, specificity, and AUC of 78.69%, 73.91%, and 0.80, respectively. The machine learning algorithms showed high OSA prediction performance using data from South Koreans with suspected OSA. Hence, machine learning will be helpful in clinical applications for OSA prediction in the Korean population.
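Sensitivity and specificity, the two threshold-dependent metrics reported in this abstract, follow directly from the confusion counts on the test set. The sketch below uses made-up labels and hypothetical function names; it does not reproduce the study's data or models.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up test labels: 1 = OSA, 0 = no OSA (not data from the study).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

With an imbalanced test set such as the one described above, reporting both values matters, since a model that labels every case as OSA would reach perfect sensitivity with zero specificity.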
Affiliation(s)
- Young Jae Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
| | - Ji Soo Jeon
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
| | - Seo-Eun Cho
- Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
| | - Kwang Gi Kim
- Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea; (Y.J.K.); (J.S.J.)
- Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
| | - Seung-Gul Kang
- Department of Psychiatry, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea;
- Correspondence: (K.G.K.); (S.-G.K.); Tel.: +82-32-458-2818 (S.-G.K.)
| |
|