1. Wang Z, Wu D, Wang R, Nie F, Wang F. Joint Anchor Graph Embedding and Discrete Feature Scoring for Unsupervised Feature Selection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7974-7987. [PMID: 36417731 DOI: 10.1109/tnnls.2022.3222466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The success of existing unsupervised feature selection (UFS) methods heavily relies on the assumption that the intrinsic relationships among original high-dimensional (HD) data samples exist in the discriminative low-dimension (LD) subspace. However, previous UFS methods commonly construct pairwise graphs and employ l2,1-norm regularization to severally preserve the local structure and calculate the score of features, which is computationally complex and easy to get stuck into local optimum, so that those approaches cannot be applied in dealing with large-scale datasets in practice. To overcome this challenge, we propose a novel UFS method, in which a novel anchor graph embedding paradigm is designed to extract the local neighborhood relationships among data samples by reducing the computational complexity of graph construction to be linear in the number of data. Moreover, to improve the optimality of selected features as well as the performance of downstream tasks, we propose a discrete feature scoring mechanism, which imposes orthogonal l2,0-norm constraints on learned projections, in order to enhance the distinction of feature scores as well as reduce the probability of falling into local optimum. In addition, solving the proposed nonconvex and nonsmooth NP-hard problem is challenging, and we present an efficient optimization algorithm to address it and acquire a closed-form solution of the transformation matrix. Extensive experiments demonstrate the effectiveness and efficiency of the proposed UFS by comparison with several state-of-the-art approaches to clustering and image segmentation tasks.
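The abstract above says the anchor graph keeps graph construction linear in the number of samples. The following is a minimal sketch of a generic anchor graph, not the authors' construction: the function name `anchor_graph`, the random choice of anchors, and the per-sample Gaussian bandwidth are all assumptions made for illustration. Each sample is linked only to its `k` nearest anchors, so the cost is O(n * m * d) for n samples, m anchors, and d features, instead of the O(n^2 * d) of a full pairwise graph.

```python
import numpy as np

def anchor_graph(X, n_anchors=10, k=3, seed=0):
    """Return an (n, n_anchors) row-stochastic sample-to-anchor affinity matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: anchors are a random subset of the samples
    # (k-means centers are another common choice).
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]
    # Squared Euclidean distances between samples and anchors, shape (n, m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the k nearest anchors for every sample.
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    sel = d2[rows, nearest]
    # Assumption: per-sample bandwidth equal to the mean selected distance.
    sigma2 = sel.mean(axis=1, keepdims=True) + 1e-12
    Z = np.zeros((n, n_anchors))
    Z[rows, nearest] = np.exp(-sel / sigma2)
    # Normalize so that each row sums to one.
    Z /= Z.sum(axis=1, keepdims=True)
    return Z

X_demo = np.random.default_rng(1).normal(size=(50, 4))
Z_demo = anchor_graph(X_demo, n_anchors=10, k=3)
```

Here `Z_demo` has one row per sample and exactly `k` nonzero entries per row, which is what keeps memory and time linear in the number of samples.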
2. Wang S, Nie F, Wang Z, Wang R, Li X. Robust Principal Component Analysis via Joint Reconstruction and Projection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7175-7189. [PMID: 36367910 DOI: 10.1109/tnnls.2022.3214307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Principal component analysis (PCA) is one of the most widely used unsupervised dimensionality reduction algorithms, but it is very sensitive to outliers because the squared l2-norm is used as distance metric. Recently, many scholars have devoted themselves to solving this difficulty. They learn the projection matrix from minimum reconstruction error or maximum projection variance as the starting point, which leads them to ignore a serious problem, that is, the original PCA learns the projection matrix by minimizing the reconstruction error and maximizing the projection variance simultaneously, but they only consider one of them, which imposes various limitations on the performance of model. To solve this problem, we propose a novel robust principal component analysis via joint reconstruction and projection, namely, RPCA-RP, which combines reconstruction error and projection variance to fully mine the potential information of data. Furthermore, we carefully design a discrete weight for model to implicitly distinguish between normal data and outliers, so as to easily remove outliers and improve the robustness of method. In addition, we also unexpectedly discovered that our method has anomaly detection capabilities. Subsequently, an effective iterative algorithm is explored to solve this problem and perform related theoretical analysis. Extensive experimental results on several real-world datasets and RGB large-scale dataset demonstrate the superiority of our method.
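The claim that classical PCA minimizes reconstruction error and maximizes projection variance at the same time can be checked numerically. The sketch below is illustrative only and is not taken from the paper; the variable names and the synthetic data are assumptions. For centered data X and a projection W with orthonormal columns, the squared Frobenius norms satisfy ||X - X W W^T||^2 + ||X W||^2 = ||X||^2, so under the squared l2-norm the two objectives are equivalent. With non-squared robust norms this decomposition does not hold in general, which is presumably why the two criteria diverge in robust variants.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X -= X.mean(axis=0)  # center the data

# Top-2 principal directions from the SVD of the centered data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:2].T  # shape (6, 2), orthonormal columns

recon_error = np.linalg.norm(X - X @ W @ W.T) ** 2  # squared reconstruction error
proj_var = np.linalg.norm(X @ W) ** 2               # (unnormalized) projected variance
total = np.linalg.norm(X) ** 2                      # total squared norm of the data

# The two terms always add up to the total, so minimizing one maximizes the other.
identity_holds = bool(np.isclose(recon_error + proj_var, total))
```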
3. Karami S, Saberi-Movahed F, Tiwari P, Marttinen P, Vahdati S. Unsupervised feature selection based on variance-covariance subspace distance. Neural Netw 2023; 166:188-203. [PMID: 37499604 DOI: 10.1016/j.neunet.2023.06.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 03/04/2023] [Accepted: 06/12/2023] [Indexed: 07/29/2023]
Abstract
Subspace distance is an invaluable tool exploited in a wide range of feature selection methods. The power of subspace distance is that it can identify a representative subspace, including a group of features that can efficiently approximate the space of original features. On the other hand, employing intrinsic statistical information of data can play a significant role in a feature selection process. Nevertheless, most of the existing feature selection methods founded on the subspace distance are limited in properly fulfilling this objective. To pursue this void, we propose a framework that takes a subspace distance into account which is called "Variance-Covariance subspace distance". The approach gains advantages from the correlation of information included in the features of data, thus determines all the feature subsets whose corresponding Variance-Covariance matrix has the minimum norm property. Consequently, a novel, yet efficient unsupervised feature selection framework is introduced based on the Variance-Covariance distance to handle both the dimensionality reduction and subspace learning tasks. The proposed framework has the ability to exclude those features that have the least variance from the original feature set. Moreover, an efficient update algorithm is provided along with its associated convergence analysis to solve the optimization side of the proposed approach. An extensive number of experiments on nine benchmark datasets are also conducted to assess the performance of our method from which the results demonstrate its superiority over a variety of state-of-the-art unsupervised feature selection methods. The source code is available at https://github.com/SaeedKarami/VCSDFS.
Affiliation(s)
- Saeed Karami: Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Farid Saberi-Movahed: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Prayag Tiwari: School of Information Technology, Halmstad University, Sweden; Department of Computer Science, Aalto University, Espoo, Finland
- Pekka Marttinen: Department of Computer Science, Aalto University, Espoo, Finland
- Sahar Vahdati: Nature-Inspired Machine Intelligence Group at InfAI, Dresden, Germany
4. Li Z, Nie F, Bian J, Wu D, Li X. Sparse PCA via l2,p-Norm Regularization for Unsupervised Feature Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:5322-5328. [PMID: 34665722 DOI: 10.1109/tpami.2021.3121329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In the field of data mining, how to deal with high-dimensional data is an inevitable topic. Since it does not rely on labels, unsupervised feature selection has attracted a lot of attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of data. However, real-world data often contain plenty of noise features, making the similarity matrix constructed by original data cannot be completely reliable. Worse still, the size of a similarity matrix expands rapidly as the number of samples rises, making the computational cost increase significantly. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem, and incorporate a l2,p-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of our proposed method.
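To make the role of the row-sparse projection matrix concrete, here is a small sketch of how an l2,p-type penalty and a row-norm feature ranking are commonly computed. It is not the authors' optimization algorithm: the helper names `l2p_penalty` and `top_features` are hypothetical, the penalty is written as the sum of row norms raised to the power p (one common convention), and the matrix `W_demo` is made-up data.

```python
import numpy as np

def row_norms(W):
    """l2-norm of every row of the projection matrix W (one row per feature)."""
    return np.linalg.norm(W, axis=1)

def l2p_penalty(W, p):
    """Sum over rows of ||w_i||_2 ** p; small p pushes whole rows toward zero."""
    return float((row_norms(W) ** p).sum())

def top_features(W, k):
    """Indices of the k features whose rows have the largest l2-norm."""
    return np.argsort(-row_norms(W))[:k]

# Illustrative projection matrix: feature 1 has an all-zero row.
W_demo = np.array([[3.0, 4.0],
                   [0.0, 0.0],
                   [1.0, 0.0]])

penalty_p1 = l2p_penalty(W_demo, p=1.0)  # 5 + 0 + 1
selected = top_features(W_demo, k=2)     # features with the two largest row norms
```

A feature whose row is entirely zero contributes nothing to the projection, so ranking rows by norm gives a natural feature score.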
5. Li Z, Nie F, Wu D, Hu Z, Li X. Unsupervised Feature Selection With Weighted and Projected Adaptive Neighbors. IEEE Transactions on Cybernetics 2023; 53:1260-1271. [PMID: 34343100 DOI: 10.1109/tcyb.2021.3087632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In the field of data mining, how to deal with high-dimensional data is a fundamental problem. If they are used directly, it is not only computationally expensive but also difficult to obtain satisfactory results. Unsupervised feature selection is designed to reduce the dimension of data by finding a subset of features in the absence of labels. Many unsupervised methods perform feature selection by exploring spectral analysis and manifold learning, such that the intrinsic structure of data can be preserved. However, most of these methods ignore a fact: due to the existence of noise features, the intrinsic structure directly built from original data may be unreliable. To solve this problem, a new unsupervised feature selection model is proposed. The graph structure, feature weights, and projection matrix are learned simultaneously, such that the intrinsic structure is constructed by the data that have been feature weighted and projected. For each data point, its nearest neighbors are acquired in the process of graph construction. Therefore, we call them adaptive neighbors. Besides, an additional constraint is added to the proposed model. It requires that a graph, corresponding to a similarity matrix, should contain exactly c connected components. Then, we present an optimization algorithm to solve the proposed model. Next, we discuss the method of determining the regularization parameter γ in our proposed method and analyze the computational complexity of the optimization algorithm. Finally, experiments are implemented on both synthetic and real-world datasets to demonstrate the effectiveness of the proposed method.
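The constraint that the similarity graph contain exactly c connected components can be checked through a standard fact from spectral graph theory: for a nonnegative symmetric similarity matrix, the number of connected components equals the multiplicity of the zero eigenvalue of the graph Laplacian. The sketch below only illustrates that fact; whether the paper enforces the constraint this way is not stated in the abstract, and the function name and tolerance are assumptions.

```python
import numpy as np

def count_components(S, tol=1e-8):
    """Count connected components of a graph from its similarity matrix S."""
    S = (S + S.T) / 2                   # symmetrize
    L = np.diag(S.sum(axis=1)) - S      # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)     # real eigenvalues, ascending order
    return int((eigvals < tol).sum())   # multiplicity of the zero eigenvalue

# Two disconnected pairs of nodes: {0, 1} and {2, 3}.
S_two = np.array([[0.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])

# Adding a weak edge between nodes 1 and 2 merges the two components.
S_one = S_two.copy()
S_one[1, 2] = S_one[2, 1] = 0.5

c_two = count_components(S_two)
c_one = count_components(S_one)
```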
6. Wang S, Nie F, Wang Z, Wang R, Li X. Max–Min Robust Principal Component Analysis. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.11.092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
7. Shang R, Kong J, Wang L, Zhang W, Wang C, Li Y, Jiao L. Unsupervised feature selection via discrete spectral clustering and feature weights. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2022.10.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
8. Robust unsupervised feature selection via sparse and minimum-redundant subspace learning with dual regularization. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
9. Saberi-Movahed F, Mohammadifard M, Mehrpooya A, Rezaei-Ravari M, Berahmand K, Rostami M, Karami S, Najafzadeh M, Hajinezhad D, Jamshidi M, Abedi F, Mohammadifard M, Farbod E, Safavi F, Dorvash M, Mottaghi-Dastjerdi N, Vahedi S, Eftekhari M, Saberi-Movahed F, Alinejad-Rokny H, Band SS, Tavassoly I. Decoding clinical biomarker space of COVID-19: Exploring matrix factorization-based feature selection methods. Comput Biol Med 2022; 146:105426. [PMID: 35569336 PMCID: PMC8979841 DOI: 10.1016/j.compbiomed.2022.105426] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/04/2021] [Revised: 03/01/2022] [Accepted: 03/18/2022] [Indexed: 02/06/2023]
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is to establish an intelligent triage system that can optimize the clinical decision-making at the time of a global pandemic. The clinical presentation and patients' characteristics are usually utilized to identify those patients who need more critical care. However, the clinical evidence shows an unmet need to determine more accurate and optimal clinical biomarkers to triage patients under a condition like the COVID-19 crisis. Here we have presented a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining the poor prognosis in these patients. Our approach paves the path of building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Affiliation(s)
- Adel Mehrpooya: School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Kamal Berahmand: School of Computer Science, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Australia
- Mehrdad Rostami: Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland
- Saeed Karami: Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Mohammad Najafzadeh: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Mina Jamshidi: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Farshid Abedi: Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Elnaz Farbod: Baruch College, City University of New York, New York, USA
- Farinaz Safavi: Neuroimmunology and Neurovirology Branch, National Institute of Neurological Disorders and Stroke, National Institute of Health, Bethesda, MD, USA
- Mohammadreza Dorvash: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Viewbank, VIC, Australia
- Negar Mottaghi-Dastjerdi: Department of Pharmacognosy and Pharmaceutical Biotechnology, School of Pharmacy, Iran University of Medical Sciences, Tehran, Iran
- Mahdi Eftekhari: Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
- Farid Saberi-Movahed: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran (corresponding author)
- Hamid Alinejad-Rokny: BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- Shahab S. Band: Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, 64002, Taiwan
- Iman Tavassoly: Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA (corresponding author)
10. Li W, Chen H, Li T, Wan J, Sang B. Unsupervised feature selection via self-paced learning and low-redundant regularization. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
11. Self-paced non-convex regularized analysis-synthesis dictionary learning for unsupervised feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
12. Jiang Z, Zhang Y, Wang J. A multi-surrogate-assisted dual-layer ensemble feature selection algorithm. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
13.
14. Saberi-Movahed F, Mohammadifard M, Mehrpooya A, Rezaei-Ravari M, Berahmand K, Rostami M, Karami S, Najafzadeh M, Hajinezhad D, Jamshidi M, Abedi F, Mohammadifard M, Farbod E, Safavi F, Dorvash M, Vahedi S, Eftekhari M, Saberi-Movahed F, Tavassoly I. Decoding Clinical Biomarker Space of COVID-19: Exploring Matrix Factorization-based Feature Selection Methods. medRxiv: The Preprint Server for Health Sciences 2021:2021.07.07.21259699. [PMID: 34268522 PMCID: PMC8282111 DOI: 10.1101/2021.07.07.21259699] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is to establish an intelligent triage system that can optimize the clinical decision-making at the time of a global pandemic. The clinical presentation and patients' characteristics are usually utilized to identify those patients who need more critical care. However, the clinical evidence shows an unmet need to determine more accurate and optimal clinical biomarkers to triage patients under a condition like the COVID-19 crisis. Here we have presented a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining the poor prognosis in these patients. Our approach paves the path of building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Affiliation(s)
- Adel Mehrpooya: School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Kamal Berahmand: School of Computer Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Saeed Karami: Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Mohammad Najafzadeh: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Mina Jamshidi: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Farshid Abedi: Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Elnaz Farbod: Baruch College, City University of New York, New York, USA
- Farinaz Safavi: Neuroimmunology and Neurovirology Branch, National Institute of Neurological Disorders and Stroke, National Institute of Health, Bethesda, Maryland, USA
- Mohammadreza Dorvash: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Viewbank, VIC, Australia
- Mahdi Eftekhari: Department of Computer Engineering, University of Kerman, Kerman, Iran
- Farid Saberi-Movahed: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Iman Tavassoly: Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
15. Anoop V, Bipin PR. Super-Resolution Based Automatic Diagnosis of Retinal Disease Detection for Clinical Applications. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10292-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]