1
Liao Q, Liu Q, Razak FA. Hypergraph regularized nonnegative triple decomposition for multiway data analysis. Sci Rep 2024; 14:9098. [PMID: 38643209 PMCID: PMC11032410 DOI: 10.1038/s41598-024-59300-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 04/09/2024] [Indexed: 04/22/2024] Open
Abstract
Tucker decomposition is widely used for image representation, data reconstruction, and machine learning tasks, but updating the Tucker core is computationally expensive. The bilevel form of triple decomposition (TriD) overcomes this issue by decomposing the Tucker core into three low-dimensional third-order factor tensors, and it plays an important role in dimension reduction for data representation. However, TriD cannot precisely encode similarity relationships for tensor data with a complex manifold structure. To address this shortcoming, we take advantage of hypergraph learning and propose a novel hypergraph regularized nonnegative triple decomposition for multiway data analysis that employs a hypergraph to model the complex relationships among the raw data. Furthermore, we develop a multiplicative update algorithm to solve the resulting optimization problem and theoretically prove its convergence. Finally, we perform extensive numerical tests on six real-world datasets, and the results show that the proposed algorithm outperforms some state-of-the-art methods.
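The abstract does not say how the hypergraph enters the objective. As a rough illustration only, the sketch below builds the commonly used unnormalized hypergraph Laplacian, L = D_v - H W D_e^{-1} H^T, from a vertex-by-hyperedge incidence matrix. This choice of Laplacian, the function name `hypergraph_laplacian`, and the toy incidence matrix are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Unnormalized hypergraph Laplacian (illustrative sketch, not the paper's code).

    H : (n_vertices, n_edges) 0/1 incidence matrix; every hyperedge must
        contain at least one vertex so that edge degrees are nonzero.
    w : optional (n_edges,) nonnegative hyperedge weights (defaults to ones).
    """
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                # vertex degrees: total weight of incident hyperedges
    d_e = H.sum(axis=0)        # edge degrees: number of vertices in each hyperedge
    S = (H * (w / d_e)) @ H.T  # similarity term H W D_e^{-1} H^T
    return np.diag(d_v) - S

# Toy example: 4 samples, 2 hyperedges that overlap on samples 1 and 2.
H_toy = np.array([[1, 0],
                  [1, 1],
                  [1, 1],
                  [0, 1]])
L_toy = hypergraph_laplacian(H_toy)
```

A regularizer of the form trace(Z^T L Z) built from such a matrix penalizes low-dimensional representations Z that differ among samples sharing a hyperedge, which is the usual motivation for hypergraph regularization.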
Affiliation(s)
- Qingshui Liao
  - Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
  - School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, People's Republic of China
- Qilong Liu
  - School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, People's Republic of China
- Fatimah Abdul Razak
  - Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
2
Li Y, Wang R, Fang Y, Sun M, Luo Z. Alternating Direction Method of Multipliers for Convolutive Non-Negative Matrix Factorization. IEEE Transactions on Cybernetics 2023; 53:7735-7748. [PMID: 36149991 DOI: 10.1109/tcyb.2022.3204723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Non-negative matrix factorization (NMF) has become a popular method for learning interpretable patterns from data. As one of the variants of standard NMF, convolutive NMF (CNMF) adds an extra time dimension to each basis, yielding convolutive bases that are well suited to representing sequential patterns. Previously proposed algorithms for solving CNMF use multiplicative updates, which can be derived by either heuristic or majorization-minimization (MM) methods. However, these algorithms suffer from problems such as low convergence rates, difficulty reaching exact zeros during iterations, and a tendency to converge to poor local optima. Inspired by the success of the alternating direction method of multipliers (ADMM) in solving NMF, we explore variable splitting (i.e., the core idea of ADMM) for CNMF in this article. New closed-form algorithms for CNMF are derived with the commonly used β-divergences as optimization objectives. Experimental results demonstrate that the proposed algorithms converge faster, reach better optima, and yield sparser results than state-of-the-art baselines.
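For orientation, the multiplicative updates that the abstract names as the earlier approach can be sketched for plain (non-convolutive) NMF under the Frobenius objective. This is the classic Lee-Seung style update, not the ADMM algorithm proposed in the article, and the function name `nmf_multiplicative`, the defaults, and the small constant `eps` are assumptions for illustration.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative matrix X (m x n) as W @ H with W, H >= 0.

    Illustrative sketch of standard multiplicative updates for the
    objective ||X - W H||_F^2; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so nonnegativity is preserved without any projection step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage: factor a random nonnegative matrix of rank 3.
rng = np.random.default_rng(1)
X_toy = rng.random((20, 3)) @ rng.random((3, 15))
W_toy, H_toy = nmf_multiplicative(X_toy, rank=3)
```

Because each step multiplies an entry by a ratio, an entry shrinks toward zero without being set exactly to zero, which is one way to read the abstract's remark about the difficulty of reaching exact zeros.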
3
Yu J, Pan B, Yu S, Leung MF. Robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization. Mathematical Biosciences and Engineering 2023; 20:12486-12509. [PMID: 37501452 DOI: 10.3934/mbe.2023556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Non-negative matrix factorization (NMF) has been widely used in machine learning and data mining. As an extension of NMF, non-negative matrix tri-factorization (NMTF) provides more degrees of freedom than NMF. However, the standard NMTF algorithm uses the Frobenius norm to measure the residual error, which can be dramatically affected by noise and outliers. Moreover, the hidden geometric information in the feature manifold and the sample manifold is rarely learned. Hence, a novel robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization (RCHNMTF) is proposed. First, a robust capped norm is adopted to handle extreme outliers. Second, dual hyper-graph regularization is used to exploit the intrinsic geometric information in the feature manifold and the sample manifold. Third, orthogonality constraints are added to learn a unique data representation and improve clustering performance. Experiments on seven datasets demonstrate the robustness and superiority of RCHNMTF.
Affiliation(s)
- Jiyang Yu
  - College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Baicheng Pan
  - Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
- Shanshan Yu
  - Training and Basic Education Management Office, Southwest University, Chongqing 400715, China
- Man-Fai Leung
  - School of Computing and Information Science, Faculty of Science and Engineering, Anglia Ruskin University, Cambridge, United Kingdom
4
Wang Y, Guan T, Zhou G, Zhao H, Gao J. SOJNMF: Identifying Multidimensional Molecular Regulatory Modules by Sparse Orthogonality-Regularized Joint Non-Negative Matrix Factorization Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3695-3703. [PMID: 34546925 DOI: 10.1109/tcbb.2021.3114146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Cancer is not only a very aggressive but also a very diverse disease. Recent advances in high-throughput omics technologies have given biomedical researchers more opportunities to study the multi-level biological regulatory mechanisms of cancer. However, few methods explore the underlying mechanism of cancer by identifying multidimensional molecular regulatory modules from multidimensional cancer omics data. In this paper, we propose a sparse orthogonality-regularized joint non-negative matrix factorization (SOJNMF) algorithm that can integratively analyze multidimensional omics data. This method not only identifies multidimensional molecular regulatory modules but also reduces the overlap rate of features among the modules while ensuring the sparsity of the coefficient matrix after decomposition. Gene expression data, miRNA expression data, and gene methylation data of liver cancer are integratively analyzed using the SOJNMF algorithm, yielding 238 multidimensional molecular regulatory modules. The results of a permutation test indicate that the different omics features within these modules are statistically significantly correlated. Meanwhile, the results of functional enrichment analysis show that these multidimensional modules are significantly related to the underlying mechanism of the occurrence and development of liver cancer.
5
Shu Z, Sun Y, Tang J, You C. Adaptive Graph Regularized Deep Semi-nonnegative Matrix Factorization for Data Representation. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10882-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
6
Mirzal A. Statistical Analysis of Microarray Data Clustering using NMF, Spectral Clustering, Kmeans, and GMM. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1173-1192. [PMID: 32956065 DOI: 10.1109/tcbb.2020.3025486] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
In the unsupervised learning literature, clustering of microarray gene expression datasets has been studied extensively, with nonnegative matrix factorization (NMF), spectral clustering, kmeans, and the Gaussian mixture model (GMM) being some of the most used methods. However, only a limited number of works use statistical analysis to measure the significance of the performance differences between these methods. This paper presents a statistical analysis of the performance differences between ten NMF, six spectral clustering, four GMM, and the standard kmeans algorithms in clustering eleven publicly available microarray gene expression datasets, with the number of clusters ranging from two to ten. The experimental results show that, statistically, the NMF and kmeans algorithms perform similarly and outperform spectral clustering. As spectral clustering can be used to uncover hidden manifold structures, the underperformance of spectral methods leads us to question whether the datasets have manifold structures. Visual inspection using multidimensional scaling plots indicates that such structures do not exist. Moreover, as the plots indicate that clusters in some datasets have elliptical boundaries, GMM methods are also used. The experimental results show that GMM methods outperform the other methods to some degree, which suggests that the datasets follow Gaussian distributions.
7
Cui Z, Zhao P, Hu Z, Cai X, Zhang W, Chen J. An improved matrix factorization based model for many-objective optimization recommendation. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.07.077] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
8
Zhu YL, Yuan SS, Liu JX. Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization for Single-Cell RNA-seq Analysis. Interdiscip Sci 2021; 14:45-54. [PMID: 34231183 DOI: 10.1007/s12539-021-00457-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 06/24/2021] [Accepted: 06/27/2021] [Indexed: 10/20/2022]
Abstract
In traditional sequencing techniques, the different functions of cells and the different roles they play in differentiation are often ignored. With the advancement of single-cell RNA sequencing (scRNA-seq) techniques, scientists can measure gene expression at the single-cell level, which helps in understanding the heterogeneity hidden in cells. One of the most powerful ways to find heterogeneity is to use unsupervised clustering to obtain separate subpopulations. In this paper, we propose a novel clustering method, Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization (SDCNMF), that simultaneously imposes similarity and dissimilarity constraints on low-dimensional representations. SDCNMF considers both the similarity of closer cells and the dissimilarity of cells that are farther apart. It not only keeps similar cells close together in the low-dimensional space but also pushes dissimilar cells away from each other. We test the validity of the proposed method on five scRNA-seq datasets. Clustering results show that SDCNMF performs better than the other methods compared, and the gene markers we find are consistent with previous studies. Therefore, we conclude that SDCNMF is effective for scRNA-seq data analysis.
Affiliation(s)
- Ya-Li Zhu
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Sha-Sha Yuan
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, China
  - Rizhao Huilian Zhongchuang Institute of Intelligent Technology, Rizhao, 276826, China
9
Tu Z, He X, Zeng L, Meng D, Zhuang R, Zhao J, Dai W. Exploration of Prognostic Biomarkers for Lung Adenocarcinoma Through Bioinformatics Analysis. Front Genet 2021; 12:647521. [PMID: 33968130 PMCID: PMC8100590 DOI: 10.3389/fgene.2021.647521] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/07/2021] [Accepted: 03/30/2021] [Indexed: 12/30/2022] Open
Abstract
With the development of computer technology, screening cancer biomarkers based on public databases has become a common research method. Here, an eight-gene prognostic model, which could be used to judge the prognosis of patients with lung adenocarcinoma (LUAD), was developed through bioinformatics methods. This study first used several gene datasets from the GEO database to mine differentially expressed genes (DEGs) between LUAD tissue and healthy tissue via joint analysis. Enrichment analysis of the DEGs was then performed, and the DEGs were found to be mainly enriched in pathways involved in the extracellular matrix, cell adhesion, and leukocyte migration. Afterward, a TCGA cohort was used to perform univariate Cox, least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression analyses of the DEGs, and a prognostic model consisting of eight genes (GPX3, TCN1, ASPM, PCP4, CAV2, S100P, COL1A1, and SPOK2) was established. A receiver operating characteristic (ROC) curve was then used to assess the diagnostic efficacy of the prognostic model. The survival significance of the signature genes was verified through the GEPIA database, and the risk coefficients of the eight genes were largely consistent with the effects of these genes on prognosis in the GEPIA database, which suggested that the results were accurate. Finally, combined with the clinical characteristics of patients, the independence of the prognostic model was further validated through univariate and multivariate regression, and the results indicated that the model had independent prognostic value. Overall, the study found that the eight-gene prognostic model is closely related to the prognosis of LUAD patients and can be used as an independent prognostic indicator. Additionally, the prognostic model in this study can help doctors make better decisions during treatment and ultimately benefit LUAD patients.
Affiliation(s)
- Zhengliang Tu
  - Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiangfeng He
  - Department of Thoracic Surgery, Zhuji People's Hospital, Zhuji, China
- Liping Zeng
  - Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Di Meng
  - Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Runzhou Zhuang
  - Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jiangang Zhao
  - Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wanrong Dai
  - Department of Pharmacy, The First Affiliated Hospital, College of Medicine, Hangzhou, China
10
Lin Q, Lin Y, Yu Q, Ma X. Clustering of Cancer Attributed Networks via Integration of Graph Embedding and Matrix Factorization. IEEE Access 2020. [DOI: 10.1109/access.2020.3034623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]