1
|
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023] Open
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, it is not yet understood whether joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies across simulated data sets and experimental data sets from the model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
Affiliation(s)
- Alain J. Mbebi
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
- Zoran Nikoloski
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
|
2
|
Gao Z, Wang YT, Wu QW, Li L, Ni JC, Zheng CH. A New Method Based on Matrix Completion and Non-Negative Matrix Factorization for Predicting Disease-Associated miRNAs. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:763-772. [PMID: 32991287 DOI: 10.1109/tcbb.2020.3027444] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Numerous studies have shown that microRNAs are associated with the occurrence and development of human diseases. Thus, studying disease-associated miRNAs is significantly valuable to the prevention, diagnosis and treatment of diseases. In this paper, we proposed a novel method based on matrix completion and non-negative matrix factorization (MCNMF) for predicting disease-associated miRNAs. Due to the information inadequacy on miRNA similarities and disease similarities, we calculated the latter via two models, and introduced the Gaussian interaction profile kernel similarity. In addition, matrix completion (MC) was employed to further replenish the miRNA and disease similarities to improve the prediction performance. To reduce the sparsity of the miRNA-disease association matrix, the weighted K nearest neighbor (WKNKN) method was used as a pre-processing step. We also utilized non-negative matrix factorization (NMF) using dual L2,1-norm, graph Laplacian regularization, and Tikhonov regularization to effectively avoid overfitting during the prediction. Finally, several experiments and a case study were implemented to evaluate the effectiveness and performance of the proposed MCNMF model. The results indicated that our method could reliably and effectively predict disease-associated miRNAs.
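The Gaussian interaction profile kernel similarity mentioned in this abstract can be illustrated with a short sketch. It follows the commonly used definition, in which the bandwidth is normalised by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are assumptions made for illustration; the abstract does not state the paper's exact settings.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Hypothetical helper: each row is the binary association profile of one
    entity (for example, one miRNA across all diseases).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Bandwidth normalised by the mean squared norm of the profiles.
    gamma = gamma_prime / sq_norms.mean()
    # Pairwise squared Euclidean distances between rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix (made-up values): 3 miRNAs (rows) x 4 diseases (columns).
A = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0]])

K_mirna = gip_kernel(A)      # miRNA-miRNA similarity, shape (3, 3)
K_disease = gip_kernel(A.T)  # disease-disease similarity, shape (4, 4)
```

Entities with identical association profiles receive similarity 1, and the similarity decays as the profiles diverge. The sketch assumes at least one known association, since an all-zero matrix would give a zero bandwidth denominator.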
|
3
|
Nasser R, Eldar YC, Sharan R. Deep Unfolding for Non-Negative Matrix Factorization with Application to Mutational Signature Analysis. J Comput Biol 2022; 29:45-55. [PMID: 34986029 DOI: 10.1089/cmb.2021.0438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Non-negative matrix factorization (NMF) is a fundamental matrix decomposition technique that is used primarily for dimensionality reduction and is increasing in popularity in the biological domain. Although finding a unique NMF is generally not possible, there are various iterative algorithms for NMF optimization that converge to locally optimal solutions. Such techniques can also serve as a starting point for deep learning methods that unroll the algorithmic iterations into layers of a deep network. In this study, we develop unfolded deep networks for NMF and several regularized variants in both a supervised and an unsupervised setting. We apply our method to various mutation data sets to reconstruct their underlying mutational signatures and their exposures. We demonstrate the increased accuracy of our approach over standard formulations in analyzing simulated and real mutation data.
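As an illustration of the kind of iterative NMF algorithm this abstract refers to, the sketch below implements the classical multiplicative update rules for the Frobenius-norm objective. It is not the unfolded deep network proposed in the paper. The function name `nmf_multiplicative`, its parameters, and the synthetic matrix `V` are assumptions chosen for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V as W @ H with multiplicative updates.

    Hypothetical helper; converges only to a local optimum that depends on
    the random initialisation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic non-negative data with an exact rank-2 structure.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))

W, H, errors = nmf_multiplicative(V, rank=2)
```

Each pass of the loop is one fixed computation on `W` and `H`, which is what makes such algorithms candidates for unrolling into network layers: an unfolded network would treat a fixed number of these iterations as layers with learnable parameters.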
Affiliation(s)
- Rami Nasser
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Yonina C Eldar
  - Department of Math and Computer Science, Weizmann Institute of Science, Rehovot, Israel
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
4
|
Zheng X, Zhang C. Gene selection for microarray data classification via dual latent representation learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.07.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
|
5
|
Gene Correlation Guided Gene Selection for Microarray Data Classification. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6490118. [PMID: 34435048 PMCID: PMC8382518 DOI: 10.1155/2021/6490118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2021] [Accepted: 08/09/2021] [Indexed: 12/14/2022]
Abstract
The microarray cancer data obtained by DNA microarray technology play an important role in cancer prevention, diagnosis, and treatment. However, predicting the different types of tumors is a challenging task since the sample size in microarray data is often small but the dimensionality is very high. Gene selection is an effective means of mitigating the curse of dimensionality and can boost the classification accuracy of microarray data. However, many previous gene selection methods focus on model design but neglect the correlation between different genes. In this paper, we introduce a novel unsupervised gene selection method that takes the gene correlation into consideration, named gene correlation guided gene selection (G3CS). Specifically, we calculate the covariance of different gene dimension pairs and embed it into our unsupervised gene selection model to regularize the gene selection coefficient matrix. In such a manner, redundant genes can be effectively excluded. In addition, we utilize a matrix factorization term to exploit the cluster structure of the original microarray data to assist the learning process. We design an iterative updating algorithm with convergence guarantee to solve the resultant optimization problem. Experiments on six publicly available microarray datasets are conducted to validate the efficacy of our proposed method.
|
6
|
Wang CY, Gao YL, Liu JX, Dai LY, Shang J. Sparse robust graph-regularized non-negative matrix factorization based on correntropy. J Bioinform Comput Biol 2021; 19:2050047. [PMID: 33410727 DOI: 10.1142/s021972002050047x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Non-negative Matrix Factorization (NMF) has been a popular data dimension reduction method in recent years. The traditional NMF method is highly sensitive to data noise. In this paper, we propose a model called Sparse Robust Graph-regularized Non-negative Matrix Factorization based on Correntropy (SGNMFC). The maximized correntropy replaces the traditional minimized Euclidean distance to improve the robustness of the algorithm. Through the kernel function, correntropy can give less weight to outliers and noise in data but give greater weight to meaningful data. Meanwhile, the geometric structure of the high-dimensional data is preserved in the low-dimensional manifold through the graph regularization. Feature selection and sample clustering are commonly used methods for analyzing genes. Sparse constraints are applied to the loss function to reduce matrix complexity and analysis difficulty. Compared with five other similar methods, the effectiveness of the SGNMFC model is demonstrated by selection of differentially expressed genes and sample clustering experiments on three The Cancer Genome Atlas (TCGA) datasets.
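The down-weighting of outliers described in this abstract can be shown numerically. The sketch below assumes a Gaussian kernel, the usual choice for correntropy, and computes the per-sample weight exp(-e^2 / (2 sigma^2)) that such a kernel assigns to a residual e. The function name `correntropy_weights`, the bandwidth `sigma`, and the residual values are illustrative assumptions, not details taken from the SGNMFC paper.

```python
import numpy as np

def correntropy_weights(residuals, sigma=1.0):
    """Gaussian-kernel weight for each residual (hypothetical helper).

    Small residuals get a weight close to 1; large residuals get a weight
    close to 0, so they contribute little to a correntropy-based objective.
    """
    residuals = np.asarray(residuals, dtype=float)
    return np.exp(-residuals ** 2 / (2.0 * sigma ** 2))

# Three well-fitted samples and one gross outlier (made-up values).
residuals = np.array([0.1, -0.2, 0.05, 8.0])

weights = correntropy_weights(residuals)

# For comparison: share of a plain squared-error loss taken by each sample.
squared_loss_share = residuals ** 2 / np.sum(residuals ** 2)
```

In this toy case the outlier accounts for almost all of the squared-error loss while receiving a correntropy weight that is essentially zero, which is the robustness property the abstract describes.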
Affiliation(s)
- Chuan-Yuan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Ying-Lian Gao
  - Qufu Normal University Library, Qufu Normal University, Rizhao, Shandong, P. R. China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Ling-Yun Dai
  - School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
|
7
|
Jiao CN, Gao YL, Yu N, Liu JX, Qi LY. Hyper-Graph Regularized Constrained NMF for Selecting Differentially Expressed Genes and Tumor Classification. IEEE J Biomed Health Inform 2020; 24:3002-3011. [PMID: 32086224 DOI: 10.1109/jbhi.2020.2975199] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/07/2022]
Abstract
Non-negative Matrix Factorization (NMF) is a dimensionality reduction approach for learning a parts-based and linear representation of non-negative data, and it has attracted growing attention for this reason. In practice, however, NMF not only neglects the manifold structure of data samples, but also overlooks the prior label information of different classes. In this paper, a novel matrix decomposition method called Hyper-graph regularized Constrained Non-negative Matrix Factorization (HCNMF) is proposed for selecting differentially expressed genes and tumor sample classification. The advantage of hyper-graph learning is that it captures local spatial information in high dimensional data. This method incorporates a hyper-graph regularization constraint to consider the higher order data sample relationships. The application of hyper-graph theory can effectively find pathogenic genes in cancer datasets. Besides, the label information is further incorporated in the objective function to improve the discriminative ability of the decomposition matrix. Supervised learning with label information greatly improves the classification effect. We also provide the iterative update rules and convergence proofs for the optimization problems of HCNMF. Experiments on The Cancer Genome Atlas (TCGA) datasets confirm the superiority of the HCNMF algorithm compared with other representative algorithms through a set of evaluations.
|
8
|
Gene selection for microarray data classification via adaptive hypergraph embedded dictionary learning. Gene 2019; 706:188-200. [DOI: 10.1016/j.gene.2019.04.060] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/27/2018] [Revised: 04/03/2019] [Accepted: 04/22/2019] [Indexed: 01/19/2023]
|
9
|
Liu JX, Wang D, Gao YL, Zheng CH, Xu Y, Yu J. Regularized Non-Negative Matrix Factorization for Identifying Differentially Expressed Genes and Clustering Samples: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:974-987. [PMID: 28186906 DOI: 10.1109/tcbb.2017.2665557] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 06/06/2023]
Abstract
Non-negative Matrix Factorization (NMF), a classical method for dimensionality reduction, has been applied in many fields. It is based on the idea that negative numbers are physically meaningless in various data-processing tasks. Apart from its contribution to conventional data analysis, the recent overwhelming interest in NMF is due to its newly discovered ability to solve challenging data mining and machine learning problems, especially in relation to gene expression data. This survey paper mainly focuses on research examining the application of NMF to identify differentially expressed genes and to cluster samples, and the main NMF models, properties, principles, and algorithms with its various generalizations, extensions, and modifications are summarized. The experimental results demonstrate the performance of the various NMF algorithms in identifying differentially expressed genes and clustering samples.
|
10
|
Li X, Wong KC. A Comparative Study for Identifying the Chromosome-Wide Spatial Clusters from High-Throughput Chromatin Conformation Capture Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:774-787. [PMID: 28333638 DOI: 10.1109/tcbb.2017.2684800] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
In the past years, high-throughput sequencing technologies have enabled massive insights into genomic annotations. In contrast, the full-scale three-dimensional arrangements of genomic regions are relatively unknown. Thanks to the recent breakthroughs in High-throughput Chromosome Conformation Capture (Hi-C) techniques, non-negative matrix factorization (NMF) has been adopted to identify local spatial clusters of genomic regions from Hi-C data. However, such non-negative matrix factorization entails a high-dimensional non-convex objective function to be optimized with non-negative constraints. We propose and compare more than ten optimization algorithms to improve the identification of local spatial clusters via NMF. To optimize the high-dimensional, non-convex, and constrained objective function, we draw inspiration from nature to perform in silico evolution. The proposed algorithms consist of a population of candidates to be evolved while the NMF acts as local search during the evolutions. The population-based optimization algorithm coordinates and guides the non-negative matrix factorization toward global optima. Experimental results show that the proposed algorithms can improve the quality of non-negative matrix factorization over recent state-of-the-art methods. The effectiveness and robustness of the proposed algorithms are supported by comprehensive performance benchmarking on chromosome-wide Hi-C contact maps of yeast and human. In addition, time complexity analysis, convergence analysis, parameter analysis, biological case studies, and gene ontology similarity analysis are conducted to demonstrate the robustness of the proposed methods from different perspectives.
|
11
|
Gene selection for microarray data classification via subspace learning and manifold regularization. Med Biol Eng Comput 2017; 56:1271-1284. [PMID: 29256006 DOI: 10.1007/s11517-017-1751-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/30/2017] [Accepted: 11/03/2017] [Indexed: 10/18/2022]
Abstract
With the rapid development of DNA microarray technology, a large amount of genomic data has been generated. Classification of these microarray data is a challenging task since gene expression data often contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data with the irrelevant and redundant genes removed. Compared with the original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high dimensional microarray data into a lower dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better than other state-of-the-art methods in terms of microarray data classification.
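A Laplacian graph regularization term of the kind named in this abstract is usually written as tr(Y^T L Y), where L = D - W is the Laplacian of a sample similarity graph with symmetric weights W and Y holds the low-dimensional representations. The sketch below checks numerically that this trace equals half the weighted sum of squared distances between connected samples. The function names, the graph, and the embeddings are assumptions for illustration; the paper's graph construction is not given in the abstract.

```python
import numpy as np

def laplacian_penalty(Y, W):
    """tr(Y^T L Y) with L = D - W, for a symmetric weight matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    return np.trace(Y.T @ L @ Y)

def pairwise_penalty(Y, W):
    """Equivalent form: 0.5 * sum_ij W_ij * ||y_i - y_j||^2."""
    diffs = Y[:, None, :] - Y[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return 0.5 * np.sum(W * sq_dists)

# Toy graph on 4 samples: sample 0 is linked to 1, and sample 2 to 3.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

# Embedding that keeps linked samples close together.
Y_smooth = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
# Embedding that places linked samples far apart.
Y_rough = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]])

penalty_smooth = laplacian_penalty(Y_smooth, W)
penalty_rough = laplacian_penalty(Y_rough, W)
```

The penalty is small when neighbouring samples in the graph stay close in the low-dimensional space and large when they are pulled apart, which is how such a term preserves local manifold structure.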
|
12
|
Liu JX, Wang DQ, Zheng CH, Gao YL, Wu SS, Shang JL. Identifying drug-pathway association pairs based on L 2,1-integrative penalized matrix decomposition. BMC SYSTEMS BIOLOGY 2017; 11:119. [PMID: 29297378 PMCID: PMC5770056 DOI: 10.1186/s12918-017-0480-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/11/2023]
Abstract
BACKGROUND Traditional drug identification methods follow the "one drug-one target" paradigm. However, those methods ignore the natural characteristics of human diseases. To overcome this limitation, many identification methods of drug-pathway association pairs have been developed, such as the integrative penalized matrix decomposition (iPaD) method. The iPaD method imposes the L1-norm penalty on the regularization term. However, lasso-type penalties have an obvious disadvantage: the sparsity they produce is too dispersed. RESULTS Therefore, to improve the performance of the iPaD method, we propose a novel method named L2,1-iPaD to identify paired drug-pathway associations. In the L2,1-iPaD model, we use the L2,1-norm penalty to replace the L1-norm penalty since the L2,1-norm penalty can produce row sparsity. CONCLUSIONS By applying the L2,1-iPaD method to the CCLE and NCI-60 datasets, we demonstrate that the performance of the L2,1-iPaD method is superior to existing methods. The proposed method also achieves better enrichment than the iPaD method in discovering validated drug-pathway association pairs, as assessed by a permutation test. The results on the two real datasets show that our method is effective.
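The contrast between dispersed L1 sparsity and L2,1 row sparsity can be seen from the proximal operators of the two penalties. The sketch below is an illustration of the penalties only, not the L2,1-iPaD algorithm. The function names, the threshold `t`, and the matrix `X` are assumptions chosen for the example.

```python
import numpy as np

def l21_norm(X):
    """L2,1-norm: sum of the Euclidean norms of the rows of X."""
    return np.sum(np.linalg.norm(X, axis=1))

def prox_l1(X, t):
    """Entry-wise soft thresholding (proximal operator of t * L1-norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def prox_l21(X, t):
    """Row-wise shrinkage (proximal operator of t * L2,1-norm).

    A row is set to zero as a whole when its Euclidean norm is at most t;
    otherwise every entry of the row is scaled by the same factor.
    """
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(row_norms, 1e-12), 0.0)
    return scale * X

# Made-up matrix: rows 0 and 2 are strong, row 1 is weak.
X = np.array([[3.0, 4.0, 0.2],
              [0.3, -0.2, 0.1],
              [0.1, 2.0, -0.4]])

X_l1 = prox_l1(X, 0.5)    # zeros scattered over individual small entries
X_l21 = prox_l21(X, 0.5)  # weak row removed entirely, strong rows kept whole
```

With the L1 penalty, small entries are zeroed wherever they occur, including inside otherwise strong rows. With the L2,1 penalty, a row is either removed entirely or retained with all of its entries, which is the row sparsity the abstract refers to.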
Affiliation(s)
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Dong-Qin Wang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Chun-Hou Zheng
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Ying-Lian Gao
  - Library of Qufu Normal University, Qufu Normal University, Rizhao, China
- Sha-Sha Wu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
- Jun-Liang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, China
|