101
|
Abstract
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, this behavior is not guaranteed. Several authors have proposed NMF methods which enforce sparseness by constraining or penalizing the [Formula: see text] of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the [Formula: see text]. In this paper, we propose a framework for approximate NMF which constrains the [Formula: see text] of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which are comparable to, or outperform, existing approaches.
|
102
|
Cichocki A, Zdunek R. Multilayer nonnegative matrix factorization using projected gradient approaches. Int J Neural Syst 2011; 17:431-46. [DOI: 10.1142/s0129065707001275] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 11/18/2022]
Abstract
The most popular algorithms for Nonnegative Matrix Factorization (NMF) belong to a class of multiplicative Lee-Seung algorithms, which usually have relatively low complexity but are characterized by slow convergence and the risk of getting stuck in local minima. In this paper, we present and compare the performance of additive algorithms based on three different variations of a projected gradient approach. Additionally, we discuss a novel multilayer approach to NMF algorithms combined with a multi-start initialization procedure, which in general considerably improves the performance of all the NMF algorithms. We demonstrate that this approach (the multilayer system with projected gradient algorithms) can usually give much better performance than standard multiplicative algorithms, especially if the data are ill-conditioned or badly scaled, and/or the number of observations is only slightly greater than the number of nonnegative hidden components. Our new implementations of NMF are demonstrated with simulations performed on Blind Source Separation (BSS) data.
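The contrast drawn in this abstract, between multiplicative Lee-Seung updates and additive projected gradient steps, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses the Frobenius-norm objective, a fixed step size of 1/L (L being the spectral norm of the relevant Gram matrix), and hypothetical function names; the three projected gradient variants and the multilayer scheme from the paper are not reproduced.

```python
import numpy as np

def multiplicative_step(V, W, H, eps=1e-12):
    """One Lee-Seung multiplicative update for min ||V - W H||_F^2.

    Factors stay nonnegative because each entry is only rescaled
    by a nonnegative ratio. eps guards against division by zero.
    """
    H = H * (W.T @ V) / (W.T @ W @ H + eps)
    W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def projected_gradient_step(V, W, H):
    """One additive gradient step per factor, projected onto the
    nonnegative orthant (illustrative fixed step size 1/L)."""
    # Step size from the Lipschitz constant of the gradient w.r.t. H.
    L_h = np.linalg.norm(W.T @ W, 2) + 1e-12
    H = np.maximum(H - (W.T @ (W @ H - V)) / L_h, 0.0)
    # Same construction for W, using the updated H.
    L_w = np.linalg.norm(H @ H.T, 2) + 1e-12
    W = np.maximum(W - ((W @ H - V) @ H.T) / L_w, 0.0)
    return W, H

def frobenius_error(V, W, H):
    """Reconstruction error ||V - W H||_F."""
    return float(np.linalg.norm(V - W @ H))
```

Both steps are meant to be called repeatedly from the same nonnegative starting point, so that the reconstruction errors of the two families can be compared iteration by iteration.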
Affiliation(s)
- Andrzej Cichocki
- Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan
- Rafal Zdunek
- Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan
|
103
|
He Z, Xie S, Zdunek R, Zhou G, Cichocki A. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering. IEEE Trans Neural Netw 2011; 22:2117-31. [PMID: 22042156 DOI: 10.1109/tnn.2011.2172457] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Indexed: 11/10/2022]
Abstract
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Affiliation(s)
- Zhaoshui He
- Faculty of Automation, Guangdong University of Technology, Guangzhou 510641, China.
|
104
|
Large margin based nonnegative matrix factorization and partial least squares regression for face recognition. Pattern Recognit Lett 2011. [DOI: 10.1016/j.patrec.2011.07.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
|
105
|
Gaujoux R, Seoighe C. Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study. Infection, Genetics and Evolution 2011; 12:913-21. [PMID: 21930246 DOI: 10.1016/j.meegid.2011.08.014] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 06/03/2011] [Revised: 08/10/2011] [Accepted: 08/11/2011] [Indexed: 10/17/2022]
Abstract
Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious diseases or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore, known marker genes associated with each cell type were assigned to the correct cell type more frequently for the guided versions. We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements.
Affiliation(s)
- Renaud Gaujoux
- Computational Biology Group, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, South Africa.
|
106
|
Kong W, Mou X, Hu X. Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data. BMC Bioinformatics 2011; 12 Suppl 5:S7. [PMID: 21989140 PMCID: PMC3203370 DOI: 10.1186/1471-2105-12-s5-s7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets. METHODS Unsupervised machine learning approaches provide new and efficient analysis of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer's disease. The advantage of these two approaches is that they can be performed as a biclustering method by which genes and conditions can be clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The difference between these two methods is that ICA assumes statistical independence of the expression modes, while NMF needs positivity constraints to generate localized gene expression profiles. RESULTS In our work, we applied FastICA and non-smooth NMF methods, respectively, to DNA microarray gene expression data of Alzheimer's disease. The simulation results show that both methods can clearly distinguish severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The combination of these two methods is validated as efficient. CONCLUSIONS Unsupervised matrix factorization methods provide efficient tools to analyze high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information from microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis is highly efficient for elucidating the molecular taxonomy of Alzheimer's disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.
Affiliation(s)
- Wei Kong
- Information Engineering College, Shanghai Maritime University, Haigang Ave., Shanghai, 201306, P R China.
|
107
|
Jiang X, Weitz JS, Dushoff J. A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data. J Math Biol 2011; 64:697-711. [PMID: 21630089 DOI: 10.1007/s00285-011-0428-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 10/04/2010] [Revised: 03/18/2011] [Indexed: 11/26/2022]
Abstract
Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual, short pieces of sequenced DNA ("reads") are classified into (putative) taxonomic or metabolic groups which are analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as positive linear combinations of positive components. It is thus well suited to interpretation of observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing NMF degree in the presence of overlap, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. We show that our method can robustly identify the appropriate degree and disentangle overlapping contributions using synthetic data sets. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems and others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decompositions facilitate analysis of read matrices at a finer scale.
Affiliation(s)
- Xingpeng Jiang
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
|
108
|
Kim Y, Kim TK, Kim Y, Yoo J, You S, Lee I, Carlson G, Hood L, Choi S, Hwang D. Principal network analysis: identification of subnetworks representing major dynamics using gene expression data. Bioinformatics 2010; 27:391-8. [PMID: 21193522 DOI: 10.1093/bioinformatics/btq670] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION Systems biology attempts to describe complex system behaviors in terms of dynamic operations of biological networks. However, there is a lack of tools that can effectively decode complex network dynamics over multiple conditions. RESULTS We present principal network analysis (PNA), which can automatically capture major dynamic activation patterns over multiple conditions and then generate protein and metabolic subnetworks for the captured patterns. We first demonstrated the utility of this method by applying it to a synthetic dataset. The results showed that PNA correctly captured the subnetworks representing dynamics in the data. We further applied PNA to two time-course gene expression profiles collected from (i) MCF7 cells after treatments of HRG at multiple doses and (ii) brain samples of four strains of mice infected with two prion strains. The resulting subnetworks and their interactions revealed network dynamics associated with HRG dose-dependent regulation of cell proliferation and differentiation and early PrPSc accumulation during prion infection. AVAILABILITY The web-based software is available at: http://sbm.postech.ac.kr/pna.
Affiliation(s)
- Yongsoo Kim
- School of Interdisciplinary Bioscience and Bioengineering, POSTECH, Pohang, Republic of Korea
|
109
|
Nonnegative matrix factorization with bounded total variational regularization for face recognition. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2010.08.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
|
110
|
Nonnegative tensor factorization as an alternative Csiszar–Tusnady procedure: algorithms, convergence, probabilistic interpretations and novel probabilistic tensor latent variable analysis algorithms. Data Min Knowl Discov 2010. [DOI: 10.1007/s10618-010-0196-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
|
111
|
Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 2010; 11:367. [PMID: 20598126 PMCID: PMC2912887 DOI: 10.1186/1471-2105-11-367] [Citation(s) in RCA: 873] [Impact Index Per Article: 62.4] [Received: 11/16/2009] [Accepted: 07/02/2010] [Indexed: 11/23/2022]
Abstract
Background Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community. Results Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results. Conclusions The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications. Documentation, source code and sample data are available from CRAN.
Affiliation(s)
- Renaud Gaujoux
- Computational Biology Group, Department of Clinical Laboratory Sciences, Faculty of Health Sciences, University of Cape Town, South Africa
|
112
|
Han X. Nonnegative principal component analysis for cancer molecular pattern discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010; 7:537-549. [PMID: 20671323 DOI: 10.1109/tcbb.2009.36] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with the state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, the algorithm's global feature selection mechanism prevents it from effectively capturing the latent data structures in the high-dimensional data. In this study, we investigate the benefit of adding nonnegative constraints on PCA and develop a nonnegative principal component analysis algorithm (NPCA) to overcome the global nature of PCA. A novel classification algorithm NPCA-SVM is proposed for microarray data pattern discovery. We report strong classification results from the NPCA-SVM algorithm on five benchmark microarray data sets by direct comparison with other related algorithms. We have also proved mathematically and interpreted biologically that microarray data will inevitably encounter overfitting for an SVM/PCA-SVM learning machine under a Gaussian kernel. In addition, we demonstrate that nonnegative principal component analysis can be used to capture meaningful biomarkers effectively.
Affiliation(s)
- Xiaoxu Han
- Department of Mathematics, Eastern Michigan University, Ypsilanti, MI 48197, USA.
|
113
|
Robust object recognition under partial occlusions using NMF. Computational Intelligence and Neuroscience 2010:857453. [PMID: 18509493 PMCID: PMC2396239 DOI: 10.1155/2008/857453] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/02/2007] [Revised: 12/18/2007] [Accepted: 03/10/2008] [Indexed: 12/03/2022]
Abstract
In recent years, nonnegative matrix factorization (NMF) methods for reduced image data representation have attracted the attention of the computer vision community. These methods are considered a convenient part-based representation of image data for recognition tasks with occluded objects. A novel modification of NMF for recognition tasks is proposed which utilizes the matrix sparseness control introduced by Hoyer. We have analyzed the influence of sparseness on recognition rates (RRs) for various dimensions of subspaces generated for two image databases, the ORL face database and the USPS handwritten digit database. We have studied the behavior of four types of distances between a projected unknown image object and feature vectors in NMF subspaces generated for training data. One of these metrics is also a novel proposal of ours. In the recognition phase, partial occlusions in the test images have been modeled by putting two randomly sized, randomly positioned black rectangles into each test image.
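As an illustration of the sparseness control this abstract mentions, the measure commonly attributed to Hoyer relates the L1 and L2 norms of a vector: sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). The sketch below assumes this is the measure the abstract refers to; the function name is hypothetical, and the abstract itself does not state the formula.

```python
import numpy as np

def hoyer_sparseness(x):
    """Sparseness in [0, 1]: 1 for a vector with a single nonzero
    entry, 0 for a vector whose entries all have equal magnitude."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    if n < 2 or l2 == 0.0:
        raise ValueError("needs at least two entries and a nonzero vector")
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)
```

Applied column by column to a basis matrix, such a measure gives one number per basis image, which is the kind of quantity a sparseness constraint would hold at a chosen level.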
|
114
|
Pattern expression nonnegative matrix factorization: algorithm and applications to blind source separation. Computational Intelligence and Neuroscience 2010:168769. [PMID: 18566689 PMCID: PMC2430033 DOI: 10.1155/2008/168769] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 11/01/2007] [Accepted: 04/18/2008] [Indexed: 11/17/2022]
Abstract
Independent component analysis (ICA) is a widely applicable and effective approach to blind source separation (BSS), with the limitation that the sources must be statistically independent. However, a more common situation is blind source separation for the nonnegative linear model (NNLM), where the observations are nonnegative linear combinations of nonnegative sources, and the sources may be statistically dependent. We propose a pattern expression nonnegative matrix factorization (PE-NMF) approach from the viewpoint of using basis vectors most effectively to express patterns. Two regularization or penalty terms are added to the original loss function of standard nonnegative matrix factorization (NMF) for effective expression of patterns with basis vectors in the PE-NMF. A learning algorithm is presented, and the convergence of the algorithm is proved theoretically. Three illustrative examples of blind source separation, including heterogeneity correction for gene microarray data, indicate that the sources can be successfully recovered with the proposed PE-NMF when the two parameters can be suitably chosen from prior knowledge of the problem.
|
115
|
Gene tree labeling using nonnegative matrix factorization on biomedical literature. Computational Intelligence and Neuroscience 2010:276535. [PMID: 18431447 PMCID: PMC2292806 DOI: 10.1155/2008/276535] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/23/2007] [Accepted: 02/04/2008] [Indexed: 11/17/2022]
Abstract
Identifying functional groups of genes is a challenging problem for biological applications. Text mining approaches can be used to build hierarchical clusters or trees from the information in the biological literature. In particular, the nonnegative matrix factorization (NMF) is examined as one approach to label hierarchical trees. A generic labeling algorithm as well as an evaluation technique is proposed, and the effects of different NMF parameters with regard to convergence and labeling accuracy are discussed. The primary goals of this study are to provide a qualitative assessment of the NMF and its various parameters and initialization, to provide an automated way to classify biomedical data, and to provide a method for evaluating labeled data assuming a static input tree. As a byproduct, a method for generating gold standard trees is proposed.
|
116
|
Zafeiriou S, Petrou M. Nonlinear non-negative component analysis algorithms. IEEE Transactions on Image Processing 2010; 19:1050-1066. [PMID: 20028626 DOI: 10.1109/tip.2009.2038816] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 05/28/2023]
Abstract
In this paper, general solutions for nonlinear non-negative component analysis for data representation and recognition are proposed. Motivated by a combination of the non-negative matrix factorization (NMF) algorithm and kernel theory, which has led to a recently proposed NMF algorithm in a polynomial feature space, we propose a general framework where one can build a nonlinear non-negative component analysis method using kernels, the so-called projected gradient kernel non-negative matrix factorization (PGKNMF). In the proposed approach, arbitrary positive definite kernels can be adopted while at the same time it is ensured that the limit point of the procedure is a stationary point of the optimization problem. Moreover, we propose fixed point algorithms for the special case of Gaussian radial basis function (RBF) kernels. We demonstrate the power of the proposed methods in face and facial expression recognition applications.
Affiliation(s)
- Stefanos Zafeiriou
- Department of Electrical and Electronic Engineering, Imperial College London, London, UK.
|
117
|
Bertin N, Badeau R, Vincent E. Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription. IEEE Transactions on Audio, Speech, and Language Processing 2010. [DOI: 10.1109/tasl.2010.2041381] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.9] [Indexed: 11/06/2022]
|
118
|
Joshi S, Karthikeyan S, Manjunath B, Grafton S, Kiehl KA. Anatomical Parts-Based Regression Using Non-Negative Matrix Factorization. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2010:2863-2870. [PMID: 24943130 PMCID: PMC4059066 DOI: 10.1109/cvpr.2010.5540022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
Non-negative matrix factorization (NMF) is an excellent tool for unsupervised parts-based learning, but proves to be ineffective when parts of a whole follow a specific pattern. Analyzing such local changes is particularly important when studying anatomical transformations. We propose a supervised method, called Regression based NMF (RNMF), that incorporates a regression constraint into the NMF framework and learns maximally changing parts in the basis images. The algorithm is made robust against outliers by learning the distribution of the input manifold space, where the data resides. One of our main goals is to achieve good region localization. By incorporating a gradient smoothing and independence constraint into the factorized bases, contiguous local regions are captured. We apply our technique to a synthetic dataset and to structural MRI brain images of subjects of varying ages. RNMF captures in its significant bases the localized regions that are expected to change strongly with age, and it also achieves the best performance compared to other statistical regression and dimensionality reduction techniques.
Affiliation(s)
- Swapna Joshi
- Department of Electrical and Computer Engineering, University of California Santa Barbara
- S. Karthikeyan
- Department of Electrical and Computer Engineering, University of California Santa Barbara
- B.S. Manjunath
- Department of Electrical and Computer Engineering, University of California Santa Barbara
- Scott Grafton
- Department of Psychology, University of California Santa Barbara
|
119
|
Fast nonnegative matrix factorization algorithms using projected gradient approaches for large-scale problems. Computational Intelligence and Neuroscience 2009; 2008:939567. [PMID: 18628948 PMCID: PMC2443642 DOI: 10.1155/2008/939567] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 01/15/2008] [Revised: 04/18/2008] [Accepted: 05/22/2008] [Indexed: 11/18/2022]
Abstract
Recently, a considerable growth of interest in projected gradient (PG) methods has been observed due to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices match this class of minimization problems well, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed partially dependent nonnegative signals.
|
120
|
Lee SJ, Park KR, Kim J. A comparative study of facial appearance modeling methods for active appearance models. Pattern Recognit Lett 2009. [DOI: 10.1016/j.patrec.2009.05.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
|
121
|
Valdés-Sosa PA, Vega-Hernández M, Sánchez-Bornot JM, Martínez-Montes E, Bobes MA. EEG source imaging with spatio-temporal tomographic nonnegative independent component analysis. Hum Brain Mapp 2009; 30:1898-910. [PMID: 19378278 DOI: 10.1002/hbm.20784] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/08/2022]
Abstract
This article describes a spatio-temporal EEG/MEG source imaging (ESI) method that extracts a parsimonious set of "atoms" or components, each the outer product of a spatial and a temporal signature. The sources estimated are localized as smooth, minimally overlapping patches of cortical activation that are obtained by constraining spatial signatures to be nonnegative (NN), orthogonal, sparse, and smooth, in effect integrating ESI with NN-ICA. This constitutes a generalization of work by this group on the use of multiple penalties for ESI. A multiplicative update algorithm is derived that is stable and fast, converging within seconds to near the optimal solution. This procedure, spatio-temporal tomographic NN ICA (STTONNICA), is equally able to recover superficial or deep sources without additional weighting constraints, as tested with simulations. STTONNICA analysis of ERPs to familiar and unfamiliar faces yields an occipital-fusiform atom activated by all faces and a more frontal atom that is active only with familiar faces. The temporal signatures are at present unconstrained but can be required to be smooth, complex, or to follow a multivariate autoregressive model.
Affiliation(s)
- Pedro A Valdés-Sosa
- Cuban Neuroscience Center, Neurostatistics Department, Cubanacán, Playa, Havana, Cuba.
|
122
|
Zafeiriou S. Discriminant nonnegative tensor factorization algorithms. IEEE Transactions on Neural Networks 2009; 20:217-35. [PMID: 19150796 DOI: 10.1109/tnn.2008.2005293] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/07/2022]
Abstract
Nonnegative matrix factorization (NMF) has proven to be very successful for image analysis, especially for object representation and recognition. NMF requires the object tensor (with valence more than one) to be vectorized. This procedure may result in information loss since the local object structure is lost due to vectorization. Recently, in order to remedy this disadvantage of NMF methods, nonnegative tensor factorizations (NTF) algorithms that can be applied directly to the tensor representation of object collections have been introduced. In this paper, we propose a series of unsupervised and supervised NTF methods. That is, we extend several NMF methods using arbitrary valence tensors. Moreover, by incorporating discriminant constraints inside the NTF decompositions, we present a series of discriminant NTF methods. The proposed approaches are tested for face verification and facial expression recognition, where it is shown that they outperform other popular subspace approaches.
Affiliation(s)
- Stefanos Zafeiriou
- Imperial College London, Department of Electrical and Electronic Engineering, Communications and Signal Processing Research Group, South Kensington Campus, London SW7 2AZ, UK.
|
123
|
Hutchins LN, Murphy SM, Singh P, Graber JH. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics 2008; 24:2684-90. [PMID: 18852176 PMCID: PMC2639279 DOI: 10.1093/bioinformatics/btn526] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Indexed: 01/12/2023]
Abstract
Motivation: Cis-acting regulatory elements are frequently constrained by both sequence content and positioning relative to a functional site, such as a splice or polyadenylation site. We describe an approach to regulatory motif analysis based on non-negative matrix factorization (NMF). Whereas existing pattern recognition algorithms commonly focus primarily on sequence content, our method simultaneously characterizes both positioning and sequence content of putative motifs. Results: Tests on artificially generated sequences show that NMF can faithfully reproduce both positioning and content of test motifs. We show how the variation of the residual sum of squares can be used to give a robust estimate of the number of motifs or patterns in a sequence set. Our analysis distinguishes multiple motifs with significant overlap in sequence content and/or positioning. Finally, we demonstrate the use of the NMF approach through characterization of biologically interesting datasets. Specifically, an analysis of mRNA 3′-processing (cleavage and polyadenylation) sites from a broad range of higher eukaryotes reveals a conserved core pattern of three elements. Contact: joel.graber@jax.org Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lucie N Hutchins
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
|
124
|
Abstract
In the last decade, advances in high-throughput technologies such as DNA microarrays have made it possible to simultaneously measure the expression levels of tens of thousands of genes and proteins. This has resulted in large amounts of biological data requiring analysis and interpretation. Nonnegative matrix factorization (NMF) was introduced as an unsupervised, parts-based learning paradigm involving the decomposition of a nonnegative matrix V into two nonnegative matrices, W and H, via a multiplicative updates algorithm. In the context of a p × n gene expression matrix V consisting of observations on p genes from n samples, each column of W defines a metagene, and each column of H represents the metagene expression pattern of the corresponding sample. NMF has been primarily applied in an unsupervised setting in image and natural language processing. More recently, it has been successfully utilized in a variety of applications in computational biology. Examples include molecular pattern discovery, class comparison and prediction, cross-platform and cross-species analysis, functional characterization of genes and biomedical informatics. In this paper, we review this method as a data analytical and interpretive tool in computational biology with an emphasis on these applications.
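The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes the classic Lee-Seung multiplicative updates for the squared Euclidean error; the abstract does not say which cost function is used, and the function name `nmf_multiplicative`, the iteration count, and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative p x n matrix V by W @ H with W, H >= 0.

    Assumes the squared Euclidean cost; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    p, n = V.shape
    W = rng.random((p, k)) + eps   # columns of W play the role of metagenes
    H = rng.random((k, n)) + eps   # columns of H are per-sample metagene patterns
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "expression matrix": 20 genes, 8 samples, generated from 3 hidden metagenes.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 8))

W, H = nmf_multiplicative(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because W and H start positive and every update is a multiplication by a nonnegative factor, nonnegativity is preserved without any explicit projection step.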
Affiliation(s)
- Karthik Devarajan
- Division of Population Science, Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA.
|
125
|
|
126
|
Mejía-Roa E, Carmona-Saez P, Nogales R, Vicente C, Vázquez M, Yang XY, García C, Tirado F, Pascual-Montano A. bioNMF: a web-based tool for nonnegative matrix factorization in biology. Nucleic Acids Res 2008; 36:W523-8. [PMID: 18515346 PMCID: PMC2447803 DOI: 10.1093/nar/gkn335] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
In the last few years, advances in high-throughput technologies have generated large amounts of biological data that require analysis and interpretation. Nonnegative matrix factorization (NMF) has been established as a very effective method to reveal information about the complex latent relationships in experimental data sets. Using this method as part of the exploratory data analysis workflow would certainly help in the process of interpreting and understanding the complex biological mechanisms that underlie experimental data. We have developed bioNMF, a web-based tool that implements the NMF methodology in different analysis contexts to support some of the most important reported applications in biology. This online tool provides a user-friendly interface, combined with a computationally efficient parallel implementation of the NMF methods, to explore the data in different analysis scenarios. In addition to the online access, bioNMF also provides the same functionality included in the website as a public web services interface, enabling users with more computer expertise to launch jobs on the bioNMF server from their own scripts and workflows. The bioNMF application is freely available at http://bionmf.dacya.ucm.es.
Affiliation(s)
- E Mejía-Roa
- Computer Architecture Department, Complutense University, Madrid, Spain
|
127
|
Muñoz-Barrutia A, García-Muñoz J, Ucar B, Fernández-García I, Ortiz-de-Solorzano C. Blind spectral unmixing of M-FISH images by non-negative matrix factorization. Annu Int Conf IEEE Eng Med Biol Soc 2008; 2007:6248-51. [PMID: 18003449 DOI: 10.1109/iembs.2007.4353783] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
Multi-color Fluorescent in-Situ Hybridization (M-FISH) selectively stains multiple DNA sequences using fluorescently labeled DNA probes. Proper interpretation of M-FISH images is often hampered by spectral overlap between the detected emissions of the fluorochromes. When using more than two or three fluorochromes, the appropriate combination of wide-band excitation and emission filters reduces cross-talk, but cannot completely eliminate it. A number of approaches, both hardware and software, have been proposed in the last decade to facilitate the interpretation of M-FISH images. The most used and efficient approaches use linear unmixing methods that algorithmically compute and correct for the fluorochrome contributions to each detection channel. In contrast to standard methods that require prior knowledge of the fluorochrome spectra, we present a new method, Non-Negative Matrix Factorization (NMF), that blindly estimates the spectral contributions and corrects for the overlap. Our experimental results show that its performance in terms of residual cross-talk and spot counting reliability outperforms the non-blind state-of-the-art method, the Non-Negative Least Squares (NNLS) algorithm.
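For orientation, the non-blind NNLS baseline mentioned in this abstract can be sketched as follows, using `scipy.optimize.nnls`. The mixing matrix `S`, the abundances, and the noise-free measurement are made-up illustrative values, not data from the paper; the blind NMF method the authors propose would instead estimate `S` from the images.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical, known mixing matrix: rows are detection channels,
# columns are fluorochromes. Off-diagonal entries model spectral cross-talk.
S = np.array([
    [0.80, 0.15, 0.00],
    [0.20, 0.70, 0.10],
    [0.00, 0.15, 0.90],
])

# Made-up fluorochrome abundances at one pixel (the second dye is absent).
true_abundance = np.array([2.0, 0.0, 1.5])

# Noise-free channel intensities that would be observed at that pixel.
measured = S @ true_abundance

# Solve min ||S x - measured||_2 subject to x >= 0.
estimate, residual_norm = nnls(S, measured)
print(estimate, residual_norm)
```

In this idealized setting the spectra are known exactly and there is no noise, so the constrained solve recovers the abundances; with real images the quality of the result depends on how well `S` matches the true spectra.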
Affiliation(s)
- A Muñoz-Barrutia
- Oncology Division, Center for Applied Medical Research, University of Navarra, Pamplona, Spain.
|
128
|
SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays. Genome Biol 2008; 9:R46. [PMID: 18312629 PMCID: PMC2374713 DOI: 10.1186/gb-2008-9-2-r46] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/09/2007] [Revised: 09/19/2007] [Accepted: 02/29/2008] [Indexed: 12/25/2022]
Abstract
Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data.
|
129
|
Lohmann G, Volz KG, Ullsperger M. Using non-negative matrix factorization for single-trial analysis of fMRI data. Neuroimage 2007; 37:1148-60. [PMID: 17662621 DOI: 10.1016/j.neuroimage.2007.05.031] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 04/13/2006] [Revised: 04/26/2007] [Accepted: 05/04/2007] [Indexed: 10/23/2022]
Abstract
The analysis of single trials of an fMRI experiment is difficult because the BOLD response has a poor signal-to-noise ratio and is sometimes even inconsistent across trials. We propose to use non-negative matrix factorization (NMF) as a new technique for analyzing single trials. NMF yields a matrix decomposition that is useful in this context because it elicits the intrinsic structure of the single-trial data. The results of the NMF analysis are then processed further using clustering techniques. In addition to analyzing single trials in one brain region, the method is also suitable for investigating interdependencies between trials across brain regions. The method even allows analysis of the effect that a trial has on a subsequent trial in a different region at a significant temporal offset. This distinguishes the present method from other methods that require interdependencies between brain regions to occur nearly simultaneously. The method was applied to fMRI data and found to be a viable technique that may be superior to other matrix decomposition methods for this particular problem domain.
Affiliation(s)
- Gabriele Lohmann
- Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103 Leipzig, Germany.
|
130
|
Kim H, Park H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 2007; 23:1495-502. [PMID: 17483501 DOI: 10.1093/bioinformatics/btm134] [Citation(s) in RCA: 307] [Impact Index Per Article: 18.1] [Indexed: 12/28/2022]
Abstract
MOTIVATION: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. RESULTS: In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. AVAILABILITY: The software is available as supplementary material.
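The abstract does not spell out the formulation, so the following is only a rough sketch of how a sparsity penalty can be folded into alternating non-negativity-constrained least squares. It assumes a squared column-sum penalty on H (weight `beta`) and a Frobenius penalty on W (weight `eta`), each expressed by appending rows to the least-squares problem; the function names, parameter values, and toy data are hypothetical, and the paper's actual algorithm and solver may differ.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_columns(A, B):
    """Solve min ||A X - B||_F subject to X >= 0, one column of B at a time."""
    return np.column_stack([nnls(A, B[:, j])[0] for j in range(B.shape[1])])

def sparse_nmf_anls(V, k, beta=0.1, eta=0.1, n_iter=30, seed=0):
    """Alternate two nonnegative least-squares solves with penalty rows appended."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    for _ in range(n_iter):
        # H step: the extra row sqrt(beta) * 1^T adds beta * (sum of each
        # column of H)^2 to the objective, which encourages sparse columns.
        A_aug = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
        B_aug = np.vstack([V, np.zeros((1, n))])
        H = nnls_columns(A_aug, B_aug)
        # W step: the extra block sqrt(eta) * I adds eta * ||W||_F^2,
        # which keeps W from growing to compensate for a shrinking H.
        A_aug = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
        B_aug = np.vstack([V.T, np.zeros((k, m))])
        W = nnls_columns(A_aug, B_aug).T
    return W, H

# Toy nonnegative data: 15 features, 10 samples, 3 hidden components.
rng = np.random.default_rng(2)
V = rng.random((15, 3)) @ rng.random((3, 10))

W, H = sparse_nmf_anls(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Writing the penalties as extra rows means each half-step stays an ordinary nonnegative least-squares problem, so an off-the-shelf NNLS solver can be reused unchanged.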
Affiliation(s)
- Hyunsoo Kim
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA.
|
131
|
Non-negative Matrix Factorization with Orthogonality Constraints and its Application to Raman Spectroscopy. ACTA ACUST UNITED AC 2007. [DOI: 10.1007/s11265-006-0039-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
|
132
|
Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD. bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics 2006; 7:366. [PMID: 16875499 PMCID: PMC1550731 DOI: 10.1186/1471-2105-7-366] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 05/18/2006] [Accepted: 07/28/2006] [Indexed: 12/02/2022]
Abstract
Background: In the bioinformatics field, a great deal of interest has been given to the non-negative matrix factorization (NMF) technique, due to its capability of providing new insights and relevant information about the complex latent relationships in experimental data sets. This method, and some of its variants, has been successfully applied to gene expression, sequence analysis, functional characterization of genes and text mining. Although interest in this technique within the bioinformatics community has increased during the last few years, there are not many simple standalone tools available to perform these types of data analysis in an integrated environment. Results: In this work we propose a versatile and user-friendly tool that implements the NMF methodology in different analysis contexts to support some of the most important reported applications of this new methodology. This includes clustering and biclustering gene expression data, protein sequence analysis, text mining of biomedical literature and sample classification using gene expression. The tool, which is named bioNMF, also contains a user-friendly graphical interface to explore results in an interactive manner and in this way facilitate the exploratory data analysis process. Conclusion: bioNMF is a standalone versatile application which does not require any special installation or libraries. It can be used for most of the multiple applications proposed in the bioinformatics field or to support new research using this method. This tool is publicly available at .
Affiliation(s)
- Alberto Pascual-Montano
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, 28040, Spain
- Pedro Carmona-Saez
- BioComputing Unit, National Center of Biotechnology, Campus Universidad Autónoma de Madrid, 28049, Spain
- Monica Chagoyen
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, 28040, Spain
- BioComputing Unit, National Center of Biotechnology, Campus Universidad Autónoma de Madrid, 28049, Spain
- Francisco Tirado
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, 28040, Spain
- Jose M Carazo
- BioComputing Unit, National Center of Biotechnology, Campus Universidad Autónoma de Madrid, 28049, Spain
- Roberto D Pascual-Marqui
- The KEY Institute for Brain-Mind Research, University Hospital of Psychiatry. Lenggstr. 31, CH-8029 Zurich, Switzerland
|
133
|
Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM, Pascual-Montano A. Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization. BMC Bioinformatics 2006; 7:78. [PMID: 16503973 PMCID: PMC1434777 DOI: 10.1186/1471-2105-7-78] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Received: 07/15/2005] [Accepted: 02/17/2006] [Indexed: 12/01/2022]
Abstract
Background: The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states. Results: In this work we present a methodology able to cluster genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions. Conclusion: The proposed approach can be a useful tool to analyze large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify by standard clustering algorithms.
Affiliation(s)
- Pedro Carmona-Saez
- BioComputing Unit. National Center of Biotechnology. Campus Universidad Autónoma de Madrid. 28049. Spain
- Roberto D Pascual-Marqui
- The KEY Institute for Brain-Mind Research, University Hospital of Psychiatry. Lenggstr. 31, CH-8029 Zurich, Switzerland
- F Tirado
- Computer Architecture Department. Facultad de Ciencias Físicas. Universidad Complutense de Madrid. 28040. Spain
- Jose M Carazo
- BioComputing Unit. National Center of Biotechnology. Campus Universidad Autónoma de Madrid. 28049. Spain
- Alberto Pascual-Montano
- Computer Architecture Department. Facultad de Ciencias Físicas. Universidad Complutense de Madrid. 28040. Spain
|
134
|
Chagoyen M, Carmona-Saez P, Shatkay H, Carazo JM, Pascual-Montano A. Discovering semantic features in the literature: a foundation for building functional associations. BMC Bioinformatics 2006; 7:41. [PMID: 16438716 PMCID: PMC1386711 DOI: 10.1186/1471-2105-7-41] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 09/01/2005] [Accepted: 01/26/2006] [Indexed: 11/10/2022]
Abstract
Background: Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results: We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion: The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data.
Affiliation(s)
- Monica Chagoyen
- Biocomputing Unit, Centro Nacional de Biotecnologia – CSIC, Madrid, Spain
- Pedro Carmona-Saez
- Biocomputing Unit, Centro Nacional de Biotecnologia – CSIC, Madrid, Spain
- Hagit Shatkay
- School of Computing, Queen's University, Kingston, Ontario, Canada
- Jose M Carazo
- Biocomputing Unit, Centro Nacional de Biotecnologia – CSIC, Madrid, Spain
|