1. Roy K, Simon C, Moghadam P, Harandi M. Subspace distillation for continual learning. Neural Netw 2023;167:65-79. PMID: 37625243. DOI: 10.1016/j.neunet.2023.07.047.
Abstract
An ultimate objective in continual learning is to preserve the knowledge learned in preceding tasks while learning new tasks. To mitigate forgetting of prior knowledge, we propose a novel knowledge distillation technique that takes into account the manifold structure of the latent/output space of a neural network when learning novel tasks. To achieve this, we propose to approximate the data manifold up to first order, hence benefiting from linear subspaces to model the structure and maintain the knowledge of a neural network while learning novel concepts. We demonstrate that modeling with subspaces provides several intriguing properties, including robustness to noise, and is therefore effective for mitigating catastrophic forgetting in continual learning. We also discuss and show how our proposed method can be adopted to address both classification and segmentation problems. Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets, including Pascal VOC and Tiny-ImageNet. Furthermore, we show how the proposed method can be seamlessly combined with existing learning approaches to improve their performance. The code for this article will be available at https://github.com/csiro-robotics/SDCL.
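To make the first-order (subspace) approximation concrete, the following minimal numpy sketch spans the dominant subspace of the same feature batch under the previous-task and current models and penalizes their discrepancy with a projection-metric distance; the rank k, the batch shapes, and the choice of distance are illustrative assumptions rather than the paper's exact loss.

```python
import numpy as np

def subspace_basis(features, k):
    """Orthonormal basis of the rank-k subspace spanned by a batch of
    feature vectors (rows of `features`), via truncated SVD."""
    # The top-k right singular vectors span the dominant k-dimensional
    # subspace of the d-dimensional feature space.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k].T                      # (d, k), orthonormal columns

def subspace_distillation_loss(feat_old, feat_new, k=8):
    """Projection-metric discrepancy between the subspaces spanned by the
    old (frozen) and new model features on the same mini-batch."""
    U_old = subspace_basis(feat_old, k)
    U_new = subspace_basis(feat_new, k)
    P_old = U_old @ U_old.T              # projection onto the old subspace
    P_new = U_new @ U_new.T
    return 0.5 * np.linalg.norm(P_old - P_new, 'fro') ** 2

# toy usage: features from the previous-task model vs. the current model
rng = np.random.default_rng(0)
f_old = rng.normal(size=(64, 128))
f_new = f_old + 0.05 * rng.normal(size=(64, 128))
print(subspace_distillation_loss(f_old, f_new, k=8))
```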
Affiliation(s)
- Kaushik Roy
- Monash University, Melbourne, VIC, Australia; CSIRO, Data61, Brisbane, QLD, Australia.
- Christian Simon
- Monash University, Melbourne, VIC, Australia; Australian National University, Canberra, ACT, Australia; CSIRO, Data61, Brisbane, QLD, Australia.
- Peyman Moghadam
- CSIRO, Data61, Brisbane, QLD, Australia; Queensland University of Technology, Brisbane, QLD, Australia.
- Mehrtash Harandi
- Monash University, Melbourne, VIC, Australia; CSIRO, Data61, Brisbane, QLD, Australia.
2. Batalo B, Souza LS, Gatto BB, Sogi N, Fukui K. Temporal-stochastic tensor features for action recognition. Machine Learning with Applications 2022. DOI: 10.1016/j.mlwa.2022.100407.
3. Wang Y, Zhang J. A weighted sparse coding model on product Grassmann manifold for video-based human gesture recognition. PeerJ Comput Sci 2022;8:e923. PMID: 35494827. PMCID: PMC9044265. DOI: 10.7717/peerj-cs.923.
Abstract
Classifying multi-dimensional data with complex intrinsic geometry, such as video-based human gesture recognition, is a challenging problem. In particular, manifold structure is a good way to characterize the intrinsic geometry of multi-dimensional data. The recently proposed sparse coding on the Grassmann manifold shows high discriminative power in many visual classification tasks. It represents a video on the Grassmann manifold via the Singular Value Decomposition (SVD) of a data matrix obtained by vectorizing each frame, but vectorization destroys the spatial structure of the video. To preserve this spatial structure, videos can instead be represented as data tensors. In this paper, we first represent human gesture videos on a product Grassmann manifold (PGM) via the Higher-Order Singular Value Decomposition (HOSVD) of the data tensor. Each factor manifold characterizes the video from a different perspective, corresponding respectively to appearance, horizontal motion, and vertical motion. We then propose a weighted sparse coding model on the PGM, where the weights model the importance of the factor manifolds. Furthermore, we propose an optimization algorithm for learning the coding coefficients by embedding each factor Grassmann manifold into the space of symmetric matrices. Finally, we give a classification algorithm; experimental results on three public datasets show that our method is competitive with relevant existing methods.
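A minimal numpy sketch of the representation step, assuming a grayscale video tensor of shape frames × height × width: the truncated HOSVD factors give one subspace per mode (a point on a product of Grassmannians), and a weighted projection-kernel similarity compares two videos. The ranks, weights, and similarity choice are illustrative; the paper's full model additionally learns weighted sparse codes over a dictionary.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def product_grassmann_point(video, ranks=(5, 5, 5)):
    """Truncated HOSVD factors of a video tensor (frames x H x W); each
    orthonormal factor is treated as a point on a Grassmann manifold."""
    return [np.linalg.svd(mode_unfold(video, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def projection_similarity(X, Y):
    """Projection-kernel similarity ||X^T Y||_F^2 between two subspaces."""
    return np.linalg.norm(X.T @ Y, 'fro') ** 2

def pgm_similarity(vid_a, vid_b, weights=(1.0, 1.0, 1.0), ranks=(5, 5, 5)):
    """Weighted sum of factor-wise similarities on the product manifold."""
    A = product_grassmann_point(vid_a, ranks)
    B = product_grassmann_point(vid_b, ranks)
    return sum(w * projection_similarity(x, y) for w, x, y in zip(weights, A, B))

rng = np.random.default_rng(1)
v1 = rng.normal(size=(20, 32, 32))   # frames x height x width
v2 = rng.normal(size=(20, 32, 32))
print(pgm_similarity(v1, v2))
```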
Affiliation(s)
- Yuping Wang
- School of Statistics, Capital University of Economics and Business, Beijing, China
- Junfei Zhang
- School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China
4. Koniusz P, Zhang H. Power Normalizations in Fine-Grained Image, Few-Shot Image and Graph Classification. IEEE Trans Pattern Anal Mach Intell 2022;44:591-609. PMID: 34428137. DOI: 10.1109/tpami.2021.3107164.
Abstract
Power Normalizations (PN) are useful non-linear operators which tackle feature imbalances in classification problems. We study PNs in the deep learning setup via a novel PN layer that pools feature maps. Our layer combines the feature vectors and their respective spatial locations in the feature maps produced by the last convolutional layer of a CNN into a positive definite matrix of second-order statistics to which PN operators are applied, forming so-called Second-order Pooling (SOP). As the main goal of this paper is to study Power Normalizations, we investigate the role and meaning of MaxExp and Gamma, two popular PN functions. To this end, we provide probabilistic interpretations of these element-wise operators and discover surrogates with well-behaved derivatives for end-to-end training. Furthermore, we look at the spectral applicability of MaxExp and Gamma by studying Spectral Power Normalizations (SPN). We show that SPN on the autocorrelation/covariance matrix and the Heat Diffusion Process (HDP) on a graph Laplacian matrix are closely related, and thus share their properties. This finding leads to the culmination of our work, a fast spectral MaxExp, which is a variant of HDP for covariance/autocorrelation matrices. We evaluate our ideas on fine-grained recognition, scene recognition, and material classification, as well as on few-shot learning and graph classification.
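A small numpy sketch of second-order pooling followed by element-wise Gamma, element-wise MaxExp, and a spectral MaxExp applied to the eigenvalues; the feature-map size, η, γ, and the trace normalization are illustrative assumptions, and the spatial-location encoding is omitted.

```python
import numpy as np

def second_order_pool(F):
    """Second-order pooling: autocorrelation of the feature vectors (rows of F),
    averaged over spatial locations; yields a positive semi-definite matrix."""
    return F.T @ F / F.shape[0]

def gamma_pn(M, gamma=0.5, eps=1e-12):
    """Element-wise Gamma power normalization."""
    return np.sign(M) * (np.abs(M) + eps) ** gamma

def maxexp_pn(M, eta=20.0):
    """Element-wise MaxExp: 1 - (1 - p)^eta applied to co-occurrence values in [0, 1]."""
    return 1.0 - (1.0 - np.clip(M, 0.0, 1.0)) ** eta

def spectral_maxexp(M, eta=20.0):
    """Spectral MaxExp: apply the MaxExp function to the eigenvalues of an SPD matrix."""
    w, V = np.linalg.eigh(M)
    w = 1.0 - (1.0 - np.clip(w, 0.0, 1.0)) ** eta
    return V @ np.diag(w) @ V.T

rng = np.random.default_rng(2)
F = np.abs(rng.normal(size=(196, 64)))   # e.g. a 14x14 feature map with 64 channels
M = second_order_pool(F)
M = M / np.trace(M)                       # scale so the eigenvalues fall in [0, 1]
print(gamma_pn(M).shape, maxexp_pn(M).max(), spectral_maxexp(M, eta=10.0).shape)
```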
6. Tanfous AB, Drira H, Amor BB. Sparse Coding of Shape Trajectories for Facial Expression and Action Recognition. IEEE Trans Pattern Anal Mach Intell 2020;42:2594-2607. PMID: 31395537. DOI: 10.1109/tpami.2019.2932979.
Abstract
The detection and tracking of human landmarks in video streams has gained in reliability, partly due to the availability of affordable RGB-D sensors. The analysis of such time-varying geometric data plays an important role in automatic human behavior understanding. However, suitable shape representations, as well as their temporal evolutions, termed trajectories, often lie on nonlinear manifolds. This puts an additional constraint (namely, nonlinearity) on the use of conventional machine learning techniques. As a solution, this paper adapts the well-known sparse coding and dictionary learning approach to study time-varying shapes on the Kendall shape spaces of 2D and 3D landmarks. We illustrate effective coding of 3D skeletal sequences for action recognition and of 2D facial landmark sequences for macro- and micro-expression recognition. To overcome the inherent nonlinearity of the shape spaces, both intrinsic and extrinsic solutions are explored. As a main result, shape trajectories give rise to more discriminative time-series with suitable computational properties, including sparsity and a vector space structure. Extensive experiments conducted on commonly used datasets demonstrate the competitiveness of the proposed approaches with respect to the state-of-the-art.
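As a rough extrinsic illustration (not the paper's intrinsic Kendall-space formulation), the sketch below normalizes each 2D landmark frame to a Kendall pre-shape (removing translation and scale), then sparse-codes the flattened pre-shapes over a dictionary with plain ISTA, turning a shape trajectory into a vector-space time-series of codes; the dictionary, λ, and landmark counts are placeholders.

```python
import numpy as np

def preshape(landmarks):
    """Kendall pre-shape of a 2D landmark configuration (k x 2):
    remove translation and scale."""
    X = landmarks - landmarks.mean(axis=0)
    return X / np.linalg.norm(X)

def ista(D, y, lam=0.1, n_iter=200):
    """Plain ISTA for the lasso  min_a 0.5||y - D a||^2 + lam ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a

def code_trajectory(frames, dictionary, lam=0.1):
    """Sparse-code each frame's (flattened) pre-shape against an extrinsic
    dictionary; the trajectory becomes a sequence of sparse codes."""
    return np.stack([ista(dictionary, preshape(f).ravel(), lam) for f in frames])

rng = np.random.default_rng(3)
traj = rng.normal(size=(30, 68, 2))          # 30 frames, 68 facial landmarks
D = rng.normal(size=(68 * 2, 40))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
codes = code_trajectory(traj, D)
print(codes.shape)                           # (30, 40): a vector-space time-series
```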
7. Liu D, Liu L, Tie Y, Qi L. Multi-task image set classification via joint representation with class-level sparsity and intra-task low-rankness. Pattern Recognit Lett 2020. DOI: 10.1016/j.patrec.2018.11.009.
8. Zhang X, Shi X, Sun Y, Cheng L. Multivariate Regression with Gross Errors on Manifold-Valued Data. IEEE Trans Pattern Anal Mach Intell 2019;41:444-458. PMID: 29993419. DOI: 10.1109/tpami.2017.2776260.
Abstract
We consider multivariate regression with manifold-valued output, that is, for a multivariate observation, its output response lies on a manifold. Moreover, we propose a new regression model to deal with the presence of grossly corrupted manifold-valued responses, a bottleneck issue commonly encountered in practical scenarios. Our model first takes a correction step on the grossly corrupted responses via geodesic curves on the manifold, and then performs multivariate linear regression on the corrected data. This results in a nonconvex and nonsmooth optimization problem on Riemannian manifolds. To this end, we propose a dedicated approach named PALMR, which utilizes and extends proximal alternating linearized minimization techniques for optimization problems on Euclidean spaces. Theoretically, we investigate its convergence property and show that it converges to a critical point under mild conditions. Empirically, we test our model on both synthetic and real diffusion tensor imaging data, and show that it outperforms other multivariate regression models when manifold-valued responses contain gross errors and is effective in identifying those gross errors.
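As a simplified Euclidean analog of the gross-error idea (not the manifold-valued PALMR algorithm), the sketch below alternates between least-squares regression and soft-thresholding a sparse gross-error term; the penalty λ and the data shapes are illustrative.

```python
import numpy as np

def robust_regression(X, Y, lam=1.0, n_iter=50):
    """Alternating minimization for  min_{B,G} 0.5||Y - X B - G||_F^2 + lam ||G||_1.
    G absorbs sparse gross errors; B is the regression coefficient matrix."""
    G = np.zeros_like(Y)
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        # B-step: ordinary least squares on the corrected responses
        B, *_ = np.linalg.lstsq(X, Y - G, rcond=None)
        # G-step: soft-threshold the residual (proximal step for the l1 term)
        R = Y - X @ B
        G = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return B, G

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
B_true = rng.normal(size=(5, 3))
Y = X @ B_true + 0.01 * rng.normal(size=(200, 3))
Y[::25] += 10.0                               # inject gross errors in a few rows
B_hat, G_hat = robust_regression(X, Y, lam=1.0)
print(np.linalg.norm(B_hat - B_true))         # small despite the corrupted rows
```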
9. Faraki M, Harandi MT, Porikli F. A Comprehensive Look at Coding Techniques on Riemannian Manifolds. IEEE Trans Neural Netw Learn Syst 2018;29:5701-5712. PMID: 29994290. DOI: 10.1109/tnnls.2018.2812799.
Abstract
Core to many learning pipelines is visual recognition, such as image and video classification. In such applications, having a compact yet rich and informative representation plays a pivotal role. An underlying assumption in traditional coding schemes [e.g., sparse coding (SC)] is that the data geometrically comply with Euclidean space; in other words, the data are presented to the algorithm in vector form and the Euclidean axioms are fulfilled. This is, of course, restrictive in machine learning, computer vision, and signal processing, as shown by a large number of recent studies. This paper takes a further step and provides a comprehensive mathematical framework for performing coding in curved, non-Euclidean spaces, i.e., on Riemannian manifolds. To this end, we start with the simplest form of coding, namely bag of words. Then, inspired by the success of vectors of locally aggregated descriptors in addressing computer vision problems, we introduce their Riemannian extensions. Finally, we study the Riemannian forms of SC, locality-constrained linear coding, and collaborative coding. Through rigorous tests, we demonstrate the superior performance of our Riemannian coding schemes against state-of-the-art methods on several visual classification tasks, including head pose classification, video-based face recognition, and dynamic scene recognition.
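A minimal sketch of the simplest scheme discussed, a Riemannian bag of words for SPD descriptors (e.g., region covariances), using the log-Euclidean embedding so that codeword assignment reduces to Euclidean nearest neighbors; in practice the codebook would come from clustering rather than random SPD matrices, and this metric is only one of several the paper considers.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix (log-Euclidean embedding)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(np.maximum(w, 1e-10))) @ V.T

def vec_upper(M):
    """Half-vectorize a symmetric matrix (off-diagonals scaled by sqrt(2))
    so that Euclidean distance matches the Frobenius distance."""
    iu = np.triu_indices(M.shape[0], 1)
    return np.concatenate([np.diag(M), np.sqrt(2.0) * M[iu]])

def bag_of_riemannian_words(spd_mats, codebook):
    """Histogram of nearest log-Euclidean codewords for a set of SPD descriptors."""
    feats = np.stack([vec_upper(spd_log(S)) for S in spd_mats])
    words = np.stack([vec_upper(spd_log(C)) for C in codebook])
    assign = np.argmin(((feats[:, None, :] - words[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def random_spd(d, rng):
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(5)
descriptors = [random_spd(5, rng) for _ in range(100)]   # e.g. region covariances
codebook = [random_spd(5, rng) for _ in range(8)]        # stand-in for learned words
print(bag_of_riemannian_words(descriptors, codebook))
```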
10. Pan H, Jing Z, Qiao L, Li M. Discriminative Structured Dictionary Learning on Grassmann Manifolds and Its Application on Image Restoration. IEEE Trans Cybern 2018;48:2875-2886. PMID: 28952956. DOI: 10.1109/tcyb.2017.2751585.
Abstract
Image restoration is a difficult and challenging problem in various imaging applications. However, despite the benefits of a single overcomplete dictionary, capturing the geometric structure of the image of interest remains challenging. To more accurately represent the local structures of the underlying signals, we propose a new problem formulation for sparse representation with a block-orthogonal constraint. We make three contributions. First, a framework for discriminative structured dictionary learning is proposed, which leads to a smooth manifold structure and quotient search spaces. Second, an alternating minimization scheme is proposed that takes both the cost function and the constraints into account. This is achieved by iteratively alternating between updating the block structure of the dictionary, defined on the Grassmann manifold, and sparsifying the dictionary atoms automatically. Third, Riemannian conjugate gradient is used to track local subspaces efficiently with a convergence guarantee. Extensive experiments on various datasets demonstrate that the proposed method outperforms state-of-the-art methods on the removal of mixed Gaussian-impulse noise.
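A heavily simplified K-subspaces-style sketch of the alternation between assigning signals to dictionary blocks (orthonormal bases, i.e., points on a Grassmannian) and refitting each block by SVD; the paper's actual method optimizes the block structure with Riemannian conjugate gradient on a quotient manifold, which is not shown here.

```python
import numpy as np

def fit_subspace_blocks(patches, n_blocks=4, block_dim=3, n_iter=10, seed=0):
    """K-subspaces style alternation: assign each patch to the block (subspace)
    with the smallest reconstruction residual, then refit each block's
    orthonormal basis from its assigned patches via SVD."""
    rng = np.random.default_rng(seed)
    d = patches.shape[1]
    # random orthonormal initial bases (points on the Grassmannian)
    blocks = [np.linalg.qr(rng.normal(size=(d, block_dim)))[0] for _ in range(n_blocks)]
    labels = np.zeros(len(patches), dtype=int)
    for _ in range(n_iter):
        # assignment step: residual after projecting onto each block
        resid = np.stack([np.linalg.norm(patches - patches @ B @ B.T, axis=1)
                          for B in blocks], axis=1)
        labels = resid.argmin(axis=1)
        # update step: dominant subspace of each cluster of patches
        for k in range(n_blocks):
            P = patches[labels == k]
            if len(P) >= block_dim:
                blocks[k] = np.linalg.svd(P.T, full_matrices=False)[0][:, :block_dim]
    return blocks, labels

rng = np.random.default_rng(6)
patches = rng.normal(size=(500, 16))          # e.g. vectorized 4x4 image patches
blocks, labels = fit_subspace_blocks(patches)
print([b.shape for b in blocks], np.bincount(labels))
```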
11. Chen H, Sun Y, Gao J, Hu Y, Yin B. Fast optimization algorithm on Riemannian manifolds and its application in low-rank learning. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.02.058.
12. Yger F, Berar M, Lotte F. Riemannian Approaches in Brain-Computer Interfaces: A Review. IEEE Trans Neural Syst Rehabil Eng 2017;25:1753-1762. DOI: 10.1109/tnsre.2016.2627016.
13. Wang X, Gu Y. Cross-Label Suppression: A Discriminative and Fast Dictionary Learning With Group Regularization. IEEE Trans Image Process 2017;26:3859-3873. PMID: 28500002. DOI: 10.1109/tip.2017.2703101.
Abstract
This paper addresses image classification through efficiently learning a compact and discriminative dictionary. Given a structured dictionary with each atom (a column of the dictionary matrix) associated with a label, we propose a cross-label suppression constraint to enlarge the difference among the representations of different classes. Meanwhile, we introduce group regularization to enforce that representations preserve the label properties of the original samples, meaning that representations for the same class are encouraged to be similar. With cross-label suppression, we do not resort to the frequently used ℓ0-norm or ℓ1-norm for coding, and we obtain computational efficiency without losing discriminative power for categorization. Moreover, two simple classification schemes are developed to take full advantage of the learnt dictionary. Extensive experiments on six datasets, covering face recognition, object categorization, scene classification, texture recognition, and sports action categorization, show that the proposed approach outperforms many recently presented dictionary algorithms in both recognition accuracy and computational efficiency.
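An illustrative numpy sketch of the cross-label suppression idea, assuming a dictionary whose atoms carry class labels: coding a sample for a candidate class penalizes coefficients on other classes' atoms heavily (a weighted ridge with a closed-form solution), and classification picks the class whose own atoms reconstruct the sample best. The penalties alpha and beta and the classification rule are assumptions, not the paper's exact formulation.

```python
import numpy as np

def class_weighted_code(D, atom_labels, y, label, alpha=0.1, beta=10.0):
    """Code y over a structured dictionary D, penalizing coefficients on atoms
    of other classes (cross-label suppression) much more heavily than on the
    atoms of the candidate class. Closed-form weighted-ridge solution."""
    w = np.where(atom_labels == label, alpha, beta)   # per-atom penalty
    return np.linalg.solve(D.T @ D + np.diag(w), D.T @ y)

def classify(D, atom_labels, y, n_classes, alpha=0.1, beta=10.0):
    """Assign the class whose own-atom coefficients reconstruct y best."""
    errs = []
    for c in range(n_classes):
        x = class_weighted_code(D, atom_labels, y, c, alpha, beta)
        mask = (atom_labels == c)
        errs.append(np.linalg.norm(y - D[:, mask] @ x[mask]))
    return int(np.argmin(errs))

rng = np.random.default_rng(7)
n_classes, atoms_per_class, dim = 3, 10, 40
D = rng.normal(size=(dim, n_classes * atoms_per_class))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
atom_labels = np.repeat(np.arange(n_classes), atoms_per_class)
y = D[:, atom_labels == 1] @ rng.normal(size=atoms_per_class)  # sample from class 1
print(classify(D, atom_labels, y, n_classes))          # expected: 1
```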
14. Kleinsteuber M. Dynamical Textures Modeling via Joint Video Dictionary Learning. IEEE Trans Image Process 2017;26:2929-2943. PMID: 28410105. DOI: 10.1109/tip.2017.2691549.
Abstract
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes that can be modeled in a dynamic textures (DT) framework. We first assume that the image frames of a moving scene can be modeled as a Markov random process. We then propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of the image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between the sparse codes of adjacent frames in the time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. To ensure the stability of JVDL, we impose several constraints on the transition matrix and the dictionary. The developed framework captures the dynamics of a moving scene by exploiting both the sparse properties and the temporal correlations of consecutive video frames. Moreover, the learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better at DT synthesis and recognition on heavily corrupted data.
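A compact sketch of the two core steps, under the assumption of vectorized grayscale frames and a fixed random dictionary: ISTA sparse codes act as the per-frame "states", and a least-squares transition matrix links the codes of adjacent frames; the stability constraints and the joint dictionary learning of JVDL are omitted.

```python
import numpy as np

def soft_threshold(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_codes(D, Y, lam=0.1, n_iter=200):
    """ISTA lasso codes for each column of Y over dictionary D."""
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        X = soft_threshold(X - D.T @ (D @ X - Y) / L, lam / L)
    return X

def learn_transition(codes):
    """Least-squares transition matrix A with codes[:, t+1] ~ A codes[:, t]."""
    X0, X1 = codes[:, :-1], codes[:, 1:]
    return X1 @ np.linalg.pinv(X0)

rng = np.random.default_rng(8)
frames = rng.normal(size=(256, 60))               # vectorized frames of a texture clip
D = rng.normal(size=(256, 50))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
X = sparse_codes(D, frames)                       # sparse "states" of each frame
A = learn_transition(X)                           # dynamics in code space
next_frame = D @ (A @ X[:, -1])                   # one-step synthesis from the last state
print(A.shape, next_frame.shape)
```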
15. Low-Rank Linear Dynamical Systems for Motor Imagery EEG. Comput Intell Neurosci 2016;2016:2637603. PMID: 28096809. PMCID: PMC5210283. DOI: 10.1155/2016/2637603.
Abstract
The common spatial pattern (CSP) and other spatiospectral feature extraction methods have in recent years become the most effective and successful approaches to motor imagery electroencephalography (MI-EEG) pattern recognition from multichannel neural activity. However, these methods require substantial preprocessing and postprocessing, such as filtering, mean removal, and spatiospectral feature fusion, which can easily affect classification accuracy. In this paper, we utilize linear dynamical systems (LDSs) for EEG feature extraction and classification. The LDS model has several advantages, including simultaneous generation of spatial and temporal feature matrices, freedom from preprocessing and postprocessing, and low cost. Furthermore, a low-rank matrix decomposition approach is introduced to remove noise and the resting-state component in order to improve the robustness of the system. We then propose a low-rank LDS algorithm that decomposes the LDS feature subspace on a finite Grassmannian and obtains better performance. Extensive experiments are carried out on the public datasets "BCI Competition III Dataset IVa" and "BCI Competition IV Database 2a." The results show that our three proposed methods yield higher accuracies than prevailing approaches such as CSP and CSSP.
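A minimal sketch of dynamic-texture-style LDS identification for a multichannel EEG segment, with the column space of the observation matrix compared across trials by a projection-metric subspace distance; the state dimension and the random data are illustrative, and the low-rank decomposition step is not shown.

```python
import numpy as np

def lds_features(Y, n_states=6):
    """Suboptimal LDS identification: observation matrix C from the SVD of the
    data, transition matrix A by least squares on the estimated state sequence."""
    # Y: (channels, time) multichannel EEG segment
    Y = Y - Y.mean(axis=1, keepdims=True)         # remove per-channel mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                           # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]     # state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])      # state transition matrix
    return A, C

def subspace_distance(C1, C2):
    """Projection-metric distance between the column spaces of C1 and C2."""
    Q1, Q2 = np.linalg.qr(C1)[0], np.linalg.qr(C2)[0]
    return np.linalg.norm(Q1 @ Q1.T - Q2 @ Q2.T, 'fro')

rng = np.random.default_rng(9)
trial_a = rng.normal(size=(22, 500))              # 22 channels, 500 samples
trial_b = rng.normal(size=(22, 500))
A1, C1 = lds_features(trial_a)
A2, C2 = lds_features(trial_b)
print(subspace_distance(C1, C2))
```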
16. Sun Y, Gao J, Hong X, Mishra B, Yin B. Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization. IEEE Trans Pattern Anal Mach Intell 2016;38:476-489. PMID: 27046492. DOI: 10.1109/tpami.2015.2465901.
Abstract
Tensor clustering is an important tool that exploits the intrinsically rich structure of real-world multiway, or tensor, datasets. Standard practice in dealing with such datasets is to use subspace clustering based on vectorizing the multiway data. However, vectorization of tensorial data does not exploit the complete structural information. In this paper, we propose a subspace clustering algorithm that avoids any vectorization. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between the different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates; updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that the proposed algorithm competes effectively with state-of-the-art clustering algorithms based on tensor factorization.
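A much-simplified surrogate for the clustering idea, assuming the samples are stacked along the last mode of the tensor: a truncated HOSVD provides the factor matrices, and the rows of the last-mode factor are clustered with plain k-means; the paper instead learns the last-mode (membership) factor on the multinomial manifold with a trust-region method.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(T, ranks):
    """Truncated HOSVD: top singular vectors of each mode unfolding."""
    return [np.linalg.svd(mode_unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# data tensor: 100 samples, each a 10 x 8 matrix observation, stacked on the last mode
rng = np.random.default_rng(10)
T = rng.normal(size=(10, 8, 100))
factors = hosvd_factors(T, ranks=(4, 4, 3))
labels = kmeans(factors[-1], k=3)            # cluster samples via the last-mode factor
print(np.bincount(labels))
```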