1. Gao Y, Lu J, Li S, Li Y, Du S. Hypergraph-Based Multi-View Action Recognition Using Event Cameras. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6610-6622. [PMID: 38536691 DOI: 10.1109/tpami.2024.3382117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Action recognition from video data is a cornerstone task with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly in challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. The vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset, THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which is more than ten times larger than existing datasets. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
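The KNN-based hyperedge strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes each vertex (segment feature) spawns one hyperedge containing itself and its k nearest neighbours under Euclidean distance; the function names `knn_hyperedges` and `incidence_matrix` and the toy feature values are hypothetical and are not taken from the paper, whose exact construction may differ.

```python
import math

def knn_hyperedges(features, k):
    """Return one hyperedge per vertex: the vertex plus its k nearest neighbours."""
    n = len(features)
    edges = []
    for i in range(n):
        # Rank the other vertices by Euclidean distance to vertex i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        edges.append(sorted([i] + others[:k]))
    return edges

def incidence_matrix(edges, n_vertices):
    """Build H with H[v][e] = 1 when vertex v belongs to hyperedge e."""
    H = [[0] * len(edges) for _ in range(n_vertices)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1
    return H

# Toy 2-D features for four segments (hypothetical values).
segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
edges = knn_hyperedges(segments, k=1)
H = incidence_matrix(edges, len(segments))
```

With k=1 the two nearby pairs of segments each end up sharing hyperedges, which is the grouping behaviour a hypergraph layer would then propagate over.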
2. Algorithm for orthogonal matrix nearness and its application to feature representation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
3. Wang R, Wu XJ, Liu Z, Kittler J. Geometry-Aware Graph Embedding Projection Metric Learning for Image Set Classification. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3086814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Rui Wang: School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Xiao-Jun Wu: School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Zhen Liu: School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Josef Kittler: Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
4. Sheng X, Xiong D, Ying S. Intrinsic semi-parametric regression model on Grassmannian manifolds with applications. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2112961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Xuanxuan Sheng: Department of Mathematics, School of Science, Shanghai University, Shanghai, P. R. China
- Di Xiong: Department of Mathematics, School of Science, Shanghai University, Shanghai, P. R. China
- Shihui Ying: Department of Mathematics, School of Science, Shanghai University, Shanghai, P. R. China
5. Huang B, Hu LS. Quantification of Valve Stiction in Control Loops Using the Bayesian Approach on the Riemannian Manifold. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Bo Huang: Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
6. Human identification based on Gait Manifold. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03818-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
7. Xiong D, Ying S, Zhu H. Intrinsic partial linear models for manifold-valued data. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.102954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
8. Zou J, Zhang Y, Liu H, Ma L. Monogenic features based single sample face recognition by kernel sparse representation on multiple Riemannian manifolds. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
9. Shi X, Chai X, Xie J, Sun T. MC-GCN: A Multi-Scale Contrastive Graph Convolutional Network for Unconstrained Face Recognition With Image Sets. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3046-3055. [PMID: 35385383 DOI: 10.1109/tip.2022.3163851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this paper, a Multi-scale Contrastive Graph Convolutional Network (MC-GCN) method is proposed for unconstrained face recognition with image sets, which takes a set of media (orderless images and videos) as a face subject instead of single media (an image or video). Due to factors such as illumination, posture, and media source, there are huge intra-set variances in a face set, and the importance of different face prototypes varies considerably. This paper focuses on how to model the attention mechanism according to the relationship between prototypes or images in a set. In this work, we formulate a framework based on a graph convolutional network (GCN), which considers face prototypes as nodes to build relations. Specifically, we first present a multi-scale graph module to learn the relationship between prototypes at multiple scales. Moreover, a Contrastive Graph Convolutional (CGC) block is introduced to build an attention control model, which focuses on those frames with similar prototypes (contrastive information) between a pair of sets instead of simply evaluating the frame quality. The experiments on IJB-A, YouTube Face, and an animal face dataset clearly demonstrate that our proposed MC-GCN outperforms the state-of-the-art methods significantly.
10. Mason E, Mhaskar H, Guo A. A manifold learning approach for gesture recognition from micro-Doppler radar measurements. Neural Netw 2022; 152:353-369. [DOI: 10.1016/j.neunet.2022.04.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Revised: 04/21/2022] [Accepted: 04/21/2022] [Indexed: 11/25/2022]
11. Multilinear clustering via tensor Fukunaga–Koontz transform with Fisher eigenspectrum regularization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107899] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
13. Wang B, Hu Y, Gao J, Sun Y, Ju F, Yin B. Adaptive Fusion of Heterogeneous Manifolds for Subspace Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:3484-3497. [PMID: 32776883 DOI: 10.1109/tnnls.2020.3011717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Multiview clustering (MVC) has recently received great interest because it combines abundant and complementary information to improve clustering performance, overcoming the limited-view drawback of standard single-view clustering. However, existing MVC methods are mostly designed for vectorial data from linear spaces and, thus, are not suitable for multidimensional data with intrinsic nonlinear manifold structures, e.g., videos or image sets. Some works have introduced manifold representations of data into MVC and obtained considerable improvements, but how to fuse multiple manifolds efficiently for clustering is still a challenging problem. For heterogeneous manifolds in particular, it is an entirely new problem. In this article, we propose to represent complicated multiview data as heterogeneous manifolds and introduce a framework that fuses these manifolds for clustering. Unlike empirical weighting methods, an adaptive fusion strategy is designed to weight the importance of different manifolds in a data-driven manner. In addition, the low-rank representation is generalized onto the fused heterogeneous manifolds to explore the low-dimensional subspace structures embedded in data for clustering. We assessed the proposed method on several public data sets, including human action video, facial image, and traffic scenario video. The experimental results show that our method clearly outperforms a number of state-of-the-art clustering methods.
15. Tabejamaat M, Mohammadzade H. Contributive Representation-Based Reconstruction for Online 3D Action Recognition. INT J PATTERN RECOGN 2021. [DOI: 10.1142/s0218001421500051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Recent years have seen an increasing trend in developing 3D action recognition methods. However, despite the advances, existing models still suffer from some major drawbacks, including the lack of any provision for recognizing action sequences with missing frames. This significantly hampers the applicability of these methods in online scenarios, where only the initial part of a sequence is available. In this paper, we introduce a novel sequence-to-sequence representation-based algorithm in which a query sample is characterized using a collaborative frame representation of all the training sequences. This way, an optimal classifier is tailored for the existing frames of each query sample, making the model robust to the effect of missing frames in sequences (e.g. in online scenarios). Moreover, due to the collaborative nature of the representation, it implicitly handles the problem of varying styles during the course of activities. Experimental results on three publicly available databases, UTKinect, TST fall, and UTD-MHAD, show 95.48%, 90.91%, and 91.67% accuracy, respectively, when using the first 75% of query sequences, and 84.42%, 60.98%, and 87.27% accuracy when using the first 50%.
Affiliation(s)
- Mohsen Tabejamaat: Department of Electrical Engineering, Sharif University of Technology, Tehran 11155-8639, Iran
- Hoda Mohammadzade: Department of Electrical Engineering, Sharif University of Technology, Tehran 11155-8639, Iran
16. Jing P, Su Y, Li Z, Nie L. Learning robust affinity graph representation for multi-view clustering. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.06.068] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 10/23/2022]
17. Wang R, Wu XJ. GrasNet: A Simple Grassmannian Network for Image Set Classification. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
18. Structure Fusion Based on Graph Convolutional Networks for Node Classification in Citation Networks. ELECTRONICS 2020. [DOI: 10.3390/electronics9030432] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Because multi-view data are diverse and complex, most existing graph convolutional networks focus on constructing the network architecture or preserving the salient graph structure for node classification in citation networks, and usually ignore capturing the complete graph structure of nodes to enhance classification performance. To mine a more complete distribution structure from the multi-graph structures of multi-view data while considering both their specificity and their commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) to improve node classification in a semi-supervised way. SF-GCN not only exploits the special characteristics of each view by spectral embedding that preserves the multi-graph structures, but also explores the common style of multi-view data through the distance metric between multi-graph structures. Assuming a linear relationship between the multi-graph structures, we construct the optimization function of the structure fusion model by balancing the specificity loss and the commonality loss. By solving this function, we simultaneously obtain the fusion spectral embedding from the multi-view data and the fusion structure, which serves as the adjacency matrix input to graph convolutional networks for semi-supervised node classification. Furthermore, we generalize structure fusion to structure diffusion propagation and present structure propagation fusion based on graph convolutional networks (SPF-GCN) to utilize these structure interactions. Experiments demonstrate that SPF-GCN outperforms state-of-the-art methods on three challenging citation network datasets: Cora, Citeseer, and Pubmed.
19. Kacem A, Daoudi M, Amor BB, Berretti S, Alvarez-Paiva JC. A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1-14. [PMID: 30281437 DOI: 10.1109/tpami.2018.2872564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed rank. Our representation has the benefit of naturally bringing in a second desirable quantity when comparing shapes (the spatial covariance) in addition to the conventional affine-shape representation. We then derive geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounded in the Riemannian geometry of the underlying manifold. Specifically, our approach involves three steps: (1) landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of fixed rank to build time-parameterized trajectories; (2) a temporal warping is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; (3) finally, a pairwise proximity function SVM is used to classify them, incorporating the (dis-)similarity measure into the kernel function. We show that this representation and metric achieve competitive results in applications such as action recognition and emotion recognition from 3D skeletal data, and facial expression recognition from videos. Experiments have been conducted on several publicly available up-to-date benchmarks.
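Step (1) of the abstract can be illustrated with a minimal sketch. It assumes that a frame is an n x d array of landmark coordinates and that the mapping is the Gram matrix of the centered configuration; the centering step, the function name `gram_matrix`, and the random toy data are illustrative assumptions rather than details confirmed by the abstract.

```python
import numpy as np

def gram_matrix(landmarks):
    """Map an (n, d) landmark configuration to an (n, n) Gram matrix."""
    # Centering removes translation (an assumption made for this sketch).
    Z = landmarks - landmarks.mean(axis=0)
    # Z @ Z.T is symmetric positive semidefinite with rank at most d.
    return Z @ Z.T

rng = np.random.default_rng(0)
frame = rng.normal(size=(5, 3))  # 5 hypothetical landmarks in 3-D
G = gram_matrix(frame)
rank = np.linalg.matrix_rank(G)
```

Applying `gram_matrix` to every frame of a sequence would yield a trajectory of fixed-rank positive semidefinite matrices, which is the kind of object the temporal warping in step (2) operates on.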
20. Luo G, Wei J, Hu W, Maybank SJ. Tangent Fisher Vector on Matrix Manifolds for Action Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:3052-3064. [PMID: 31804934 DOI: 10.1109/tip.2019.2955561] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
In this paper, we address the problem of representing and recognizing human actions from videos on matrix manifolds. For this purpose, we propose a new vector representation method, named tangent Fisher vector, to describe video sequences in the Fisher kernel framework. We first extract dense curved spatio-temporal cuboids from each video sequence. Compared with the traditional 'straight cuboids', the dense curved spatio-temporal cuboids contain much more local motion information. Each cuboid is then described using a linear dynamical system (LDS) to simultaneously capture the local appearance and dynamics. Furthermore, a simple yet efficient algorithm is proposed to learn the LDS parameters and approximate the observability matrix at the same time. Each video sequence is thus represented by a set of LDSs. Considering that each LDS can be viewed as a point in a Grassmann manifold, we propose to learn an intrinsic GMM on the manifold to cluster the LDS points. Finally a tangent Fisher vector is computed by first accumulating all the tangent vectors in each Gaussian component, and then concatenating the normalized results across all the Gaussian components. A kernel is defined to measure the similarity between tangent Fisher vectors for classification and recognition of a video sequence. This approach is evaluated on the state-of-the-art human action benchmark datasets. The recognition performance is competitive when compared with current state-of-the-art results.
21. Ali M, Gao J, Antolovich M. Parametric Classification of Bingham Distributions Based on Grassmann Manifolds. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:5771-5784. [PMID: 31247550 DOI: 10.1109/tip.2019.2922100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
In this paper, we present a novel Bayesian classification framework of the matrix variate Bingham distributions with the inclusion of its normalizing constant and develop a consistent general parametric modeling framework based on the Grassmann manifolds. To calculate the normalizing constants of the Bingham model, this paper extends the method of saddle-point approximation (SPA) to a new setting. Furthermore, it employs the standard theory of maximum likelihood estimation (MLE) to evaluate the involved parameters in the used probability density functions. The validity and performance of the proposed approach are tested on 14 real-world visual classification databases. We have compared the classification performance of our proposed approach with the baselines from the previous related approaches. The comparison shows that on most of the databases, the performance of our approach is superior.
22. Li Y, Hong J, Chen H. Short Sequence Classification Through Discriminable Linear Dynamical System. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3396-3408. [PMID: 30716053 DOI: 10.1109/tnnls.2019.2891743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
The linear dynamical system (LDS) offers a convenient way to reveal the unobservable structure behind the data. This makes it useful for data representation and explanatory analysis. An immediate limitation of this model is that most training algorithms train a model to best approximate a sequential instance. They do not consider its class or label, which indicates the dissimilarity/similarity to other instances. As a result, LDSs trained in this way tend to be indistinguishable across classes, resulting in poor performance in model-based classification. In this paper, after revisiting this limitation, we propose to promote the diversity between the models of different classes. The diversity, measured by a determinantal point process (DPP) on LDSs, is utilized to remedy the greedy behavior of the expectation-maximization algorithm. The training goal is a model that balances goodness of fit with being distinguishable across classes. Experiments on synthetic data confirm its effectiveness in generating discriminative systems under supervisory information. The classification on short time-span data sets confirms that the models generated by our approach generalize well to unseen data.
23. Le Brigant A, Puechmorel S. Quantization and clustering on Riemannian manifolds with an application to air traffic analysis. J MULTIVARIATE ANAL 2019. [DOI: 10.1016/j.jmva.2019.05.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
26. Zhang L, Zhen X, Shao L, Song J. Learning Match Kernels on Grassmann Manifolds for Action Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:205-215. [PMID: 30136940 DOI: 10.1109/tip.2018.2866688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Action recognition has been extensively researched in computer vision due to its potential applications in a broad range of areas. The key to action recognition lies in modeling actions and measuring their similarity, which, however, poses great challenges. In this paper, we propose learning match kernels between actions on the Grassmann manifold for action recognition. Specifically, we propose modeling actions as linear subspaces on the Grassmann manifold; each subspace is spanned by a set of convolutional neural network (CNN) feature vectors pooled temporally over frames in semantic video clips, which simultaneously captures local discriminant patterns and temporal dynamics of motion. To measure the similarity between actions, we propose Grassmann match kernels (GMK) based on canonical correlations of linear subspaces to directly match videos for action recognition; GMK is learned in a supervised way via kernel target alignment, which endows it with a strong ability to distinguish actions from different classes. The proposed approach leverages the strengths of CNNs for feature extraction and kernels for measuring similarity, yielding a general learning framework of match kernels for action recognition. We have conducted extensive experiments on five challenging realistic data sets: Youtube, UCF50, UCF101, Penn action, and HMDB51. The proposed approach substantially surpasses the state-of-the-art algorithms, which demonstrates its effectiveness for action recognition.
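The canonical correlations that the match kernels build on can be computed from orthonormal bases of two subspaces. The sketch below assumes each action is given as a full-column-rank matrix whose columns span its subspace. The projection kernel shown is a standard fixed Grassmann kernel used only for illustration; it is not the learned GMK from the paper, and the function names are hypothetical.

```python
import numpy as np

def canonical_correlations(A, B):
    """Cosines of the principal angles between span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off slightly outside [0, 1].
    return np.clip(s, 0.0, 1.0)

def projection_kernel(A, B):
    """Sum of squared canonical correlations between the two subspaces."""
    return float(np.sum(canonical_correlations(A, B) ** 2))

# Two 2-D subspaces of R^3 that share exactly one direction (the x-axis).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # span{e1, e2}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # span{e1, e3}
corr = canonical_correlations(A, B)
```

Here one canonical correlation is 1 (the shared axis) and the other is 0 (the orthogonal remainder), so the kernel value is 1, while a subspace compared with itself gives a value equal to its dimension.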
27. Huang Z, Wang R, Shan S, Van Gool L, Chen X. Cross Euclidean-to-Riemannian Metric Learning with Application to Face Recognition from Video. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:2827-2840. [PMID: 29990185 DOI: 10.1109/tpami.2017.2776154] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Riemannian manifolds have been widely employed for video representations in visual classification tasks including video-based face recognition. The success mainly derives from learning a discriminant Riemannian metric which encodes the non-linear geometry of the underlying Riemannian manifolds. In this paper, we propose a novel metric learning framework to learn a distance metric across a Euclidean space and a Riemannian manifold to fuse average appearance and pattern variation of faces within one video. The proposed metric learning framework can handle three typical tasks of video-based face recognition: Video-to-Still, Still-to-Video and Video-to-Video settings. To accomplish this new framework, by exploiting typical Riemannian geometries for kernel embedding, we map the source Euclidean space and Riemannian manifold into a common Euclidean subspace, each through a corresponding high-dimensional Reproducing Kernel Hilbert Space (RKHS). With this mapping, the problem of learning a cross-view metric between the two source heterogeneous spaces can be converted to learning a single-view Euclidean distance metric in the target common Euclidean space. By learning information on heterogeneous data with the shared label, the discriminant metric in the common space improves face recognition from videos. Extensive experiments on four challenging video face databases demonstrate that the proposed framework has a clear advantage over the state-of-the-art methods in the three classical video-based face recognition scenarios.
28. Chen H, Sun Y, Gao J, Hu Y, Yin B. Solving Partial Least Squares Regression via Manifold Optimization Approaches. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 30:588-600. [PMID: 29994619 DOI: 10.1109/tnnls.2018.2844866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Partial least squares regression (PLSR) has been a popular technique to explore the linear relationship between two data sets. However, existing approaches typically optimize a PLSR model in Euclidean space and calculate the factors one by one in succession to keep the PLSR factors mutually orthogonal. Thus, a suboptimal solution is often generated. To overcome this shortcoming, this paper takes statistically inspired modification of PLSR (SIMPLSR) as a representative of PLSR, proposes a novel approach to transform SIMPLSR into optimization problems on Riemannian manifolds, and develops corresponding optimization algorithms. These algorithms calculate all the PLSR factors simultaneously, which avoids the suboptimal solutions of the successive strategy. Moreover, we propose sparse SIMPLSR on Riemannian manifolds, which is simple and intuitive. A number of experiments on classification problems have demonstrated that the proposed models and algorithms achieve lower classification error rates than other linear regression methods in Euclidean space. We have made the experimental code public at https://github.com/Haoran2014.
29. Fisher Vector Coding for Covariance Matrix Descriptors Based on the Log-Euclidean and Affine Invariant Riemannian Metrics. J Imaging 2018. [DOI: 10.3390/jimaging4070085] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022] Open
30. Chen J, Zhang Z, He R, Hu X, Qin X. RAPID: Measuring Deformation of Biological Tissues from MR Images Through the Riemannian Pseudo Kernel. INT J PATTERN RECOGN 2018. [DOI: 10.1142/s0218001418570033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Due to the nonlinear deformation of nonrigid and nonuniform tissues, it is challenging to accurately measure the displacements of feature points distributed on the inner parts, boundaries, and separatrices of tissue layers. To address this challenge, we propose a feature point matching technique called RAPID to measure MR 2D slice deformation of nonuniform and nonrigid biological tissues. We propose to use the covariance of several neighboring point statistics computed around a keypoint, as the keypoint descriptor. Inspired by the kernel methods, we advocate adopting a Riemannian pseudo kernel to map SPD matrices to a high dimensional Hilbert space, where the Euclidean geometry applies. We compare our RAPID with two existing schemes (i.e., SIFT and SURF). Our experimental results show that our RAPID is superior to SIFT and SURF, because the benefits offered by RAPID are two-fold. First, our RAPID increases the number of matched data points. Second, RAPID substantially improves the key-point matching accuracy of SIFT and SURF.
Affiliation(s)
- Jia Chen: School of Mathematics and Computer Science, Hubei Garment Information Engineering Technology Research Center, Wuhan Textile University, Wuhan 430073, P. R. China
- Zili Zhang: School of Mathematics and Computer Science, Wuhan Textile University, Wuhan 430073, P. R. China
- Ruhan He: School of Mathematics and Computer Science, Wuhan Textile University, Wuhan 430073, P. R. China
- Xinrong Hu: School of Mathematics and Computer Science, Hubei Garment Information Engineering Technology Research Center, Wuhan Textile University, Wuhan 430073, P. R. China
- Xiao Qin: Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, AL, USA
31. Zheng P, Zhao ZQ, Gao J, Wu X. A set-level joint sparse representation for image set classification. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.02.062] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/17/2022]
32. Ali M, Gao J. Classification of matrix-variate Fisher–Bingham distribution via Maximum Likelihood Estimation using manifold valued data. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.01.048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
33. Chen H, Sun Y, Gao J, Hu Y, Yin B. Fast optimization algorithm on Riemannian manifolds and its application in low-rank learning. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.02.058] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
34. Cheng G, Zhou P, Han J. Duplex Metric Learning for Image Set Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:281-292. [PMID: 28991740 DOI: 10.1109/tip.2017.2760512] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 06/07/2023]
Abstract
Image set classification has attracted much attention because of its broad applications. Despite the success made so far, the problems of intra-class diversity and inter-class similarity still remain two major challenges. To explore a possible solution to these challenges, this paper proposes a novel approach, termed duplex metric learning (DML), for image set classification. The proposed DML consists of two progressive metric learning stages with different objectives used for feature learning and image classification, respectively. The metric learning regularization is not only used to learn powerful feature representations but also well explored to train an effective classifier. At the first stage, we first train a discriminative stacked autoencoder (DSAE) by layer-wisely imposing a metric learning regularization term on the neurons in the hidden layers and meanwhile minimizing the reconstruction error to obtain new feature mappings in which similar samples are mapped closely to each other and dissimilar samples are mapped farther apart. At the second stage, we discriminatively train a classifier and simultaneously fine-tune the DSAE by optimizing a new objective function, which consists of a classification error term and a metric learning regularization term. Finally, two simple voting strategies are devised for image set classification based on the learnt classifier. In the experiments, we extensively evaluate the proposed framework for the tasks of face recognition, object recognition, and face verification on several commonly-used data sets and state-of-the-art results are achieved in comparison with existing methods.
|
35
|
Chakraborty R, Singh V, Adluru N, Vemuri BC. A geometric framework for statistical analysis of trajectories with distinct temporal spans. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION 2017; 2017:172-181. [PMID: 32514257 DOI: 10.1109/iccv.2017.28] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Analyzing data representing multifarious trajectories is central to many fields in science and engineering; for example, trajectories representing a tennis serve, a gymnast's parallel bar routine, progression/remission of disease and so on. We present a novel geometric algorithm for performing statistical analysis of trajectories with distinct numbers of samples representing longitudinal (or temporal) data. A key feature of our proposal is that, unlike existing schemes, our model is deployable in regimes where each participant provides a different number of acquisitions (trajectories have different numbers of sample points or temporal spans). To achieve this, we develop a novel method involving the parallel transport of the tangent vectors along each given trajectory to the starting point of the respective trajectories and then use the span of the matrix whose columns consist of these vectors to construct a linear subspace in R^m. We then map these linear subspaces (possibly of distinct dimensions) of R^m onto a single high-dimensional hypersphere. This enables computing group statistics over trajectories by instead performing statistics on the hypersphere (equipped with a simpler geometry). Given a point on the hypersphere representing a trajectory, we also provide a "reverse mapping" algorithm to uniquely (under certain assumptions) reconstruct the subspace that corresponds to this point. Finally, by using existing algorithms for recursive Fréchet mean and exact principal geodesic analysis on the hypersphere, we present several experiments on synthetic and real (vision and medical) data sets showing how group testing on such diversely sampled longitudinal data is possible by analyzing the reconstructed data in the subspace spanned by the first few principal components.
|
36
|
Grading of invasive breast carcinoma through Grassmannian VLAD encoding. PLoS One 2017; 12:e0185110. [PMID: 28934283 PMCID: PMC5608317 DOI: 10.1371/journal.pone.0185110] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 04/11/2017] [Accepted: 09/05/2017] [Indexed: 11/19/2022]
Abstract
In this paper we address the problem of automated grading of invasive breast carcinoma through the encoding of histological images as VLAD (Vector of Locally Aggregated Descriptors) representations on the Grassmann manifold. The proposed method considers each image as a set of multidimensional spatially-evolving signals that can be efficiently modeled through a higher-order linear dynamical systems analysis. Subsequently, each H&E (Hematoxylin and Eosin) stained breast cancer histological image is represented as a cloud of points on the Grassmann manifold, and a vector representation approach is applied to aggregate the Grassmannian points based on a locality criterion on the manifold. To evaluate the efficiency of the proposed methodology, two datasets with different characteristics were used. More specifically, we created a new medium-sized dataset consisting of 300 annotated images (collected from 21 patients) of grades 1, 2 and 3, and we also provide experimental results using a large dataset, namely BreaKHis, containing 7,909 breast cancer histological images, collected from 82 patients, of both benign and malignant cases. Experimental results have shown that the proposed method outperforms a number of state-of-the-art approaches, providing average classification rates of 95.8% and 91.38% with our dataset and the BreaKHis dataset, respectively.
|
37
|
Dong G, Kuang G, Wang N, Wang W. Classification via Sparse Representation of Steerable Wavelet Frames on Grassmann Manifold: Application to Target Recognition in SAR Image. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:2892-2904. [PMID: 28410109 DOI: 10.1109/tip.2017.2692524] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/07/2023]
Abstract
Automatic target recognition has been widely studied over the years, yet it is still an open problem. The main obstacle lies in extended operating conditions, e.g., depression angle change, configuration variation, articulation, and occlusion. To deal with them, this paper proposes a new classification strategy. We develop a new representation model via the steerable wavelet frames. The proposed representation model is viewed in its entirety as an element on Grassmann manifolds. To achieve target classification, we embed Grassmann manifolds into an implicit reproducing kernel Hilbert space (RKHS), where kernel sparse learning can be applied. Specifically, the mappings of the training samples in RKHS are concatenated to form an overcomplete dictionary. It is then used to encode the counterpart of the query as a linear combination of its atoms. With a designed Grassmann kernel function, the sparse representation can be obtained, from which the inference can be reached. The novelty of this paper comes from: 1) the development of a representation model by the set of directional components of the Riesz transform; 2) the quantitative measure of similarity for the proposed representation model by a Grassmann metric; and 3) the generation of a global kernel function by a Grassmann kernel. Extensive comparative studies are performed to demonstrate the advantage of the proposed strategy.
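As an illustration of the two ingredients named in the abstract above (a Grassmann metric and a Grassmann kernel), the following is a minimal NumPy sketch. It assumes subspaces are given as matrices with orthonormal columns and uses the standard geodesic distance from principal angles and the projection kernel; the abstract does not say which metric or kernel the authors designed, so these choices and all function names are assumptions made for illustration only.

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis for the column span of A (thin QR).

    Assumes A has full column rank; names here are illustrative only.
    """
    Q, _ = np.linalg.qr(A)
    return Q

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y).

    X and Y must have orthonormal columns. The singular values of X^T Y
    are the cosines of the principal angles.
    """
    cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Geodesic (arc-length) distance: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(X, Y)))

def projection_kernel(X, Y):
    """Projection kernel k(X, Y) = ||X^T Y||_F^2, a positive definite
    kernel on the Grassmann manifold."""
    return float(np.linalg.norm(X.T @ Y, "fro") ** 2)

# Two 2-dimensional subspaces of R^4 that are mutually orthogonal.
I4 = np.eye(4)
X = orthonormal_basis(I4[:, :2])
Y = orthonormal_basis(I4[:, 2:])
d_xy = geodesic_distance(X, Y)
k_xx = projection_kernel(X, X)
k_xy = projection_kernel(X, Y)
```

A kernel matrix built from `projection_kernel` over training subspaces could then be passed to any kernel method; the kernel sparse coding step described in the abstract is not reproduced here.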
|
38
|
Kleinsteuber M. Dynamical Textures Modeling via Joint Video Dictionary Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:2929-2943. [PMID: 28410105 DOI: 10.1109/tip.2017.2691549] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which can be modeled in a dynamic textures (DTs) framework. First, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on the transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both the sparse properties and the temporal correlations of consecutive video frames. Moreover, the learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.
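The idea of a linear transition matrix between the states of adjacent frames can be sketched as follows. This is a simplified illustration that assumes the per-frame state vectors are already available and fits the transition by plain, unconstrained least squares; it does not include the joint dictionary learning, sparsity, or stability constraints that the abstract attributes to JVDL, and the function names are hypothetical.

```python
import numpy as np

def fit_transition(states):
    """Least-squares transition matrix A with x_{t+1} ≈ A x_t.

    states: array of shape (d, T) whose columns are the state vectors
    of consecutive frames (assumed given; not learned here).
    """
    X_prev, X_next = states[:, :-1], states[:, 1:]
    # Solve X_prev^T A^T ≈ X_next^T in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(X_prev.T, X_next.T, rcond=None)
    return A_T.T

def rollout(A, x0, steps):
    """Synthesize `steps` states by repeatedly applying A to x0."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps - 1):
        xs.append(A @ xs[-1])
    return np.stack(xs, axis=1)

# Noise-free toy sequence generated by a known, stable transition matrix.
A_true = np.array([[0.9, 0.1],
                   [-0.1, 0.9]])
states = rollout(A_true, [1.0, 0.0], steps=20)
A_hat = fit_transition(states)
```

On noise-free data generated by a linear system, the fitted matrix matches the generating one; on real sparse codes the fit would only be approximate, which is where constraints such as those mentioned in the abstract would matter.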
|
39
|
Connie T, Goh MKO, Teoh ABJ. A Grassmannian Approach to Address View Change Problem in Gait Recognition. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:1395-1408. [PMID: 27101628 DOI: 10.1109/tcyb.2016.2545693] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/05/2023]
Abstract
Gait recognition appears to be a valuable asset when conventional biometrics cannot be employed. Nonetheless, recognizing humans by gait is not a trivial task due to the complex human kinematic structure and other external factors affecting human locomotion. A major challenge in gait recognition is view variation. A large difference between the views in the query and reference sets often leads to performance deterioration. In this paper, we show how to generate virtual views to compensate for the view difference between the query and reference sets, making it possible to match the query and reference sets using standardized views. The proposed method, which combines multiview matrix representation and a novel randomized kernel extreme learning machine, is an end-to-end solution for the view change problem under Grassmann manifold treatment. Under the right conditions, the view-tagging problem can be eliminated. Since the recording angle and walking direction of the subject are not always available, this is particularly valuable for a practical gait recognition system. We present several working scenarios for multiview recognition that have not been considered before. Rigorous experiments have been conducted on two challenging benchmark databases containing multiview gait datasets. Experiments show that the proposed approach outperforms several state-of-the-art methods.
|
40
|
Anirudh R, Turaga P, Srivastava A. Elastic Functional Coding of Riemannian Trajectories. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2017; 39:922-936. [PMID: 28113699 DOI: 10.1109/tpami.2016.2564409] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Visual observations of dynamic phenomena, such as human actions, are often represented as sequences of smoothly-varying features. In cases where the feature spaces can be structured as Riemannian manifolds, the corresponding representations become trajectories on manifolds. Analysis of these trajectories is challenging due to the non-linearity of the underlying spaces and the high dimensionality of the trajectories. In vision problems, given the nature of the physical systems involved, these phenomena are better characterized on a low-dimensional manifold compared to the space of Riemannian trajectories. For instance, if one does not impose physical constraints of the human body in data involving human action analysis, the resulting representation space will have highly redundant features. Learning an effective, low-dimensional embedding for action representations will have a huge impact in the areas of search and retrieval, visualization, learning, and recognition. Traditional manifold learning addresses this problem for static points in Euclidean space, but its extension to Riemannian trajectories is non-trivial and remains unexplored. The difficulty lies in the inherent non-linearity of the domain and the temporal variability of actions, which can distort any traditional metric between trajectories. To overcome these issues, we use the framework based on transported square-root velocity fields (TSRVF); this framework has several desirable properties, including a rate-invariant metric and vector space representations. We propose to learn an embedding such that each action trajectory is mapped to a single point in a low-dimensional Euclidean space, and the trajectories that differ only in temporal rates map to the same point. We utilize the TSRVF representation, and accompanying statistical summaries of Riemannian trajectories, to extend existing coding methods such as PCA, KSVD and Label Consistent KSVD to Riemannian trajectories or, more generally, to Riemannian functions. We show that such coding efficiently captures trajectories in applications such as action recognition, stroke rehabilitation, visual speech recognition, clustering and diverse sequence sampling. Using this framework, we obtain state-of-the-art recognition results, while reducing the dimensionality/complexity by a factor of 100-250x. Since these mappings and codes are invertible, they can also be used to interactively visualize Riemannian trajectories and synthesize actions.
|
41
|
|
42
|
Low-Rank Linear Dynamical Systems for Motor Imagery EEG. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2017; 2016:2637603. [PMID: 28096809 PMCID: PMC5210283 DOI: 10.1155/2016/2637603] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/18/2016] [Revised: 11/14/2016] [Accepted: 11/16/2016] [Indexed: 11/21/2022]
Abstract
The common spatial pattern (CSP) and other spatiospectral feature extraction methods have become the most effective and successful approaches to solve the problem of motor imagery electroencephalography (MI-EEG) pattern recognition from multichannel neural activity in recent years. However, these methods need a lot of preprocessing and postprocessing, such as filtering, demeaning, and spatiospectral feature fusion, which can easily influence the classification accuracy. In this paper, we utilize linear dynamical systems (LDSs) for EEG signal feature extraction and classification. The LDS model has many advantages, such as simultaneous spatial and temporal feature matrix generation, freedom from preprocessing or postprocessing, and low cost. Furthermore, a low-rank matrix decomposition approach is introduced to get rid of noise and the resting-state component in order to improve the robustness of the system. Then, we propose a low-rank LDSs algorithm to decompose the feature subspace of LDSs on a finite Grassmannian and obtain better performance. Extensive experiments are carried out on public datasets from “BCI Competition III Dataset IVa” and “BCI Competition IV Database 2a.” The results show that our proposed three methods yield higher accuracies compared with prevailing approaches such as CSP and CSSP.
|
43
|
Geodesic distance on a Grassmannian for monitoring the progression of Alzheimer's disease. Neuroimage 2017; 146:1016-1024. [DOI: 10.1016/j.neuroimage.2016.10.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/01/2016] [Revised: 09/17/2016] [Accepted: 10/14/2016] [Indexed: 02/01/2023]
|
44
|
Connie T, Goh KO, Teoh AB. Multi-view gait recognition using a doubly-kernel approach on the Grassmann manifold. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
|
45
|
Hong Y, Kwitt R, Singh N, Vasconcelos N, Niethammer M. Parametric Regression on the Grassmannian. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:2284-2297. [PMID: 26766216 DOI: 10.1109/tpami.2016.2516533] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
We address the problem of fitting parametric curves on the Grassmann manifold for the purpose of intrinsic parametric regression. We start from the energy minimization formulation of linear least-squares in Euclidean space and generalize this concept to general nonflat Riemannian manifolds, following an optimal-control point of view. We then specialize this idea to the Grassmann manifold and demonstrate that it yields a simple, extensible and easy-to-implement solution to the parametric regression problem. In fact, it allows us to extend the basic geodesic model to (1) a "time-warped" variant and (2) cubic splines. We demonstrate the utility of the proposed solution on different vision problems, such as shape regression as a function of age, traffic-speed estimation and crowd-counting from surveillance video clips. Most notably, these problems can be conveniently solved within the same framework without any specifically-tailored steps along the processing pipeline.
|
46
|
Explicit discriminative representation for improved classification of manifold features. Pattern Recognit Lett 2016. [DOI: 10.1016/j.patrec.2016.06.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
47
|
Hauberg S. Principal Curves on Riemannian Manifolds. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:1915-1921. [PMID: 26540674 DOI: 10.1109/tpami.2015.2496166] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criterion of interest. The requirements that the solution both be geodesic and pass through the mean tend to imply that the methods only work well when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves from Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.
|
48
|
|
49
|
Geodesic Flow Kernel Support Vector Machine for Hyperspectral Image Classification by Unsupervised Subspace Feature Transfer. REMOTE SENSING 2016. [DOI: 10.3390/rs8030234] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
|
50
|
Ben Amor B, Su J, Srivastava A. Action Recognition Using Rate-Invariant Analysis of Skeletal Shape Trajectories. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:1-13. [PMID: 27030844 DOI: 10.1109/tpami.2015.2439257] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 06/05/2023]
Abstract
We study the problem of classifying actions of human subjects using depth movies generated by Kinect or other depth sensors. Representing the human body as dynamical skeletons, we study the evolution of their (skeletons’) shapes as trajectories on Kendall’s shape manifold. The action data is typically corrupted by large variability in execution rates within and across subjects, thus causing major problems in statistical analyses. To address that issue, we adapt a recently developed framework of Su et al. [1], [2] to this problem domain. Here, the variable execution rates correspond to re-parameterizations of trajectories, and one uses a parameterization-invariant metric for aligning, comparing, averaging, and modeling trajectories. This is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, which allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in the time domain, simultaneous temporal registration of multiple actions, and extracting invertible Euclidean representations of actions. Due to invertibility, these Euclidean representations allow both discriminative and generative models for statistical analysis. For instance, they can be used in an SVM-based classification of original actions, as demonstrated here using the MSR Action-3D, MSR Daily Activity and 3D Action Pairs datasets. Using only the skeletal information, we achieve state-of-the-art classification results on these datasets.
|