1. Low-complexity three-dimensional discrete Hartley transform approximations for medical image compression. Comput Biol Med 2021;139:105018. [PMID: 34781235] [DOI: 10.1016/j.compbiomed.2021.105018]
Abstract
The discrete Hartley transform (DHT) is a useful tool for medical image coding. The three-dimensional DHT (3D DHT) can be employed to compress medical image data, such as magnetic resonance and X-ray angiography images. However, computing the 3D DHT involves several multiplications by irrational quantities, which require floating-point arithmetic and introduce truncation errors. In recent years, significant progress has been made in wireless and implantable biomedical devices, which operate under critical power and hardware limitations. Multiplication demands more hardware, power, and time than other arithmetic operations, such as addition and bit-shifting. In this work, we present a set of multiplierless DHT approximations that can be implemented with fixed-point arithmetic. We derive 3D DHT approximations by employing tensor formalism. The proposed methods offer substantial computational savings over the usual 3D DHT approach, making them appropriate for devices with limited resources. The proposed transforms are applied in a lossy 3D DHT-based medical image compression algorithm, delivering practically the same visual quality (>98% in terms of SSIM) at a considerable reduction in computational effort (100% multiplicative complexity reduction). Furthermore, we implemented the proposed 3D transforms on an ARM Cortex-M0+ processor using the low-cost Raspberry Pi Pico board. The execution time was reduced by ∼70% compared to the usual 3D DHT and ∼90% compared to the 3D DCT.
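Below is a minimal NumPy sketch of a separable (mode-product) 3D DHT of the kind the tensor formalism above produces, using the exact cas kernel. It is illustrative only: the paper's contribution is replacing the 1D matrix with multiplierless approximations (entries such as 0, ±1/2, ±1), which are specified in the paper itself, and a full multidimensional DHT may require correction terms beyond this separable pass.

```python
import numpy as np

def dht_matrix(N):
    """1D discrete Hartley transform matrix: H[k, n] = cas(2*pi*k*n/N)."""
    n = np.arange(N)
    arg = 2.0 * np.pi * np.outer(n, n) / N
    return np.cos(arg) + np.sin(arg)  # cas(t) = cos(t) + sin(t)

def dht3(volume):
    """Separable 3D DHT: mode-n product of the 1D transform along each axis."""
    X = volume.astype(float)
    for axis in range(3):
        H = dht_matrix(X.shape[axis])
        X = np.tensordot(H, X, axes=([1], [axis]))  # contract along `axis`
        X = np.moveaxis(X, 0, axis)                 # restore axis order
    return X

# Example: transform an 8x8x8 block, as in block-based medical image coding.
block = np.random.rand(8, 8, 8)
coeffs = dht3(block)
```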
4. Zhang S, Lu W, Xing W, Zhang L. Learning Scale-Adaptive Tight Correlation Filter for Object Tracking. IEEE Transactions on Cybernetics 2020;50:270-283. [PMID: 30273166] [DOI: 10.1109/tcyb.2018.2868782]
Abstract
In this paper, we propose a novel tracking method by formulating tracking as both a correlation filtering and a ridge regression problem. First, we develop a tight correlation filter-based tracking framework from the signal detection perspective. In this formulation, the correlation filter is set to the same size as the target, which makes full use of the relations between adjacent image patches and effectively excludes the influence of the background. Specifically, we point out that this correlation filter model can be regarded as a ridge regression model that accounts for the different importance of the samples and has an objective consistent with tracking. Second, we focus on the scale variation problem in tracking. By exploiting the spatial structure of the correlation filter, multiscale filter banks can be generated via interpolation to handle scale estimation easily. Third, we present a novel distance importance-based confidence calculation model to determine the final tracking result, which not only uses the fine discriminability of the correlation filter but also takes the distance importance of candidate samples into account to alleviate the impact of similar distractors. Experimental results demonstrate that our method is superior to several state-of-the-art trackers and many other correlation filter-based methods on benchmark datasets.
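As background for the formulation, here is a minimal single-channel Fourier-domain sketch of the standard ridge-regression correlation filter (the MOSSE/KCF-style closed form). The tight, target-sized filter and sample weighting proposed in the paper are refinements of this baseline and are not reproduced here; the Gaussian width and regularizer lam are assumed values.

```python
import numpy as np

def train_filter(patch, response, lam=1e-2):
    """Ridge-regression correlation filter, closed form in the Fourier domain:
    w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lambda)."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(response)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(w_hat, patch):
    """Correlation response map; the peak location gives the target translation."""
    X = np.fft.fft2(patch)
    return np.real(np.fft.ifft2(w_hat * X))

# Example: the desired response is a Gaussian peaked at the target centre.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
y = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2))
x = np.random.rand(h, w)        # stand-in for a real image patch
w_hat = train_filter(x, y)
resp = detect(w_hat, x)         # peaks near the centre for the training patch
```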
6. Sui Y, Wang G, Zhang L. Correlation Filter Learning Toward Peak Strength for Visual Tracking. IEEE Transactions on Cybernetics 2018;48:1290-1303. [PMID: 28422678] [DOI: 10.1109/tcyb.2017.2690860]
Abstract
This paper presents a novel visual tracking approach that learns a correlation filter toward peak strength of the correlation response. Previous methods leverage all features of the target and the immediate background to learn a correlation filter. Some features, however, may be distractive to tracking, such as those arising from occlusion and local deformation, resulting in unstable tracking performance. This paper aims at solving this issue and proposes a novel algorithm to learn the correlation filter. By imposing an elastic net constraint on the filter, the proposed approach can adaptively eliminate those distractive features during correlation filtering. A new peak strength metric is proposed to measure the discriminative capability of the learned correlation filter. It is demonstrated that the proposed approach effectively strengthens the peak of the correlation response, leading to more discriminative performance than previous methods. Extensive experiments on a challenging visual tracking benchmark demonstrate that the proposed tracker outperforms most state-of-the-art methods.
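For reference, a generic elastic-net-regularized filter objective takes the following form; this is a standard sketch of the constraint the abstract names, with λ₁ and λ₂ as assumed trade-off weights, not the paper's exact formulation:

```latex
\min_{\mathbf{w}} \;\; \|\mathbf{y} - X\mathbf{w}\|_2^2
  \;+\; \lambda_1 \|\mathbf{w}\|_1
  \;+\; \lambda_2 \|\mathbf{w}\|_2^2
```

The ℓ1 term drives the coefficients of distractive features (e.g., occluded pixels) toward zero, while the ℓ2 term keeps the solution stable, matching the adaptive feature elimination described above.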
7. Sui Y, Wang G, Zhang L, Yang MH. Exploiting Spatial-Temporal Locality of Tracking via Structured Dictionary Learning. IEEE Transactions on Image Processing 2018;27:1282-1296. [PMID: 29990191] [DOI: 10.1109/tip.2017.2779275]
Abstract
In this paper, a novel spatial-temporal locality is proposed and unified within a discriminative dictionary learning framework for visual tracking. The spatial-temporal locality is obtained by exploring the strong local correlations between the temporally obtained targets and their spatially distributed nearby background neighbors. The locality is formulated as a subspace model and exploited under a unified structure of discriminative dictionary learning with a subspace structure. Using the learned dictionary, the target and its background can be described and distinguished effectively through their sparse codes. As a result, the target is localized by integrating both the descriptive and the discriminative qualities. Extensive experiments on various challenging video sequences demonstrate the superior performance of the proposed algorithm over other state-of-the-art approaches.
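As context, discriminative dictionary learning builds on the standard sparse coding objective sketched below; the subspace-structure and discriminative terms specific to this paper are not reproduced here, and D, αᵢ, and λ are generic notation rather than the paper's:

```latex
\min_{D,\,\{\boldsymbol{\alpha}_i\}} \; \sum_i \|\mathbf{x}_i - D\boldsymbol{\alpha}_i\|_2^2
  + \lambda \|\boldsymbol{\alpha}_i\|_1
\qquad \text{s.t.}\;\; \|\mathbf{d}_j\|_2 \le 1 \;\; \forall j
```

The target is then localized by comparing the sparse codes of candidate patches against those associated with the target and background classes.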
8. SAR Target Recognition via Incremental Nonnegative Matrix Factorization. Remote Sensing 2018. [DOI: 10.3390/rs10030374]
Abstract
In synthetic aperture radar (SAR) target recognition, the amount of target data increases continuously, and thus SAR automatic target recognition (ATR) systems are required to provide updated feature models in real time. Most recent SAR feature extraction methods must use both existing and new samples to retrain a new model every time new data are acquired. However, this repeated computation over existing samples increases the computing cost. In this paper, a dynamic feature learning method called incremental nonnegative matrix factorization with Lp sparse constraints (Lp-INMF) is proposed as a solution to this problem. In contrast to conventional nonnegative matrix factorization (NMF), whereby existing and new samples are computed together to retrain a new model, incremental NMF (INMF) computes only the new samples to update the trained model incrementally, which improves computing efficiency. Considering the sparse characteristics of scattering centers in SAR images, we place the updating process under a generic sparse constraint (Lp) for the matrix decomposition of INMF. Thus, Lp-INMF can extract sparse characteristics of SAR images. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) benchmark data illustrate that the proposed Lp-INMF method not only updates models with new samples more efficiently than conventional NMF, but also achieves a higher recognition rate than NMF and INMF.
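For orientation, here is a minimal NumPy sketch of conventional batch NMF with the Lee-Seung multiplicative updates; the incremental update rules and the Lp sparsity penalty that define Lp-INMF are the paper's contribution and are not reproduced here.

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates: V (m x n) ~ W (m x r) @ H (r x n).
    Per the abstract, INMF instead updates W and H using only newly arrived columns
    of V, avoiding recomputation over existing samples."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update keeps H nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update keeps W nonnegative
    return W, H

# Example: factorize vectorized SAR chips (columns of V) into r basis images.
V = np.abs(np.random.randn(64 * 64, 100))      # stand-in for 100 image chips
W, H = nmf(V, r=10)
```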
9. Aghamohammadi A, Ang MC, Sundararajan EA, Weng NK, Mogharrebi M, Banihashem SY. A parallel spatiotemporal saliency and discriminative online learning method for visual target tracking in aerial videos. PLoS One 2018;13:e0192246. [PMID: 29438421] [PMCID: PMC5811006] [DOI: 10.1371/journal.pone.0192246]
Abstract
Visual tracking in aerial videos is a challenging task in computer vision and remote sensing due to appearance variations caused by camera and target motion, low-resolution noisy images, scale changes, and pose variations. Various approaches have been proposed to deal with appearance variations in aerial videos, and among them, spatiotemporal saliency detection has reported promising results for moving target detection. However, it is not accurate when visual tracking is performed under appearance variations. In this study, a visual tracking method based on spatiotemporal saliency and discriminative online learning is proposed to deal with appearance variations. Temporal saliency, which represents moving target regions, is extracted via frame differencing with the Sauvola local adaptive thresholding algorithm. Spatial saliency, which represents target appearance details in candidate moving regions, is detected from feature uniqueness and spatial compactness measurements computed with SLIC superpixel segmentation, color, and moment features. Because this is a time-consuming process, a parallel algorithm was developed to optimize and distribute the saliency detection processes across multiple processors. Spatiotemporal saliency is then obtained by combining the temporal and spatial saliencies to represent moving targets. Finally, a discriminative online learning algorithm is applied to generate a sample model based on the spatiotemporal saliency; this sample model is incrementally updated to detect the target under appearance variations. Experiments on the VIVID dataset demonstrate that the proposed visual tracking method is effective and computationally efficient compared to state-of-the-art methods.
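A minimal sketch of the temporal saliency step as described above (frame differencing followed by Sauvola local adaptive thresholding), using scikit-image's threshold_sauvola; window_size and k are assumed illustrative values, not the paper's settings.

```python
import numpy as np
from skimage.filters import threshold_sauvola

def temporal_saliency(prev_frame, curr_frame, window_size=25, k=0.2):
    """Temporal saliency: absolute frame difference binarized with a Sauvola
    local adaptive threshold, yielding candidate moving-target regions."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    thresh = threshold_sauvola(diff, window_size=window_size, k=k)
    return diff > thresh  # boolean mask of candidate moving regions
```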
Affiliation(s)
- Mei Choo Ang: Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Elankovan A. Sundararajan: Center for Software Technology and Management (SOFTAM), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Ng Kok Weng: Industrial Design Centre, Sirim Berhad, Selangor, Malaysia
- Marzieh Mogharrebi: Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Seyed Yashar Banihashem: Department of Electrical and Computer Engineering, Buein Zahra Technical University, Buein Zahra, Iran
10. Wang J, Wang Y, Deng C, Wang S. Robust Visual Tracking Based on Convex Hull with EMD-L1. Pattern Recognition and Image Analysis 2018. [DOI: 10.1134/s1054661818010078]
11. Wu T, Lu Y, Zhu SC. Online Object Tracking, Learning and Parsing with And-Or Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017;39:2465-2480. [PMID: 28026751] [DOI: 10.1109/tpami.2016.2644963]
Abstract
This paper presents a method, called AOGTracker, for simultaneous tracking, learning, and parsing (TLP) of unknown objects in video sequences with a hierarchical and compositional And-Or graph (AOG) representation. The TLP method is formulated in a Bayesian framework with spatial and temporal dynamic programming (DP) algorithms inferring object bounding boxes on the fly. During online learning, the AOG is discriminatively learned using latent SVM [1] to account for appearance variations (e.g., lighting and partial occlusion) and structural variations (e.g., different poses and viewpoints) of a tracked object, as well as distractors (e.g., similar objects) in the background. Three key issues in online inference and learning are addressed: (i) maintaining the purity of positive and negative examples collected online, (ii) controlling model complexity in latent structure learning, and (iii) identifying critical moments to re-learn the structure of the AOG based on its intrackability, which measures the uncertainty of an AOG from its score maps in a frame. In experiments, AOGTracker is tested on two popular tracking benchmarks with the same parameter setting: the TB-100/50/CVPR2013 benchmarks [3] and the VOT benchmarks [4] (VOT 2013, 2014, 2015, and TIR2015, thermal imagery tracking). On the former, AOGTracker outperforms state-of-the-art tracking algorithms, including two trackers based on deep convolutional networks [5], [6]. On the latter, AOGTracker outperforms all other trackers in VOT2013 and is comparable to the state-of-the-art methods in VOT2014, 2015, and TIR2015.
13. de A Coutinho VA, Cintra RJ, Bayer FM. Low-Complexity Multidimensional DCT Approximations for High-Order Tensor Data Decorrelation. IEEE Transactions on Image Processing 2017;26:2296-2310. [PMID: 28287974] [DOI: 10.1109/tip.2017.2679442]
Abstract
In this paper, we introduce low-complexity multidimensional discrete cosine transform (DCT) approximations. 3D DCT approximations are formalized in terms of high-order tensor theory, and the formulation is extended to higher dimensions with arbitrary lengths. Several multiplierless 8×8×8 approximate methods are proposed, and the computational complexity is discussed for the general multidimensional case. The complexity of the proposed methods was assessed, requiring considerably fewer arithmetic operations than the exact 3D DCT. The proposed approximations were embedded into a 3D DCT-based video coding scheme with a modified quantization step. Simulation results show that the approximate 3D DCT coding methods offer almost identical visual quality compared with the exact 3D DCT scheme. The proposed 3D approximations were also employed as a tool for visual tracking, where the approximate 3D DCT-based system performs similarly to the original exact 3D DCT-based method. In general, the suggested methods show competitive performance at a considerably lower computational cost.
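For reference, the exact separable 3D DCT that the approximations target can be computed with SciPy as below; the approximate transforms follow the same separable mode-product structure (compare the DHT sketch under entry 1) with a multiplierless 1D matrix specified in the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Exact 3D DCT of an 8x8x8 block: the reference the approximations are judged against.
block = np.random.rand(8, 8, 8)
coeffs = dctn(block, type=2, norm='ortho')     # forward separable 3D DCT-II
recon = idctn(coeffs, type=2, norm='ortho')    # inverse transform
assert np.allclose(block, recon)               # perfect reconstruction (lossless here)
```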
14. Online learning and joint optimization of combined spatial-temporal models for robust visual tracking. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.055]
17. Li X, Shen C, Dick A, Zhang ZM, Zhuang Y. Online Metric-Weighted Linear Representations for Robust Visual Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2016;38:931-950. [PMID: 26390446] [DOI: 10.1109/tpami.2015.2469276]
Abstract
In this paper, we propose a visual tracker based on a metric-weighted linear representation of appearance. In order to capture the interdependence of different feature dimensions, we develop two online distance metric learning methods using proximity comparison information and structured output learning. The learned metric is then incorporated into a linear representation of appearance. We show that online distance metric learning significantly improves the robustness of the tracker, especially on those sequences exhibiting drastic appearance changes. In order to bound growth in the number of training samples, we design a time-weighted reservoir sampling method. Moreover, we enable our tracker to automatically perform object identification during the process of object tracking, by introducing a collection of static template samples belonging to several object classes of interest. Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame. Experimental results on challenging video sequences demonstrate the effectiveness of the method for both inter-frame tracking and object identification.
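As a sketch of the underlying machinery: a learned Mahalanobis-type metric M (positive semidefinite) weights a linear representation roughly as below. The symbols are generic notation, not the paper's; the online learning of M from proximity comparisons and structured output learning is described in the paper.

```latex
d_M(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^{\top} M\,(\mathbf{x}-\mathbf{y})},
\qquad
\min_{\mathbf{c}} \; (\mathbf{x} - T\mathbf{c})^{\top} M\,(\mathbf{x} - T\mathbf{c})
  + \lambda \|\mathbf{c}\|_2^2
```

Here T stacks template samples and c is the representation coefficient vector; M = I recovers an ordinary least-squares linear representation.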
20. Chen Y, Yang X, Zhong B, Pan S, Chen D, Zhang H. CNNTracker: Online discriminative object tracking via deep convolutional neural network. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.06.048]
22. Tian S, Yuan F, Xia GS. Multi-object tracking with inter-feedback between detection and tracking. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.07.028]
23. Yang F, Xia GS, Liu G, Zhang L, Huang X. Dynamic texture recognition by aggregating spatial and temporal features via ensemble SVMs. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.09.004]
24. Wang J, Wang H, Yan Y. Robust visual tracking by metric learning with weighted histogram representations. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.11.050]
25. Yuan Y, Fang J, Wang Q. Online anomaly detection in crowd scenes via structure analysis. IEEE Transactions on Cybernetics 2015;45:562-575. [PMID: 24988603] [DOI: 10.1109/tcyb.2014.2330853]
Abstract
Abnormal behavior detection in crowd scenes remains a challenge in computer vision. To tackle this problem, this paper starts from a novel structural modeling of crowd behavior. We first propose an informative structural context descriptor (SCD) for describing a crowd individual, which introduces the potential energy function of interparticle forces from solid-state physics to intuitively conduct visual contextual cueing. To compute the crowd SCD variation effectively, we then design a robust multi-object tracker to associate the targets in different frames, employing the incremental analytical ability of the 3D discrete cosine transform (DCT). By analyzing the spatial-temporal SCD variation of the crowd online, the abnormality is finally localized. Our contribution lies in three aspects: 1) a new exploration of abnormality detection from structural modeling, where the motion difference between individuals is computed by a novel selective histogram of optical flow, enabling the proposed method to handle more kinds of anomalies; 2) the SCD description, which effectively represents the relationships among individuals; and 3) the 3D DCT multi-object tracker, which robustly associates a limited number of (instead of all) targets, making tracking analysis feasible in high-density crowd situations. Experimental results on several publicly available crowd video datasets verify the effectiveness of the proposed method.
26. Zhong B, Chen Y, Shen Y, Chen Y, Cui Z, Ji R, Yuan X, Chen D, Chen W. Robust tracking via patch-based appearance model and local background estimation. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.06.044]
27. Li X, Hu W, Shen C, Zhang Z, Dick A, Hengel AVD. A survey of appearance models in visual object tracking. ACM Transactions on Intelligent Systems and Technology 2013. [DOI: 10.1145/2508037.2508039]
Abstract
Visual object tracking is a significant computer vision task that can be applied to many domains, such as visual surveillance, human-computer interaction, and video compression. Despite extensive research on this topic, it still suffers from difficulties in handling complex object appearance changes caused by factors such as illumination variation, partial occlusion, shape deformation, and camera motion. Effective modeling of the 2D appearance of tracked objects is therefore a key issue for the success of a visual tracker. In the literature, researchers have proposed a variety of 2D appearance models.
To help readers swiftly learn the recent advances in 2D appearance models for visual object tracking, we contribute this survey, which provides a detailed review of the existing 2D appearance models. In particular, the survey adopts a module-based architecture that enables readers to easily grasp the key points of visual object tracking. We first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. Then, different 2D appearance models are categorized and discussed with respect to their composition modules. Finally, we address several issues of interest as well as the remaining challenges for future research on this topic.
The contributions of this survey are fourfold. First, we review the literature on visual representations according to their feature-construction mechanisms (i.e., local and global). Second, the existing statistical modeling schemes for tracking-by-detection are reviewed according to their model-construction mechanisms: generative, discriminative, and hybrid generative-discriminative. Third, each type of visual representation or statistical modeling technique is analyzed and discussed from a theoretical or practical viewpoint. Fourth, the existing benchmark resources (e.g., source code and video datasets) are examined.
Affiliation(s)
- Xi Li: NLPR, Institute of Automation, Chinese Academy of Sciences, and The University of Adelaide
- Weiming Hu: NLPR, Institute of Automation, Chinese Academy of Sciences
28. Li X, Dick A, Shen C, Zhang Z, van den Hengel A, Wang H. Visual tracking with spatio-temporal Dempster-Shafer information fusion. IEEE Transactions on Image Processing 2013;22:3028-3040. [PMID: 23529089] [DOI: 10.1109/tip.2013.2253478]
Abstract
A key problem in visual tracking is how to effectively combine spatio-temporal visual information from throughout a video to accurately estimate the state of an object. We address this problem by incorporating Dempster-Shafer (DS) information fusion into the tracking approach. To implement this fusion task, the entire image sequence is partitioned into spatially and temporally adjacent subsequences. A support vector machine (SVM) classifier is trained for object/nonobject classification on each of these subsequences, the outputs of which act as separate data sources. To combine the discriminative information from these classifiers, we further present a spatio-temporal weighted DS (STWDS) scheme. In addition, temporally adjacent sources are likely to share discriminative information on object/nonobject classification. To use such information, an adaptive SVM learning scheme is designed to transfer discriminative information across sources. Finally, the corresponding DS belief function of the STWDS scheme is embedded into a Bayesian tracking model. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracking approach.
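As background, here is a minimal sketch of plain Dempster's rule of combination for two mass functions; the spatio-temporal weighting of the STWDS scheme is the paper's extension and is not reproduced. The object/background frame and the mass values are illustrative.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for mass functions over frozenset focal elements:
    m(A) = sum_{B ∩ C = A} m1(B) * m2(C) / (1 - K), where K is the total
    mass assigned to conflicting (empty-intersection) pairs."""
    combined, conflict = {}, 0.0
    for B, p in m1.items():
        for C, q in m2.items():
            A = B & C
            if A:
                combined[A] = combined.get(A, 0.0) + p * q
            else:
                conflict += p * q
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Example: two SVM 'sources' assign belief to {obj}, {bg}, or both (unsure).
obj, bg = frozenset({"obj"}), frozenset({"bg"})
both = obj | bg
m1 = {obj: 0.6, bg: 0.1, both: 0.3}
m2 = {obj: 0.5, bg: 0.2, both: 0.3}
print(dempster_combine(m1, m2))  # fused belief after removing conflict mass
```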
Affiliation(s)
- Xi Li: Australian Center for Visual Technologies, School of Computer Sciences, University of Adelaide, Adelaide 5005, Australia