1. Sun J, Huang L, Wang H, Zheng C, Qiu J, Islam MT, Xie E, Zhou B, Xing L, Chandrasekaran A, Black MJ. Localization and recognition of human action in 3D using transformers. Communications Engineering 2024; 3:125. PMID: 39227676; PMCID: PMC11372174; DOI: 10.1038/s44172-024-00272-7.
Abstract
Understanding a person's behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing and when those actions occur in the sequence. To advance the 3D action localization community, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT), for 3D action localization. Important baselines and evaluation metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.
Affiliation(s)
- Jiankai Sun
  - School of Engineering, Stanford University, Stanford, CA, USA
- Linjiang Huang
  - Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR
- Hongsong Wang
  - Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China
- Chuanyang Zheng
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR
- Jianing Qiu
  - Department of Biomedical Engineering, The Chinese University of Hong Kong, Hong Kong SAR
- Md Tauhidul Islam
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Enze Xie
  - Department of Computer Science, The University of Hong Kong, Hong Kong SAR
- Bolei Zhou
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Lei Xing
  - School of Engineering, Stanford University, Stanford, CA, USA
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Arjun Chandrasekaran
  - Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
- Michael J Black
  - Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
2. Yu BXB, Liu Y, Chan KCC, Chen CW. EGCN++: A New Fusion Strategy for Ensemble Learning in Skeleton-Based Rehabilitation Exercise Assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:6471-6485. PMID: 38502632; DOI: 10.1109/tpami.2024.3378753.
Abstract
Skeleton-based exercise assessment focuses on evaluating the correctness or quality of an exercise performed by a subject. Skeleton data provide two groups of features (i.e., position and orientation), which existing methods have not fully harnessed. We previously proposed an ensemble-based graph convolutional network (EGCN) that considers both position and orientation features in a model-based approach; integrating the two feature types achieved better performance than available methods. However, EGCN lacked a fusion strategy across the data, feature, decision, and model levels. In this paper, we present an advanced framework, EGCN++, for rehabilitation exercise assessment. Building on EGCN, we propose a new fusion strategy for EGCN++, called MLE-PO, that considers fusion at both the data and model levels. We conduct extensive cross-validation experiments and investigate the consistency between machine and human evaluations on three datasets: UI-PRMD, KIMORE, and EHE. Results demonstrate that MLE-PO outperforms other EGCN ensemble strategies and representative baselines. Furthermore, MLE-PO's model evaluation scores are more quantitatively consistent with clinical evaluations than those of other ensemble strategies.
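As a concrete illustration of multi-stream ensembling, the sketch below averages per-class scores from hypothetical position and orientation GCN streams at the decision level. It is a generic baseline in the spirit of the paper's setting, not the MLE-PO strategy itself; all names, weights, and numbers are assumptions.

```python
import numpy as np

def fuse_decision_level(pos_scores: np.ndarray, ori_scores: np.ndarray,
                        w_pos: float = 0.5) -> np.ndarray:
    """Weighted average of per-class scores from two GCN streams.

    pos_scores, ori_scores: (n_samples, n_classes) softmax outputs of the
    position and orientation streams. Generic decision-level fusion only,
    not the paper's MLE-PO strategy.
    """
    return w_pos * pos_scores + (1.0 - w_pos) * ori_scores

# Hypothetical example: two streams scoring 3 exercise-quality classes.
pos = np.array([[0.7, 0.2, 0.1]])
ori = np.array([[0.5, 0.4, 0.1]])
print(fuse_decision_level(pos, ori).argmax(axis=1))  # fused class prediction
```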
3. Khodadadzadeh M, Sloan AT, Jones NA, Coyle D, Kelso JAS. Artificial intelligence detects awareness of functional relation with the environment in 3-month-old babies. Sci Rep 2024; 14:15580. PMID: 38971875; PMCID: PMC11227524; DOI: 10.1038/s41598-024-66312-6.
Abstract
A recent experiment probed how purposeful action emerges in early life by manipulating infants' functional connection to an object in the environment (i.e., tethering an infant's foot to a colorful mobile). Vicon motion capture data from multiple infant joints were used here to create Histograms of Joint Displacements (HJDs), pose-based descriptors for 3D infant spatial trajectories. Using HJDs as inputs, machine and deep learning systems were tasked with classifying the experimental state from which snippets of movement data were sampled. The architectures tested included k-Nearest Neighbour (kNN), Linear Discriminant Analysis (LDA), a fully connected network (FCNet), a 1D Convolutional Neural Network (1D-Conv), a 1D Capsule Network (1D-CapsNet), 2D-Conv, and 2D-CapsNet. Sliding-window scenarios were used for temporal analysis to search for topological changes in infant movement related to functional context. kNN and LDA achieved higher classification accuracy with single-joint features, while deep learning approaches, particularly 2D-CapsNet, achieved higher accuracy on full-body features. For each AI architecture tested, measures of foot activity displayed the most distinct and coherent pattern alterations across experimental stages (reflected in the highest classification accuracy), indicating that interaction with the world impacts infant behaviour most at the site of the organism~world connection.
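To make the HJD descriptor concrete, here is a minimal sketch of a per-joint histogram of frame-to-frame displacement magnitudes. The bin count, displacement range, and synthetic input are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def joint_displacement_histograms(traj: np.ndarray, n_bins: int = 16,
                                  max_disp: float = 50.0) -> np.ndarray:
    """Per-joint histogram of frame-to-frame displacement magnitudes.

    traj: (n_frames, n_joints, 3) array of 3D joint positions.
    Returns (n_joints, n_bins) normalized histograms usable as a
    pose-based descriptor for a downstream classifier.
    """
    disp = np.linalg.norm(np.diff(traj, axis=0), axis=2)  # (n_frames-1, n_joints)
    hists = np.empty((traj.shape[1], n_bins))
    for j in range(traj.shape[1]):
        h, _ = np.histogram(disp[:, j], bins=n_bins, range=(0.0, max_disp))
        hists[j] = h / max(h.sum(), 1)  # normalize to a distribution
    return hists

# Hypothetical snippet: 100 frames, 12 joints (a random walk stands in for mocap).
rng = np.random.default_rng(0)
hjd = joint_displacement_histograms(rng.normal(size=(100, 12, 3)).cumsum(axis=0))
print(hjd.shape)  # (12, 16)
```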
Affiliation(s)
- Massoud Khodadadzadeh
  - School of Computer Science and Technology, University of Bedfordshire, Luton, LU1 3JU, UK
  - The Bath Institute for the Augmented Human, University of Bath, Bath, BA2 7AY, UK
  - Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK
- Aliza T Sloan
  - Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
- Nancy Aaron Jones
  - Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
- Damien Coyle
  - The Bath Institute for the Augmented Human, University of Bath, Bath, BA2 7AY, UK
  - Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK
- J A Scott Kelso
  - Human Brain and Behaviour Laboratory, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
  - Intelligent Systems Research Centre, Ulster University, Derry, Londonderry, BT48 7JL, UK
4. Huang X, Zhu J, Tian Z, Xu K, Liu Y. An adaptive algorithm for generating 3D point clouds of the human body based on 4D millimeter-wave radar. The Review of Scientific Instruments 2024; 95:015117. PMID: 38294291; DOI: 10.1063/5.0181265.
Abstract
Traditional algorithms for generating 3D human point clouds often struggle with phantom targets and target misclassification caused by electromagnetic multipath effects, which reduces the accuracy of the generated point clouds and requires manual labeling of the human body's position. To address these problems, this paper proposes an adaptive method for generating 3D human point clouds based on 4D millimeter-wave radar (Self-Adaptive mPoint, SA-mPoint). The method estimates a rough human point cloud by considering micro-motion and respiration characteristics while combining dynamic and static echo information. It further increases point-cloud density and reduces multipath interference through multi-frame dynamic fusion and an adaptive density-based clustering algorithm centered on the human body. The effectiveness of the SA-mPoint algorithm is verified through experiments using the TI Millimeter Wave Cascade Imaging Radar Radio Frequency Evaluation Module 77G 4D cascade radar to collect challenging raw data of single-target and multi-target human poses in an open classroom. Experimental results demonstrate that the proposed algorithm achieves an average point-cloud generation accuracy of 97.94%. Compared to the popular TI-mPoint algorithm, it generates 87.94% more points on average, improves the average generation accuracy by 78.3%, and reduces the average running time by 11.41%. This approach exhibits high practicality and promising application prospects.
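The density-based denoising step can be illustrated with off-the-shelf DBSCAN: dense clusters survive, sparse multipath ghosts are dropped. This is a plain stand-in for the paper's adaptive, human-center-based variant; eps, min_samples, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_radar_points(points: np.ndarray, eps: float = 0.3,
                         min_samples: int = 10) -> np.ndarray:
    """Keep points in dense clusters; drop sparse multipath ghosts.

    points: (n, 3) x/y/z detections accumulated over several radar frames.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return points[labels != -1]  # -1 marks noise in scikit-learn's DBSCAN

rng = np.random.default_rng(1)
body = rng.normal(loc=[0, 2, 1], scale=0.2, size=(200, 3))  # dense human target
ghosts = rng.uniform(low=-3, high=3, size=(20, 3))          # scattered multipath noise
print(denoise_radar_points(np.vstack([body, ghosts])).shape)
```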
Affiliation(s)
- Xiaohong Huang
  - School of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
  - Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
- Jiachen Zhu
  - School of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
  - Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
- Ziran Tian
  - School of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
  - Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
- Kunqiang Xu
  - School of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
  - Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
- Yingchao Liu
  - School of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
5. Li Y, Yin R, Kim Y, Panda P. Efficient human activity recognition with spatio-temporal spiking neural networks. Front Neurosci 2023; 17:1233037. PMID: 37781248; PMCID: PMC10536255; DOI: 10.3389/fnins.2023.1233037.
Abstract
In this study, we explore Human Activity Recognition (HAR), a task that aims to predict individuals' daily activities utilizing time series data obtained from wearable sensors for health-related applications. Although recent research has predominantly employed end-to-end Artificial Neural Networks (ANNs) for feature extraction and classification in HAR, these approaches impose a substantial computational load on wearable devices and exhibit limitations in temporal feature extraction due to their activation functions. To address these challenges, we propose the application of Spiking Neural Networks (SNNs), an architecture inspired by the characteristics of biological neurons, to HAR tasks. SNNs accumulate input activation as presynaptic potential charges and generate a binary spike upon surpassing a predetermined threshold. This unique property facilitates spatio-temporal feature extraction and confers the advantage of low-power computation attributable to binary spikes. We conduct rigorous experiments on three distinct HAR datasets using SNNs, demonstrating that our approach attains competitive or superior performance relative to ANNs, while concurrently reducing energy consumption by up to 94%.
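The accumulate-and-fire behavior described above can be sketched as a simple leaky integrate-and-fire layer: membrane potential accumulates, a binary spike fires at threshold, and the potential resets. The threshold, leak factor, and soft reset below are illustrative assumptions, not the paper's exact neuron model.

```python
import numpy as np

def lif_forward(inputs: np.ndarray, threshold: float = 1.0,
                decay: float = 0.9) -> np.ndarray:
    """Leaky integrate-and-fire dynamics over time.

    inputs: (T, n_neurons) presynaptic activation per time step. Potential
    accumulates with leak `decay`; a binary spike fires when it crosses
    `threshold`, and the potential is reset by subtraction.
    """
    mem = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs)
    for t in range(inputs.shape[0]):
        mem = decay * mem + inputs[t]
        spikes[t] = (mem >= threshold).astype(inputs.dtype)
        mem -= spikes[t] * threshold  # soft reset
    return spikes

print(lif_forward(np.full((5, 3), 0.4)).sum(axis=0))  # spike counts per neuron
```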
Affiliation(s)
- Yuhang Li
  - Department of Electrical Engineering, Yale University, New Haven, CT, USA
6. Bhola G, Vishwakarma DK. A review of vision-based indoor HAR: state-of-the-art, challenges, and future prospects. Multimedia Tools and Applications 2023:1-41. PMID: 37362688; PMCID: PMC10173923; DOI: 10.1007/s11042-023-15443-5.
Abstract
As technology pervades daily life and Artificial Intelligence becomes integral to many everyday tasks, cameras and vision-based sensors offer practical solutions to many real-time problems and challenges. One major application of such vision-based systems is indoor Human Activity Recognition (HAR), which serves scenarios ranging from smart homes, elderly care, assisted living, and behavior-pattern analysis to abnormal-activity recognition such as falling, slipping, and domestic violence. The practical impact of real-time HAR has also made indoor activity recognition a heavily explored area for industry products across multiple domains. Considering these aspects of HAR, this work presents a detailed survey of indoor HAR. We highlight recent methodologies and their performance in the field of indoor activity recognition, and discuss the challenges, a detailed study of approaches with real-world applications of indoor HAR, the datasets available for indoor activity, and their technical details. We propose a taxonomy for indoor HAR and outline the state of the art and future prospects by identifying research gaps and the shortcomings of recent surveys relative to our work.
Affiliation(s)
- Geetanjali Bhola
  - Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Bawana Road, Delhi 110042, India
- Dinesh Kumar Vishwakarma
  - Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Bawana Road, Delhi 110042, India
7. Yu BXB, Liu Y, Zhang X, Zhong SH, Chan KCC. MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:3522-3538. PMID: 35617191; DOI: 10.1109/tpami.2022.3177813.
Abstract
Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based and RGB video-based) have realized substantial improvements with increasingly larger datasets. However, multimodal methods, particularly those with model-level fusion, have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses the skeleton and RGB modalities via a model-based approach. The objective of our method is to improve ensemble recognition accuracy by effectively applying the mutually complementary information of different data modalities. For the model-based fusion scheme, we use a spatiotemporal graph convolution network for the skeleton modality to learn attention weights that are then transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method outperforms state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested MMNet on the RGB video dataset Kinetics 400, which contains more outdoor actions, and obtained results consistent with those on the RGB-D video datasets.
8. Skublewska-Paszkowska M, Powroznik P. Temporal Pattern Attention for Multivariate Time Series of Tennis Strokes Classification. Sensors (Basel) 2023; 23:2422. PMID: 36904626; PMCID: PMC10007534; DOI: 10.3390/s23052422.
Abstract
Human Action Recognition is a challenging task used in many applications. It draws on many aspects of Computer Vision, Machine Learning, Deep Learning, and Image Processing in order to understand and identify human behaviours. It makes a significant contribution to sport analysis by indicating players' performance level and supporting training evaluation. The main purpose of this study is to investigate how the content of three-dimensional data influences the classification accuracy of four basic tennis strokes: forehand, backhand, volley forehand, and volley backhand. The entire player's silhouette, and its combination with a tennis racket, were considered as input to the classifier. Three-dimensional data were recorded using a motion capture system (Vicon, Oxford, UK). The Plug-in Gait model, consisting of 39 retro-reflective markers, was used for the player's body acquisition, and a seven-marker model was created for capturing the tennis racket. The racket is represented as a rigid body, so all points associated with it change their coordinates simultaneously. An Attention Temporal Graph Convolutional Network was applied to these data. The highest accuracy, up to 93%, was achieved for the data of the whole player's silhouette together with the tennis racket. The obtained results indicate that for dynamic movements such as tennis strokes, it is necessary to analyze the position of the player's whole body as well as the racket.
9. Morshed MG, Sultana T, Alam A, Lee YK. Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities. Sensors (Basel) 2023; 23:2182. PMID: 36850778; PMCID: PMC9963970; DOI: 10.3390/s23042182.
Abstract
Human action recognition systems use data collected from a wide range of sensors to accurately identify and interpret human actions. One of the most challenging issues for computer vision is the automatic and precise identification of human activities. Feature learning-based representations for action recognition have increased significantly in recent years, owing to the widespread use of deep learning-based features. This study presents an in-depth analysis of human activity recognition that investigates recent developments in computer vision. Augmented reality, human-computer interaction, cybersecurity, home monitoring, and surveillance cameras are all examples of computer vision applications that often go hand in hand with human action detection. We give a taxonomy-based, rigorous study of human activity recognition techniques, discussing the best ways to acquire human action features derived from RGB and depth data, as well as the latest research on deep learning and hand-crafted techniques. We also explain a generic architecture for recognizing human actions in the real world and its currently prominent research topics. Finally, we offer some analysis concepts and research proposals for academics. Researchers delving into human action recognition will find this review an effective tool.
Affiliation(s)
- Md Golam Morshed
  - Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Republic of Korea
- Tangina Sultana
  - Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Republic of Korea
  - Department of Electronics and Communication Engineering, Hajee Mohammad Danesh Science & Technology University, Dinajpur 5200, Bangladesh
- Aftab Alam
  - Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Republic of Korea
  - Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Young-Koo Lee
  - Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Republic of Korea
10. Wang X, Xu T, An D, Sun L, Wang Q, Pan Z, Yue Y. Face Mask Identification Using Spatial and Frequency Features in Depth Image from Time-of-Flight Camera. Sensors (Basel) 2023; 23:1596. PMID: 36772636; PMCID: PMC9918995; DOI: 10.3390/s23031596.
Abstract
Face masks can effectively prevent the spread of viruses, so it is necessary to determine whether masks are being worn in locations with a risk of infection, such as traffic stations and hospitals. Fast and accurate identification across different application scenarios is therefore an urgent problem. Contactless mask recognition avoids wasting human resources and reduces the risk of exposure. We propose a novel face mask recognition method that uses spatial and frequency features derived from 3D information. A ToF camera, with its simple system and robust data, captures the depth images. The designed method accurately extracts the facial contour from the depth image, reducing the dimensionality of the depth data and improving recognition speed. The classification proceeds in two stages: the wearing condition of the mask is first identified from features of the facial contour, and the mask type is then classified using new features extracted from the spatial and frequency curves. With appropriate thresholds and a voting method, the overall recall of the proposed algorithm reaches 96.21%; in particular, the recall for images without a mask reaches 99.21%.
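The threshold-and-vote step could look like the following minimal sketch, in which each scalar feature casts one binary vote and the majority decides. The features and thresholds are hypothetical placeholders for the paper's contour and frequency features.

```python
import numpy as np

def vote_mask_state(feature_scores: np.ndarray, thresholds: np.ndarray) -> int:
    """Majority vote over per-feature threshold decisions.

    feature_scores: (n_features,) scalar features extracted from the facial
    contour of a depth image; thresholds: matching per-feature cut-offs.
    Returns 1 (mask worn) or 0 (no mask).
    """
    votes = feature_scores > thresholds       # one binary vote per feature
    return int(votes.sum() > len(votes) / 2)  # simple majority

print(vote_mask_state(np.array([0.8, 0.3, 0.9]), np.array([0.5, 0.5, 0.5])))  # 1
```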
Affiliation(s)
- Xiaoyan Wang
  - Institute of Modern Optics, Nankai University, Tianjin 300350, China
- Tianxu Xu
  - National Center for International Joint Research of Electronic Materials and Systems, School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
- Dong An
  - Institute of Modern Optics, Nankai University, Tianjin 300350, China
- Lei Sun
  - Shphotonics, LLC, Tianjin 300450, China
- Qiang Wang
  - Angle AI (Tianjin) Technology Co., Ltd., Tianjin 300450, China
- Zhongqi Pan
  - Department of Electrical & Computer Engineering, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
- Yang Yue
  - School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China
11. New machine learning approaches for real-life human activity recognition using smartphone sensor-based data. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2023.110260.
12. Ma H, Yang Z, Liu H. Fine-Grained Unsupervised Temporal Action Segmentation and Distributed Representation for Skeleton-Based Human Motion Analysis. IEEE Transactions on Cybernetics 2022; 52:13411-13424. PMID: 34932492; DOI: 10.1109/tcyb.2021.3132016.
Abstract
Understanding the fine-grained temporal structure of human actions and its semantic interpretation is beneficial to many real-world tasks, such as the analysis of sports movements, rehabilitation exercises, and daily-life activities. Current action segmentation methods mainly rely on deep neural networks to derive feature embeddings of actions from motion data, while work on analyzing human actions at fine granularity is still lacking, owing to the absence of clear and generic definitions of subactions and of related datasets. Moreover, the motion representations obtained by current action segmentation methods lack the semantic or mathematical interpretability needed to evaluate action/subaction similarity in quantitative motion analysis. Toward the goal of fine-grained, interpretable, scalable, and efficient action segmentation, we propose a novel unsupervised action segmentation and distributed representation framework based on intuitive motion primitives defined on pose data. Metrics for comprehensively evaluating the unsupervised fine-grained action segmentation task are proposed, and both public and self-constructed datasets are adopted in the experiments. The results show that the proposed method has good performance and generality across different subjects, datasets, and application scenarios.
13. Haileslassie Gebrehiwot A, Bescos J, Garcia-Martin A. Robust Template Update Strategy for Efficient Visual Object Tracking. Artif Intell 2022. DOI: 10.5772/intechopen.101800.
Abstract
Real-time visual object tracking is an open problem in computer vision, with multiple applications in industry, such as autonomous vehicles, human-machine interaction, intelligent cinematography, automated surveillance, and autonomous social navigation. The challenge of tracking a target of interest is critical to all of these applications. Recently, tracking algorithms that use siamese neural networks trained offline on large-scale datasets of image pairs have achieved the best performance, exceeding real-time speed on multiple benchmarks. Results show that siamese approaches can enhance tracking capabilities by learning deeper features of the object's appearance. SiamMask utilized the power of siamese networks and supervised learning approaches to solve the problem of arbitrary object tracking at real-time speed. However, its practical applications are limited due to failures encountered during testing. To improve the robustness of the tracker and make it applicable to the intended real-world application, two improvements have been incorporated, each addressing a different aspect of the tracking task. The first is a data augmentation strategy that considers both motion blur and low resolution during training, aiming to increase the tracker's robustness against motion-blurred and low-resolution frames during inference. The second is a target template update strategy that utilizes both the initial ground-truth template and a supplementary updatable template, and considers the score of the predicted target so as to avoid template updates during severe occlusion. All of the improvements were extensively evaluated and achieved state-of-the-art performance on the VOT2018 and VOT2019 benchmarks. Our method (VPU-SiamM) was submitted to the VOT-ST 2020 challenge, where it ranked 16th out of 38 submitted tracking methods according to the expected average overlap (EAO) metric. The VPU_SiamM implementation can be found in the VOT2020 trackers repository.
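A score-gated dual-template update of this kind might be sketched as follows; the gating threshold, blend factor, and flat feature vectors are illustrative assumptions in the spirit of the described strategy, not the authors' exact implementation (the initial ground-truth template is simply kept unchanged elsewhere).

```python
def update_template(updatable, candidate, score,
                    score_thresh=0.9, blend=0.25):
    """Update the supplementary template only on high-confidence predictions.

    updatable, candidate: flat feature vectors (lists of floats here);
    score: tracker confidence for the current prediction. Low scores
    (e.g., severe occlusion) skip the update to avoid contaminating
    the template.
    """
    if score < score_thresh:
        return updatable  # likely occlusion/drift: keep the old template
    # Linear blend of the old template and the new high-confidence crop.
    return [(1 - blend) * u + blend * c for u, c in zip(updatable, candidate)]

print(update_template([0.2, 0.4], [0.6, 0.0], score=0.95))  # [0.3, 0.3]
```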
14. Kulsoom F, Narejo S, Mehmood Z, Chaudhry HN, Butt A, Bashir AK. A review of machine learning-based human activity recognition for diverse applications. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07665-9.
15. Jin Z, Li Z, Gan T, Fu Z, Zhang C, He Z, Zhang H, Wang P, Liu J, Ye X. A Novel Central Camera Calibration Method Recording Point-to-Point Distortion for Vision-Based Human Activity Recognition. Sensors (Basel) 2022; 22:3524. PMID: 35591215; PMCID: PMC9105339; DOI: 10.3390/s22093524.
Abstract
The camera is the main sensor in vision-based human activity recognition, and high-precision distortion calibration is an important prerequisite for the task. Current studies have shown that multi-parameter model methods achieve higher accuracy than traditional methods in camera calibration. However, these methods need hundreds or even thousands of images to optimize the camera model, which limits their practical use. Here, we propose a novel point-to-point camera distortion calibration method that requires only dozens of images to obtain a dense distortion rectification map. We designed an objective function based on the deformation between the original images and the projection of reference images, which can eliminate the effect of distortion when optimizing camera parameters. Dense features between the original images and the projection of the reference images are calculated by digital image correlation (DIC). Experiments indicate that our method obtains results comparable to a multi-parameter model method that uses a large number of pictures, and improves the reprojection error by 28.5% over the polynomial distortion model.
Affiliation(s)
- Ziyi Jin
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Zhixue Li
  - Independent Researcher, 181 Gaojiao Road, Yuhang District, Hangzhou 311122, China
- Tianyuan Gan
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Zuoming Fu
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Chongan Zhang
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Zhongyu He
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Hong Zhang
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Peng Wang
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Jiquan Liu
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
- Xuesong Ye
  - Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China
16. SiamSMDFFF: Siamese network tracker based on shallow-middle-deep three-level feature fusion and clustering-based adaptive rectangular window filtering. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.02.027.
17. Tasneem NT, Biswas DK, Adhikari PR, Gunti A, Patwary AB, Reid RC, Mahbub I. A self-powered wireless motion sensor based on a high-surface area reverse electrowetting-on-dielectric energy harvester. Sci Rep 2022; 12:3782. PMID: 35260661; PMCID: PMC8904818; DOI: 10.1038/s41598-022-07631-4.
Abstract
This paper presents a motion-sensing device with the capability of harvesting energy from low-frequency motion activities. Based on the high-surface-area reverse electrowetting-on-dielectric (REWOD) energy harvesting technique, mechanical modulation of the liquid generates an AC signal, which is modeled analytically and implemented in MATLAB and COMSOL. A rectifier and a DC-DC converter produce a constant DC voltage to power the motion-sensing read-out circuit. A charge amplifier converts the generated charge into a proportional output voltage, which is transmitted wirelessly to a remote receiver. The harvested DC voltage after the rectifier and DC-DC converter is 3.3 V, with a measured rectifier power conversion efficiency (PCE) as high as 40.26% at 5 Hz. The energy harvester demonstrates a linear relationship between the motion frequency and the generated output power, making it highly suitable as a self-powered wearable motion sensor.
Affiliation(s)
- Nishat T Tasneem
  - Department of Electrical Engineering, University of North Texas, Denton, TX 76201, USA
- Dipon K Biswas
  - Department of Electrical Engineering, University of North Texas, Denton, TX 76201, USA
- Pashupati R Adhikari
  - Department of Mechanical Engineering, University of North Texas, Denton, TX 76201, USA
- Avinash Gunti
  - Department of Electrical Engineering, University of North Texas, Denton, TX 76201, USA
- Adnan B Patwary
  - Department of Electrical Engineering, University of North Texas, Denton, TX 76201, USA
- Russell C Reid
  - Department of Engineering, Dixie State University, St. George, UT 84770, USA
- Ifana Mahbub
  - Department of Electrical Engineering, University of North Texas, Denton, TX 76201, USA
18. Pereira D, De Pra Y, Tiberi E, Monaco V, Dario P, Ciuti G. Flipping food during grilling tasks, a dataset of utensils kinematics and dynamics, food pose and subject gaze. Sci Data 2022; 9:5. PMID: 35022437; PMCID: PMC8755801; DOI: 10.1038/s41597-021-01101-8.
Abstract
This paper presents a multivariate dataset of 2866 food-flipping movements performed by 4 chefs and 5 home cooks with different grilled foods and two utensils (spatula and tweezers). The 3D trajectories of strategic points on the utensils were tracked using optoelectronic motion capture. The pinching force of the tweezers and the bending force and torsion torque of the spatula were also recorded, along with videos and the subjects' gaze. These data were collected using a custom experimental setup that allowed flipping movements to be executed with freshly cooked food without placing the sensors near the dangerous cooking area. Complementarily, the 2D position of the food was computed from the videos. The action of flipping food is indeed gaining the attention of both researchers and manufacturers of foodservice technology. The reported dataset contains valuable measurements (1) to characterize and model flipping movements as performed by humans, (2) to develop bio-inspired methods to control a cooking robot, and (3) to study new algorithms for human action recognition.
Affiliation(s)
- Débora Pereira
  - The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
  - Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
  - The Research Hub by Electrolux Professional SpA, AD&T, Pordenone, 33170, Italy
- Yuri De Pra
  - The Research Hub by Electrolux Professional SpA, AD&T, Pordenone, 33170, Italy
  - University of Udine, Department of Computer Science, Mathematics and Physics, Udine, 33100, Italy
- Emidio Tiberi
  - The Research Hub by Electrolux Professional SpA, AD&T, Pordenone, 33170, Italy
- Vito Monaco
  - The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
  - Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
- Paolo Dario
  - The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
  - Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
- Gastone Ciuti
  - The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
  - Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, 56127, Italy
19. Wen CH, Cheng CC, Shih YC. Artificial intelligence technologies for more flexible recommendation in uniforms. Data Technologies and Applications 2022. DOI: 10.1108/dta-09-2021-0230.
Abstract
Purpose: This research collects human body variables via 2D images captured by digital cameras and, based on those variables, forecasts and recommends Digital Camouflage Uniform (DCU) sizes for Taiwan's military personnel.
Design/methodology/approach: A total of 375 subjects were recruited (253 male, 122 female). OpenPose converts the photographed 2D images into four body variables, which are compared with simultaneous tape-measure and 3D-scanning measurements. A decision tree then builds the DCU recommendation model, and the Euclidean distance to each DCU size in the manufacturing specification yields the best three recommendations.
Findings: The recommendation accuracy of the decision tree alone is only 0.62 and 0.63; however, for the best-three recommendations the DCU fitting score reaches 0.8 or more. Although the body-measurement methods differ, the OpenPose and 3D-scanning results have the highest correlation coefficient, confirming that OpenPose has significant measurement validity; that is, inexpensive equipment can produce reasonable results.
Originality/value: The proposed method suits long-distance, non-contact, and non-pre-labeled applications in e-commerce and the apparel industry while the world faces Covid-19. In particular, it can ease the measurement troubles of ordinary users purchasing clothing online.
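The best-three recommendation step reduces to ranking specification sizes by Euclidean distance from the measured body variables, as in this minimal sketch; the size labels and measurements are hypothetical, not Taiwan's DCU specification.

```python
import numpy as np

def best_three_sizes(body: np.ndarray, spec: dict) -> list:
    """Rank uniform sizes by Euclidean distance to measured body variables.

    body: (4,) vector of body variables extracted from the 2D image;
    spec: size label -> (4,) nominal measurements from the specification.
    """
    dists = {label: np.linalg.norm(body - np.asarray(dims))
             for label, dims in spec.items()}
    return sorted(dists, key=dists.get)[:3]

spec = {"S": [96, 80, 40, 58], "M": [102, 86, 42, 60],
        "L": [108, 92, 44, 62], "XL": [114, 98, 46, 64]}
print(best_three_sizes(np.array([104, 88, 43, 61]), spec))  # ['M', 'L', 'S']
```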
20.
21. Muangprathub J, Sriwichian A, Wanichsombat A, Kajornkasirat S, Nillaor P, Boonjing V. A Novel Elderly Tracking System Using Machine Learning to Classify Signals from Mobile and Wearable Sensors. International Journal of Environmental Research and Public Health 2021; 18:12652. PMID: 34886377; PMCID: PMC8656729; DOI: 10.3390/ijerph182312652.
Abstract
A health or activity monitoring system is the most promising approach to assisting the elderly in their daily lives. Growth in the elderly population has increased the demand for health services, so existing monitoring systems are no longer able to provide sufficient care. This paper proposes an elderly tracking system that integrates multiple technologies with machine learning, covering activity tracking, geolocation, and personal information in both indoor and outdoor environments. It also includes information and results from the collaboration with local agencies during the planning and development of the system. Results from testing the devices and system in a case study show that the k-nearest neighbor (k-NN) model with k = 5 was the most effective at classifying the nine activities of the elderly, with 96.40% accuracy. The developed system can monitor the elderly in real time and provide alerts. Furthermore, the system can display information about the elderly in a spatial format, and the elderly can use a messaging device to request help in an emergency. Our system supports elderly care with data collection, tracking and monitoring, and notification, as well as by providing supporting information to agencies relevant to elderly care.
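A k-NN activity classifier with k = 5 takes only a few lines with scikit-learn, as the hedged sketch below shows; the synthetic windows and features stand in for the real wearable-sensor data, so the printed score is chance level rather than the paper's 96.40%.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature windows from wearable accelerometer/gyroscope signals:
# each row is one window summarized by simple statistics, labeled with one of
# nine activity classes. The data here is synthetic for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(270, 6))     # 270 windows x 6 summary features
y = rng.integers(0, 9, size=270)  # 9 activity labels

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5, as in the paper
print(cross_val_score(knn, X, y, cv=5).mean())
```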
Affiliation(s)
- Jirapond Muangprathub
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
  - Integrated High-Value of Oleochemical (IHVO) Research Center, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Anirut Sriwichian
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Apirat Wanichsombat
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Siriwan Kajornkasirat
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Pichetwut Nillaor
  - Faculty of Commerce and Management, Trang Campus, Prince of Songkla University, Trang 92000, Thailand
- Veera Boonjing
  - Department of Computer Engineering, School of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
22. Oladele DA, Markus ED, Abu-Mahfouz AM. Adaptability of Assistive Mobility Devices and the Role of the Internet of Medical Things: Comprehensive Review. JMIR Rehabil Assist Technol 2021; 8:e29610. PMID: 34779786; PMCID: PMC8663709; DOI: 10.2196/29610.
Abstract
Background: With the projected upsurge in the percentage of people with some form of disability, there has been a significant increase in the need for assistive mobility devices. However, for mobility aids to be effective, such devices should be adapted to the user's needs, which can be achieved by improving the confidence of the acquired information (the interaction between the user, the environment, and the device) following design specifications. Therefore, there is a need for a literature review on the adaptability of assistive mobility devices.
Objective: In this study, we aim to review the adaptability of assistive mobility devices and the role of the internet of medical things in terms of the acquired information. We review internet-enabled assistive mobility technologies and non-internet of things (IoT) assistive mobility devices. These technologies provide awareness of the status of adaptive mobility technology and serve as a source and reference of information for health care professionals and researchers.
Methods: We performed a literature search on the following databases of academic references and journals: Google Scholar, ScienceDirect, Institute of Electrical and Electronics Engineers, Springer, and websites of assistive mobility foundations presenting studies on assistive mobility found through a generic Google search (including the World Health Organization website). The following keywords were used: assistive mobility OR assistive robots, assistive mobility devices, internet-enabled assistive mobility technologies, IoT Framework OR IoT Architecture AND for Healthcare, assisted navigation OR autonomous navigation, mobility AND aids OR devices, adaptability of assistive technology, adaptive mobility devices, pattern recognition, autonomous navigational systems, human-robot interfaces, motor rehabilitation devices, perception, and ambient assisted living.
Results: We identified 13,286 results (excluding titles that were not relevant to this study). Through a narrative review, we selected 189 potential studies (189/13,286, 1.42%) from the existing literature on the adaptability of assistive mobility devices and IoT frameworks for assistive mobility, and conducted a critical analysis. Of the 189 potential studies, 82 (43.4%) met the inclusion criteria and were selected for analysis. On the basis of the technologies presented in the reviewed articles, we propose a categorization of the adaptability of smart assistive mobility devices in terms of their interaction with the user (user system interface), perception techniques, and communication and sensing frameworks.
Conclusions: We discuss notable limitations of the reviewed studies. The findings reveal that improving the adaptation of assistive mobility systems would require reducing training time and avoiding cognitive overload. Furthermore, sensor fusion and classification accuracy are critical for meeting real-world testing requirements. Finally, the trade-off between cost and performance should be considered in the commercialization of these devices.
Affiliation(s)
- Daniel Ayo Oladele
  - Department of Electrical, Electronic and Computer Engineering, Central University of Technology, Bloemfontein, South Africa
- Elisha Didam Markus
  - Department of Electrical, Electronic and Computer Engineering, Central University of Technology, Bloemfontein, South Africa
23. Tan J, Boominathan V, Baraniuk R, Veeraraghavan A. EDoF-ToF: extended depth of field time-of-flight imaging. Optics Express 2021; 29:38540-38556. PMID: 34808905; DOI: 10.1364/oe.441515.
Abstract
Conventional continuous-wave amplitude-modulated time-of-flight (CWAM ToF) cameras suffer from a fundamental trade-off between light throughput and depth of field (DoF): a larger lens aperture allows more light collection but suffers from significantly lower DoF. However, both high light throughput, which increases signal-to-noise ratio, and a wide DoF, which enlarges the system's applicable depth range, are valuable for CWAM ToF applications. In this work, we propose EDoF-ToF, an algorithmic method to extend the DoF of large-aperture CWAM ToF cameras by using a neural network to deblur objects outside of the lens's narrow focal region and thus produce an all-in-focus measurement. A key component of our work is the proposed large-aperture ToF training data simulator, which models the depth-dependent blurs and partial occlusions caused by such apertures. Contrary to conventional image deblurring where the blur model is typically linear, ToF depth maps are nonlinear functions of scene intensities, resulting in a nonlinear blur model that we also derive for our simulator. Unlike extended DoF for conventional photography where depth information needs to be encoded (or made depth-invariant) using additional hardware (phase masks, focal sweeping, etc.), ToF sensor measurements naturally encode depth information, allowing a completely software solution to extended DoF. We experimentally demonstrate EDoF-ToF increasing the DoF of a conventional ToF system by 3.6×, effectively achieving the DoF of a smaller lens aperture that allows 22.1× less light. Ultimately, EDoF-ToF enables CWAM ToF cameras to enjoy the benefits of both high light throughput and a wide DoF.
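The nonlinearity argument can be made concrete with a standard CWAM ToF image-formation sketch (our notation and simplifications, not necessarily the paper's derivation): blur convolves the complex-valued sensor signal, and depth is recovered through a phase argument applied after the convolution.

```latex
\[
  m(x) = a(x)\, e^{\,i\,\frac{4\pi f_m}{c}\, z(x)}, \qquad
  \tilde{m} = k * m, \qquad
  \tilde{z}(x) = \frac{c}{4\pi f_m}\, \arg\!\big(\tilde{m}(x)\big),
\]
```

where $a$ is the scene intensity, $z$ the true depth, $f_m$ the modulation frequency, and $k$ a depth-dependent defocus kernel. Because $\arg(\cdot)$ acts after the convolution, the observed depth $\tilde{z}$ is a nonlinear function of the scene intensities, unlike conventional linear image blur.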
24. Bouchabou D, Nguyen SM, Lohr C, LeDuc B, Kanellos I. A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning. Sensors (Basel) 2021; 21:6037. PMID: 34577243; PMCID: PMC8469092; DOI: 10.3390/s21186037.
Abstract
Recent advances in Internet of Things (IoT) technologies and the reduction in the cost of sensors have encouraged the development of smart environments, such as smart homes. Smart homes can offer home assistance services to improve the quality of life, autonomy, and health of their residents, especially the elderly and dependent. To provide such services, a smart home must be able to understand the daily activities of its residents. Techniques for recognizing human activity in smart homes are advancing rapidly, but new challenges emerge every day. In this paper, we present recent algorithms, works, challenges, and a taxonomy of the field of human activity recognition in smart homes through ambient sensors. Moreover, since activity recognition in smart homes is a young field, we raise specific problems, as well as missing and needed contributions. We also propose directions, research opportunities, and solutions to accelerate advances in this field.
Affiliation(s)
- Damien Bouchabou
  - IMT Atlantique Engineer School, 29238 Brest, France
  - Delta Dore Company, 35270 Bonnemain, France
- Sao Mai Nguyen
  - IMT Atlantique Engineer School, 29238 Brest, France
- Christophe Lohr
  - IMT Atlantique Engineer School, 29238 Brest, France
- Ioannis Kanellos
  - IMT Atlantique Engineer School, 29238 Brest, France
25. Early estimation model for 3D-discrete Indian sign language recognition using graph matching. Journal of King Saud University - Computer and Information Sciences 2021. DOI: 10.1016/j.jksuci.2018.06.008.
26. Yadav SK, Tiwari K, Pandey HM, Akbar SA. A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions. Knowl Based Syst 2021. DOI: 10.1016/j.knosys.2021.106970.
27. Skeleton Driven Action Recognition Using an Image-Based Spatial-Temporal Representation and Convolution Neural Network. Sensors (Basel) 2021; 21:4342. PMID: 34201991; PMCID: PMC8271982; DOI: 10.3390/s21134342.
Abstract
Individuals with Autism Spectrum Disorder (ASD) typically present difficulties in engaging and interacting with their peers. Researchers have therefore been developing different technological solutions as support tools for children with ASD. Social robots, one example of such solutions, are often unaware of their game partners, which prevents the automatic adaptation of their behavior to the user. Information that can enrich this interaction, and consequently adapt the system's behavior, is the recognition of the user's actions using RGB cameras and/or depth sensors. The present work proposes a method to automatically detect, in real time, typical and stereotypical actions of children with ASD, using the Intel RealSense camera and the Nuitrack SDK to detect and extract the user's joint coordinates. The pipeline starts by mapping the temporal and spatial joint dynamics onto a color image-based representation. Usually, the positions of the joints in the final image are clustered into groups. To verify whether the sequence of joints in the final image representation influences the model's performance, two main experiments were conducted: in the first, the order of the grouped joints in the sequence was changed, and in the second, the joints were randomly ordered. Statistical methods were used to analyze each experiment. Statistically significant differences were found concerning the joint sequence in the image, indicating that the order of the joints might impact the model's performance. The final model, a Convolutional Neural Network (CNN) trained on the different actions (typical and stereotypical), was used to classify the different patterns of behavior, achieving a mean accuracy of 92.4% ± 0.0% on the test data. The entire pipeline ran at 31 FPS on average.
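A common generic way to map joint sequences onto a color image, assumed here for illustration (the paper's exact encoding and joint ordering may differ), is to min-max normalize each coordinate axis into one RGB channel, with joints as rows and frames as columns:

```python
import numpy as np

def joints_to_image(seq: np.ndarray) -> np.ndarray:
    """Map a joint-coordinate sequence onto a color-image representation.

    seq: (n_frames, n_joints, 3) x/y/z coordinates from the depth sensor.
    Each axis is min-max normalized to [0, 255] and stored in one RGB
    channel, giving an (n_joints, n_frames, 3) uint8 image where the row
    order is the joint order (which, per the experiments, may matter).
    """
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    img = (seq - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    return img.transpose(1, 0, 2).astype(np.uint8)  # joints x frames x RGB

print(joints_to_image(np.random.rand(150, 19, 3)).shape)  # (19, 150, 3)
```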
28. Shaikh MB, Chai D. RGB-D Data-Based Action Recognition: A Review. Sensors (Basel) 2021; 21:4246. PMID: 34205782; PMCID: PMC8234200; DOI: 10.3390/s21124246.
Abstract
Classification of human actions is an ongoing research problem in computer vision. This review aims to scope the current literature on data fusion and action recognition techniques and to identify gaps and future research directions. Success in producing cost-effective and portable vision-based sensors has dramatically increased the number and size of datasets. The growth in the number of action recognition datasets intersects with advances in deep learning architectures and computational support, both of which offer significant research opportunities. Naturally, each action-data modality, such as RGB, depth, skeleton, and infrared (IR), has distinct characteristics, so it is important to exploit the value of each modality for better action recognition. In this paper, we focus solely on data fusion and recognition techniques in the context of vision with an RGB-D perspective. We conclude by discussing research challenges, emerging trends, and possible future research directions.
Collapse
|
29
|
Randhavane T, Bera A, Kubin E, Gray K, Manocha D. Modeling Data-Driven Dominance Traits for Virtual Characters Using Gait Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2967-2979. [PMID: 31751243 DOI: 10.1109/tvcg.2019.2953063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a data-driven algorithm for generating gaits of virtual characters with varying dominance traits. Our formulation utilizes a user study to establish a data-driven dominance mapping between gaits and dominance labels. We use our dominance mapping to generate walking gaits for virtual characters that exhibit a variety of dominance traits while interacting with the user. Furthermore, we extract gait features based on known criteria in the visual perception and psychology literature that can be used to identify the dominance level of any walking gait. We validate our mapping and the perceived dominance traits through a second user study in an immersive virtual environment. Our gait dominance classification algorithm can classify the dominance traits of gaits with ~73 percent accuracy. We also present an application of our approach that simulates interpersonal relationships between virtual characters. To the best of our knowledge, ours is the first practical approach to classifying gait dominance and generating dominance traits in virtual characters.
Collapse
|
30
|
Bulbul MF, Tabussum S, Ali H, Zheng W, Lee MY, Ullah A. Exploring 3D Human Action Recognition Using STACOG on Multi-View Depth Motion Maps Sequences. SENSORS 2021; 21:s21113642. [PMID: 34073799 PMCID: PMC8197175 DOI: 10.3390/s21113642] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/10/2021] [Revised: 04/20/2021] [Accepted: 04/25/2021] [Indexed: 11/28/2022]
Abstract
This paper proposes an action recognition framework for depth map sequences using the 3D Space-Time Auto-Correlation of Gradients (STACOG) algorithm. First, each depth map sequence is split into two sets of sub-sequences with two different frame lengths. Second, Depth Motion Map (DMM) sequences are generated from every set and fed into STACOG to obtain an auto-correlation feature vector. The two auto-correlation feature vectors, one per set of sub-sequences, are each applied to an L2-regularized Collaborative Representation Classifier (L2-CRC) to compute a set of residual values. Next, the Logarithmic Opinion Pool (LOGP) rule is used to combine the two L2-CRC outcomes and assign an action label to the depth map sequence. Finally, the proposed framework is evaluated on three benchmark datasets: the MSR-Action3D, DHA, and UTD-MHAD datasets. We compare the experimental results of our framework with state-of-the-art approaches to demonstrate its effectiveness. The computational efficiency of the framework is also analyzed on all datasets to assess its suitability for real-time operation.
Collapse
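The classification stage described above can be sketched as follows: a query feature vector is coded over the whole training dictionary with an L2-regularized least-squares solution, per-class reconstruction residuals are computed, and two such classifiers are fused with a Logarithmic Opinion Pool. The equal fusion weights and the toy data are assumptions; the STACOG/DMM feature extraction itself is omitted.

```python
import numpy as np

def l2_crc_residuals(D, labels, y, lam=0.1):
    """L2-regularized collaborative representation: code query y over the
    whole training dictionary D (features x samples), then measure the
    reconstruction residual using each class's coefficients alone."""
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    return {c: np.linalg.norm(y - D[:, labels == c] @ alpha[labels == c])
            for c in np.unique(labels)}

def logp_fuse(res_a, res_b):
    """Logarithmic Opinion Pool: map residuals to pseudo-posteriors and
    combine the two classifiers multiplicatively (equal weights assumed)."""
    def to_prob(res):
        p = np.exp(-np.array([res[c] for c in sorted(res)]))
        return p / p.sum()
    fused = np.log(to_prob(res_a)) + np.log(to_prob(res_b))
    return sorted(res_a)[int(np.argmax(fused))]

rng = np.random.default_rng(1)
D = rng.normal(size=(50, 30))               # 30 training feature vectors
labels = np.repeat(np.arange(3), 10)        # 3 action classes
y = D[:, 4] + 0.05 * rng.normal(size=50)    # query close to a class-0 sample
res_short = l2_crc_residuals(D, labels, y, lam=0.1)  # e.g., short sub-sequences
res_long = l2_crc_residuals(D, labels, y, lam=0.5)   # e.g., long sub-sequences
print("predicted class:", logp_fuse(res_short, res_long))   # -> 0
```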
Affiliation(s)
- Mohammad Farhad Bulbul
- Department of Mathematics, Jashore University of Science and Technology, Jashore 7408, Bangladesh; (M.F.B.); (S.T.)
| | - Sadiya Tabussum
- Department of Mathematics, Jashore University of Science and Technology, Jashore 7408, Bangladesh; (M.F.B.); (S.T.)
| | - Hazrat Ali
- Department of Electrical and Computer Engineering, Abbottabad Campus, COMSATS University Islamabad, Abbottabad 22060, Pakistan;
| | - Wenli Zheng
- School of Science, Xi’an Shiyou University, Xi’an 710065, China;
| | - Mi Young Lee
- Intelligent Media Laboratory, Department of Software, Sejong University, Seoul 143-747, Korea
- Correspondence: (M.Y.L.); (A.U.)
| | - Amin Ullah
- Intelligent Media Laboratory, Department of Software, Sejong University, Seoul 143-747, Korea
- CORIS Institute, Oregon State University, Corvallis, OR 97331, USA
- Correspondence: (M.Y.L.); (A.U.)
| |
Collapse
|
31
|
NPU RGBD Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11104426] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Computer vision-based action recognition of basketball players in basketball training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions of basketball players and built the dataset NPU RGB+D (a large-scale basketball action recognition dataset with RGB image data and depth data captured at Northwestern Polytechnical University) for 12 kinds of actions performed by 10 professional basketball players, comprising 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. By extracting spatial features from the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition, based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is a challenging benchmark for current action recognition algorithms and that our LSTM-DGCN outperforms state-of-the-art action recognition methods on various evaluation criteria on our dataset. Our action classification scheme and the NPU RGB+D dataset are valuable for basketball player action recognition research; the feature-enhanced LSTM-DGCN improves the expressive power of the skeleton data and thereby yields more accurate action recognition.
Collapse
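A hedged sketch of the feature-enhancement step named in the abstract, i.e., spatial features built from distances and angles between skeleton joints. The joint count and the bone triples used for angles are illustrative assumptions, and the DGCN/LSTM model itself is not reproduced.

```python
import numpy as np

def joint_geometry_features(skel, triples=((0, 1, 2), (1, 2, 3), (2, 3, 4))):
    """Spatial features for one (J, 3) skeleton frame: all pairwise joint
    distances plus the angle at the middle joint of each (a, b, c) triple.
    The default triples stand for one limb chain and are illustrative."""
    i, j = np.triu_indices(skel.shape[0], k=1)
    dists = np.linalg.norm(skel[i] - skel[j], axis=1)

    def angle(a, b, c):                      # angle at joint b, in radians
        u, v = skel[a] - skel[b], skel[c] - skel[b]
        cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    return np.concatenate([dists, [angle(*t) for t in triples]])

skel = np.random.default_rng(2).normal(size=(25, 3))   # 25-joint skeleton
print(joint_geometry_features(skel).shape)             # (303,) = 300 + 3
```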
|
32
|
State Estimation Using a Randomized Unscented Kalman Filter for 3D Skeleton Posture. ELECTRONICS 2021. [DOI: 10.3390/electronics10080971] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In this study, we propose a method for minimizing the noise of Kinect sensors in 3D skeleton estimation. Nonlinear noise is difficult to remove effectively when estimating 3D skeleton posture; the proposed randomized unscented Kalman filter, however, reduces nonlinear temporal noise effectively through its state estimation process. The 3D skeleton data can then be estimated at each step by iteratively passing the posterior state through the propagation and update processes. Experimental results show that the proposed method outperforms conventional methods for 3D skeleton estimation.
Collapse
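Since the randomized sigma-point scheme is specific to the paper, the sketch below substitutes a plain constant-velocity Kalman filter to show the predict/update state-estimation loop that smooths one noisy joint coordinate stream. It is a linear stand-in, not the proposed randomized unscented filter; the noise parameters are assumptions.

```python
import numpy as np

def smooth_joint(zs, dt=1.0 / 30, q=1e-3, r=1e-3):
    """Constant-velocity Kalman filtering of one noisy joint coordinate
    stream: predict with a linear motion model, then correct with each
    measurement. State is [position, velocity]."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.array([zs[0], 0.0]), np.eye(2)
    out = []
    for z in zs:
        x, P = F @ x, F @ P @ F.T + Q                   # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)             # update
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

t = np.linspace(0, 2, 60)
truth = 0.2 * np.sin(2 * np.pi * t)                 # a joint's y-coordinate
noisy = truth + 0.03 * np.random.default_rng(3).normal(size=t.size)
est = smooth_joint(noisy)
print(f"raw RMSE {np.sqrt(((noisy - truth) ** 2).mean()):.4f}, "
      f"filtered RMSE {np.sqrt(((est - truth) ** 2).mean()):.4f}")
```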
|
33
|
Mehmood K, Jalil A, Ali A, Khan B, Murad M, Cheema KM, Milyani AH. Spatio-Temporal Context, Correlation Filter and Measurement Estimation Collaboration Based Visual Object Tracking. SENSORS 2021; 21:s21082841. [PMID: 33920648 PMCID: PMC8073341 DOI: 10.3390/s21082841] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/08/2021] [Revised: 04/01/2021] [Accepted: 04/15/2021] [Indexed: 11/16/2022]
Abstract
Despite eminent progress in recent years, various challenges associated with object tracking algorithms, such as scale variations, partial or full occlusions, background clutter, and illumination variations, remain to be resolved with improved estimation for real-time applications. This paper proposes a robust and fast object tracking algorithm based on spatio-temporal context (STC). A pyramid-representation-based scale correlation filter is incorporated to overcome STC's inability to handle rapid changes in target scale: it learns the appearance changes induced by scale variation from the target sampled at a set of different scales. During occlusion, most correlation filter trackers start drifting because of incorrect sample updates. To prevent the target model from drifting, an occlusion detection and handling mechanism is incorporated. Occlusion is detected from the peak correlation score of the response map, and the target location is continuously predicted during occlusion and passed to the STC tracking model. After occlusion is detected, an extended Kalman filter is used for occlusion handling; this decreases the chance of tracking failure, as the Kalman filter continuously updates itself and the tracking model. The model is further improved by fusion with the average peak-to-correlation energy (APCE) criterion, which automatically updates the target model to deal with environmental changes. Extensive evaluations on the benchmark datasets indicate the efficacy of the proposed tracking method relative to the state of the art.
Collapse
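The occlusion test described above, i.e., flagging frames where the response-map peak and the average peak-to-correlation energy (APCE) both collapse, can be sketched as follows; the fractional thresholds and toy response maps are illustrative assumptions rather than the paper's tuned values.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a tracker response map;
    a sharp single peak gives high APCE, a flat map gives low APCE."""
    fmax, fmin = response.max(), response.min()
    return (fmax - fmin) ** 2 / np.mean((response - fmin) ** 2)

def occluded(response, hist_peaks, hist_apces, k=0.4):
    """Flag occlusion when both the current peak and the current APCE
    drop below a fraction k of their running means."""
    return (response.max() < k * np.mean(hist_peaks)
            and apce(response) < k * np.mean(hist_apces))

rng = np.random.default_rng(4)
yy, xx = np.mgrid[0:31, 0:31]
clean = np.exp(-((yy - 15) ** 2 + (xx - 15) ** 2) / 20.0)  # confident peak
flat = 0.1 * rng.random((31, 31))                          # occluded: no peak
hist_peaks, hist_apces = [clean.max()] * 5, [apce(clean)] * 5
print(occluded(clean, hist_peaks, hist_apces),   # False: target visible
      occluded(flat, hist_peaks, hist_apces))    # True: likely occluded
```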
Affiliation(s)
- Khizer Mehmood
- Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan; (A.J.); (B.K.); (M.M.)
- Correspondence: (K.M.); (K.M.C.)
| | - Abdul Jalil
- Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan; (A.J.); (B.K.); (M.M.)
| | - Ahmad Ali
- Department of Software Engineering, Bahria University, Islamabad 44000, Pakistan;
| | - Baber Khan
- Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan; (A.J.); (B.K.); (M.M.)
| | - Maria Murad
- Department of Electrical Engineering, International Islamic University, Islamabad 44000, Pakistan; (A.J.); (B.K.); (M.M.)
| | - Khalid Mehmood Cheema
- School of Electrical Engineering, Southeast University, Nanjing 210096, China
- Correspondence: (K.M.); (K.M.C.)
| | - Ahmad H. Milyani
- Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia;
| |
Collapse
|
34
|
Chen PW, Baune NA, Zwir I, Wang J, Swamidass V, Wong AW. Measuring Activities of Daily Living in Stroke Patients with Motion Machine Learning Algorithms: A Pilot Study. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18041634. [PMID: 33572116 PMCID: PMC7915561 DOI: 10.3390/ijerph18041634] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/30/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 11/20/2022]
Abstract
Measuring activities of daily living (ADLs) using wearable technologies may offer higher precision and granularity than the current clinical assessments for patients after stroke. This study aimed to develop machine learning (ML) algorithms for detecting different ADLs with wearable sensors and to determine their detection accuracy. Eleven post-stroke patients participated in this pilot study at an ADL Simulation Lab across two study visits. We collected blocks of repeated ("atomic") activity performance data to train our ML algorithms during one visit, and we evaluated the algorithms using independent semi-naturalistic activity data collected at a separate session. We tested Decision Tree, Random Forest, Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost) for model development. XGBoost was the best classification model: we achieved 82% accuracy across ten ADL tasks, and with a model including seven tasks, accuracy improved to 90%. ADL tasks included chopping food, vacuuming, sweeping, spreading jam or butter, folding laundry, eating, brushing teeth, taking off/putting on a shirt, wiping a cupboard, and buttoning a shirt. The results provide preliminary evidence that ADL functioning can be predicted with adequate accuracy using wearable sensors and ML. The use of external validation (independent training and testing data sets) and semi-naturalistic testing data is a major strength of the study and a step closer to the long-term goal of ADL monitoring in real-world settings. Further investigation is needed to improve the ADL prediction accuracy, increase the number of tasks monitored, and test the model outside of a laboratory setting.
Collapse
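A minimal sketch of the train-and-evaluate pattern the study describes, with windowed summary statistics from a wearable signal fed to an XGBoost classifier. The synthetic two-class signals and the feature set are assumptions standing in for the real sensor data and engineered features.

```python
import numpy as np
from xgboost import XGBClassifier   # pip install xgboost

def window_features(sig, win=50):
    """Summary statistics per fixed-length window of one sensor channel;
    an illustrative stand-in for the study's engineered features."""
    n = len(sig) // win
    w = sig[: n * win].reshape(n, win)
    return np.c_[w.mean(1), w.std(1), w.min(1), w.max(1), (w ** 2).mean(1)]

rng = np.random.default_rng(5)
sweep = np.sin(np.linspace(0, 40 * np.pi, 2000)) + 0.2 * rng.normal(size=2000)
rest = 0.05 * rng.normal(size=2000)            # two toy "ADL" classes
X = np.vstack([window_features(sweep), window_features(rest)])
y = np.r_[np.zeros(40), np.ones(40)].astype(int)

idx = rng.permutation(len(y))                  # independent train/test split
tr, te = idx[:60], idx[60:]
clf = XGBClassifier(n_estimators=50, max_depth=3).fit(X[tr], y[tr])
print("held-out accuracy:", (clf.predict(X[te]) == y[te]).mean())
```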
Affiliation(s)
- Pin-Wei Chen
- PlatformSTL, St. Louis, MO 63110, USA; (P.-W.C.); (N.A.B.); (V.S.)
- Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Nathan A. Baune
- PlatformSTL, St. Louis, MO 63110, USA; (P.-W.C.); (N.A.B.); (V.S.)
- Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Igor Zwir
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA; (I.Z.); (J.W.)
- Department of Computer Science and Artificial Intelligence, University of Granada, 18010 Granada, Spain
| | - Jiayu Wang
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA; (I.Z.); (J.W.)
| | | | - Alex W.K. Wong
- Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA; (I.Z.); (J.W.)
- Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Rehabilitation Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL 60611, USA
- Correspondence:
| |
Collapse
|
35
|
Fu Z, He X, Wang E, Huo J, Huang J, Wu D. Personalized Human Activity Recognition Based on Integrated Wearable Sensor and Transfer Learning. SENSORS (BASEL, SWITZERLAND) 2021; 21:885. [PMID: 33525538 PMCID: PMC7865943 DOI: 10.3390/s21030885] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Revised: 01/04/2021] [Accepted: 01/22/2021] [Indexed: 11/16/2022]
Abstract
Human activity recognition (HAR) based on wearable devices has attracted increasing attention from researchers as sensor technology has developed in recent years. However, personalized HAR requires high recognition accuracy, and maintaining the model's generalization capability is a major challenge in this field. This paper presents a compact wireless wearable sensor node, which combines an air pressure sensor and an inertial measurement unit (IMU) to provide multi-modal information for HAR model training. To address personalized recognition of user activities, we propose a new transfer learning algorithm, a joint probability domain adaptive method with improved pseudo-labels (IPL-JPDA). This method adds the improved pseudo-label strategy to the JPDA algorithm to avoid cumulative errors due to inaccurate initial pseudo-labels. To verify our equipment and method, we used the newly designed sensor node to collect seven daily activities from seven subjects. Nine different HAR models were trained using traditional machine learning and transfer learning methods. The experimental results show that the multi-modal data improve the accuracy of the HAR system. The proposed IPL-JPDA algorithm has the best performance among the five HAR models, with an average recognition accuracy of 93.2% across subjects.
Collapse
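The improved-pseudo-label idea can be illustrated with a generic confident-pseudo-label loop: label the target domain with the current model, keep only high-confidence predictions, and retrain. The sketch below omits the joint probability distribution alignment that JPDA performs; the confidence threshold and toy domain shift are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_adapt(Xs, ys, Xt, rounds=3, conf=0.8):
    """Confident-pseudo-label loop: fit on the source user, label the
    target user, keep only confident predictions, and refit on the
    combined data. Repeating this refines the pseudo-labels."""
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    for _ in range(rounds):
        proba = clf.predict_proba(Xt)
        keep = proba.max(axis=1) >= conf        # trust confident targets only
        if not keep.any():
            break
        X = np.vstack([Xs, Xt[keep]])
        y = np.r_[ys, proba.argmax(axis=1)[keep]]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf

rng = np.random.default_rng(6)
Xs = np.r_[rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))]
ys = np.r_[np.zeros(100), np.ones(100)].astype(int)
Xt = Xs + 0.8                                   # the "new user" domain shift
clf = pseudo_label_adapt(Xs, ys, Xt)
print("target accuracy:", (clf.predict(Xt) == ys).mean())
```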
Affiliation(s)
| | | | | | | | - Jian Huang
- Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; (Z.F.); (X.H.); (E.W.); (J.H.); (D.W.)
| | | |
Collapse
|
36
|
Vidhyapathi CM, Joseph Raj AN, Sundar S. The 3D-DTW Custom IP based FPGA Hardware Acceleration for Action Recognition. J Imaging Sci Technol 2021. [DOI: 10.2352/j.imagingsci.technol.2021.65.1.010401] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
|
37
|
Gonzalez S, Stegall P, Edwards H, Stirling L, Siu HC. Ablation Analysis to Select Wearable Sensors for Classifying Standing, Walking, and Running. SENSORS 2020; 21:s21010194. [PMID: 33396734 PMCID: PMC7796131 DOI: 10.3390/s21010194] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/11/2020] [Revised: 12/18/2020] [Accepted: 12/19/2020] [Indexed: 11/16/2022]
Abstract
The field of human activity recognition (HAR) often utilizes wearable sensors and machine learning techniques to identify the actions of a subject. This paper considers the recognition of walking and running activities using a support vector machine (SVM) trained on principal components derived from wearable sensor data. An ablation analysis is performed to select the subset of sensors that yields the highest classification accuracy. The paper also compares principal components across trials to assess the similarity of the trials. Five subjects were instructed to perform standing, walking, running, and sprinting on a self-paced treadmill, and the data were recorded using surface electromyography sensors (sEMGs), inertial measurement units (IMUs), and force plates. When all sensors were included, the SVM achieved over 90% classification accuracy using only the first three principal components of the data, with the classes stand, walk, and run/sprint (a combined run and sprint class). Sensors placed only on the lower leg were found to produce higher accuracies than sensors placed on the upper leg. There was a small decrease in accuracy when the force plates were ablated, but the difference may not be operationally relevant. Using only accelerometers, without sEMGs, was shown to decrease the accuracy of the SVM.
Collapse
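A compact sketch of the pipeline and ablation loop described above: standardize, project onto the first three principal components, train an RBF SVM, and measure cross-validated accuracy with each sensor group removed in turn. The sensor grouping, channel layout, and synthetic features are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
y = np.repeat([0, 1, 2], 100)                  # stand / walk / run-sprint
X = rng.normal(size=(300, 20)) + y[:, None] * np.linspace(0, 1, 20)

# four sensor groups of five channels each (layout is illustrative)
sensors = {"shank_imu": range(0, 5), "thigh_imu": range(5, 10),
           "semg": range(10, 15), "force_plate": range(15, 20)}
for name, cols in sensors.items():
    kept = [c for c in range(20) if c not in cols]   # ablate one group
    pipe = make_pipeline(StandardScaler(), PCA(n_components=3),
                         SVC(kernel="rbf"))
    acc = cross_val_score(pipe, X[:, kept], y, cv=5).mean()
    print(f"without {name:12s}: {acc:.3f} cross-validated accuracy")
```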
Affiliation(s)
- Sarah Gonzalez
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA;
- Correspondence:
| | - Paul Stegall
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA;
| | - Harvey Edwards
- Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02421-6426, USA; (H.E.); (H.C.S.)
| | - Leia Stirling
- Department of Industrial and Operations Engineering, Robotics Institute, University of Michigan, 1205 Beal Avenue, Ann Arbor, MI 48109, USA;
| | - Ho Chit Siu
- Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02421-6426, USA; (H.E.); (H.C.S.)
| |
Collapse
|
38
|
Liu J, Shahroudy A, Perez M, Wang G, Duan LY, Kot AC. NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2684-2701. [PMID: 31095476 DOI: 10.1109/tpami.2019.2916873] [Citation(s) in RCA: 158] [Impact Index Per Article: 39.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including a lack of large-scale training samples, a limited number of distinct class categories, limited diversity in camera views, and limited variety in environmental conditions and human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset contains 120 different action classes covering daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results on the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.
Collapse
|
39
|
Orlandi A, Cross ES, Orgs G. Timing is everything: Dance aesthetics depend on the complexity of movement kinematics. Cognition 2020; 205:104446. [PMID: 32932073 DOI: 10.1016/j.cognition.2020.104446] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2019] [Revised: 06/26/2020] [Accepted: 08/20/2020] [Indexed: 12/31/2022]
Abstract
What constitutes a beautiful action? Research into dance aesthetics has largely focussed on subjective features like familiarity with the observed movement, but has rarely studied objective features like speed or acceleration. We manipulated the kinematic complexity of observed actions by creating dance sequences that varied in movement timing, but not in movement trajectory. Dance-naïve participants rated the dance videos on speed, effort, reproducibility, and enjoyment. Using linear mixed-effects modeling, we show that faster, more predictable movement sequences with varied velocity profiles are judged to be more effortful, less reproducible, and more aesthetically pleasing than slower sequences with more uniform velocity profiles. Accordingly, dance aesthetics depend not only on which movements are being performed but on how movements are executed and linked into sequences. The aesthetics of movement timing may apply across culturally-specific dance styles and predict both preference for and perceived difficulty of dance, consistent with information theory and effort heuristic accounts of aesthetic appreciation.
Collapse
Affiliation(s)
- Andrea Orlandi
- Neuro-MI, Milan Center for Neuroscience, Dept. of Psychology, University of Milano - Bicocca, Italy; Department of Psychology, Sapienza University of Rome, Italy.
| | - Emily S Cross
- Institute of Cognitive Neuroscience, School of Psychology, University of Glasgow, UK; Department of Cognitive Science, Macquarie University, Australia
| | - Guido Orgs
- Department of Psychology, Goldsmiths, University of London, UK
| |
Collapse
|
40
|
Ziegler J, Reiter A, Gattringer H, Müller A. Simultaneous identification of human body model parameters and gait trajectory from 3D motion capture data. Med Eng Phys 2020; 84:193-202. [PMID: 32977918 DOI: 10.1016/j.medengphy.2020.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 08/13/2020] [Accepted: 08/24/2020] [Indexed: 11/26/2022]
Abstract
The analysis of human movements rests on a realistic human body model. Deducing model parameters from anthropomorphic data is challenging, since such data are inherently imprecise. One approach to improving model accuracy is parameter adaptation based on motion data. 3D motion capture data are already being used to generate the trajectories of a human body model, so combining motion tracking and parameter identification seems most natural. This paper introduces a holistic approach to simultaneously identify the geometric parameters of a kinematic human lower limb model and the parameters defining a (cyclic) gait trajectory, based on 3D marker positions. The result is a time-continuous description of a physiologically compatible lower extremity movement, along with optimal model parameters, so as to best reproduce the captured motion. The method takes into account restrictions such as the range of motion of human body joints and is robust against missing data due to marker occlusions or failures of the measurement system. Considering multiple gait cycles of a movement trial, we derive the characteristic motion pattern (CMP) of a specific subject walking at a specific speed. Our method further allows for motion analysis and assessment, but also for motion synthesis with arbitrary time span and time resolution, and can thus be used for simulations and trajectory planning of rehabilitation and movement assistance systems, such as exoskeletons.
Collapse
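The simultaneous identification idea can be sketched on a toy two-link planar leg: segment lengths (model parameters) and low-order Fourier coefficients of cyclic joint angles (the gait trajectory) are fitted jointly to noisy marker positions by nonlinear least squares, starting from rough anthropometric guesses. The planar geometry and single-harmonic trajectory are simplifying assumptions, far below the paper's full 3D model.

```python
import numpy as np
from scipy.optimize import least_squares

T = np.linspace(0, 1, 50)                     # one normalized gait cycle

def angles(p, t):
    """Cyclic hip and knee angles from first-order Fourier coefficients."""
    a0, a1, b1, c0, c1, d1 = p
    hip = a0 + a1 * np.cos(2 * np.pi * t) + b1 * np.sin(2 * np.pi * t)
    knee = c0 + c1 * np.cos(2 * np.pi * t) + d1 * np.sin(2 * np.pi * t)
    return hip, knee

def markers(l1, l2, hip, knee):
    """Knee and ankle marker positions of a 2-link planar leg, hip at origin."""
    kx, ky = l1 * np.sin(hip), -l1 * np.cos(hip)
    ax, ay = kx + l2 * np.sin(hip + knee), ky - l2 * np.cos(hip + knee)
    return np.c_[kx, ky, ax, ay]

true = np.r_[0.45, 0.42, 0.1, 0.5, 0.2, 0.2, 0.6, -0.1]   # lengths + Fourier
meas = markers(true[0], true[1], *angles(true[2:], T))
meas += 0.005 * np.random.default_rng(8).normal(size=meas.shape)

def residual(x):                              # joint fit of all unknowns
    return (markers(x[0], x[1], *angles(x[2:], T)) - meas).ravel()

x0 = np.r_[0.5, 0.38, 0.0, 0.4, 0.1, 0.1, 0.5, 0.0]    # rough anthropometric guess
fit = least_squares(residual, x0)
print("estimated segment lengths:", fit.x[:2].round(3))   # ~[0.45, 0.42]
```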
Affiliation(s)
- Jakob Ziegler
- Institute of Robotics, Johannes Kepler University Linz, Austria.
| | - Alexander Reiter
- Institute of Robotics, Johannes Kepler University Linz, Austria.
| | | | - Andreas Müller
- Institute of Robotics, Johannes Kepler University Linz, Austria.
| |
Collapse
|
41
|
Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors. ENTROPY 2020; 22:e22080817. [PMID: 33286588 PMCID: PMC7517385 DOI: 10.3390/e22080817] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/27/2020] [Revised: 07/18/2020] [Accepted: 07/24/2020] [Indexed: 01/03/2023]
Abstract
Automatic identification of human interaction from video sequences is a challenging task, especially in dynamic environments with cluttered backgrounds. Advancements in computer vision sensor technologies provide powerful capabilities for human interaction recognition (HIR) during routine daily life. In this paper, we propose a novel feature extraction method which incorporates robust entropy optimization and an efficient Maximum Entropy Markov Model (MEMM) for HIR via multiple vision sensors. The main objectives of the proposed methodology are: (1) to propose a hybrid of four novel features, i.e., spatio-temporal features, energy-based features, shape-based angular and geometric features, and a motion-orthogonal histogram of oriented gradients (MO-HOG); (2) to encode the hybrid feature descriptors using a codebook, a Gaussian mixture model (GMM), and Fisher encoding; (3) to optimize the encoded features using a cross-entropy optimization function; (4) to apply a MEMM classification algorithm which examines empirical expectations and highest entropy, measuring pattern variances, to achieve superior HIR accuracy. Our system is tested on three well-known datasets: the SBU Kinect interaction, UoL 3D social activity, and UT-Interaction datasets. Through extensive experimentation, the proposed feature extraction algorithm, along with cross-entropy optimization, achieved average accuracy rates of 91.25% on SBU, 90.4% on UoL, and 87.4% on UT-Interaction. The proposed HIR system will be applicable to a wide variety of man-machine interfaces, such as public-place surveillance, future medical applications, virtual reality, fitness exercises, and 3D interactive gaming.
Collapse
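The MEMM classifier at the heart of the method can be sketched as a multinomial logistic model of P(y_t | y_{t-1}, x_t), with the previous label one-hot-encoded into the feature vector; greedy decoding replaces Viterbi here for brevity, and the hybrid descriptors and entropy optimization stages are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class TinyMEMM:
    """Miniature MEMM: a multinomial logistic model of
    P(y_t | y_{t-1}, x_t), with the previous label one-hot-encoded
    into the feature vector; decoding is greedy for brevity."""
    def fit(self, X, y):
        self.k = int(y.max()) + 1
        prev = np.r_[0, y[:-1]]                  # label of previous frame
        Z = np.c_[X, np.eye(self.k)[prev]]
        self.clf = LogisticRegression(max_iter=1000).fit(Z, y)
        return self

    def decode(self, X):
        out, prev = [], 0
        for x in X:
            z = np.r_[x, np.eye(self.k)[prev]]
            prev = int(self.clf.predict(z[None])[0])
            out.append(prev)
        return np.array(out)

rng = np.random.default_rng(9)
y = np.repeat([0, 1, 0, 2], 40)                  # sticky interaction labels
X = rng.normal(size=(160, 4)) + y[:, None]       # toy per-frame descriptors
print("decode accuracy:", (TinyMEMM().fit(X, y).decode(X) == y).mean())
```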
|
42
|
van Schaik JE, Dominici N. Motion tracking in developmental research: Methods, considerations, and applications. PROGRESS IN BRAIN RESEARCH 2020; 254:89-111. [PMID: 32859295 DOI: 10.1016/bs.pbr.2020.06.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
In this chapter, we explore the use of motion tracking methodology in developmental research. With motion tracking, also called motion capture, human movements can be precisely recorded and analyzed. Motion tracking provides developmental researchers with objective measurements of motor and (socio-)cognitive development. It can further be used to create carefully controlled stimulus videos and can offer a means of measuring development outside of the lab. We discuss three types of motion tracking that lend themselves to developmental applications. First, marker-based systems track optical or electromagnetic markers or sensors placed on the body and offer high-accuracy measurements. Second, markerless methods entail image processing of videos to track the movement of bodies without participants being hindered by physical markers. Third, inertial motion tracking measures three-dimensional movements and can be used in a variety of settings. The chapter concludes by examining three example topics from the developmental literature in which motion tracking applications have contributed to our understanding of human development.
Collapse
Affiliation(s)
- Johanna E van Schaik
- Department of Educational and Family Studies, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Nadia Dominici
- Department of Human Movement Sciences, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
| |
Collapse
|
43
|
Human activity recognition using improved complete ensemble EMD with adaptive noise and long short-term memory neural networks. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.04.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
44
|
Liu J, Shahroudy A, Wang G, Duan LY, Kot AC. Skeleton-Based Online Action Prediction Using Scale Selection Network. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1453-1467. [PMID: 30762531 DOI: 10.1109/tpami.2019.2898954] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Action prediction aims to recognize the class label of an ongoing activity when only a part of it has been observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and suppress possible interference from previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
Collapse
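The temporal modeling described above can be sketched with stacked 1D convolutions whose dilation grows exponentially, so the receptive field spans a long observation window without pooling. The channel sizes are assumptions, and the paper's window scale selection and activation sharing schemes are not reproduced.

```python
import torch
import torch.nn as nn

class DilatedTemporalBlock(nn.Module):
    """Stacked 1D convolutions over the time axis with exponentially
    growing dilation, so the receptive field spans a long observation
    window without pooling. Input: (batch, joint features, frames)."""
    def __init__(self, feats=75, hidden=64, levels=4):
        super().__init__()
        layers, c_in = [], feats
        for i in range(levels):
            d = 2 ** i                        # dilation 1, 2, 4, 8
            layers += [nn.Conv1d(c_in, hidden, kernel_size=3,
                                 dilation=d, padding=d),
                       nn.ReLU()]
            c_in = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)                    # same temporal length out

seq = torch.randn(1, 75, 120)                 # 25 joints x 3 coords, 120 frames
print(DilatedTemporalBlock()(seq).shape)      # torch.Size([1, 64, 120])
```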
|
45
|
|
46
|
He J, Zhang C, He X, Dong R. Visual Recognition of traffic police gestures with convolutional pose machine and handcrafted features. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.07.103] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
47
|
Neira-Rodado D, Nugent C, Cleland I, Velasquez J, Viloria A. Evaluating the Impact of a Two-Stage Multivariate Data Cleansing Approach to Improve to the Performance of Machine Learning Classifiers: A Case Study in Human Activity Recognition. SENSORS (BASEL, SWITZERLAND) 2020; 20:s20071858. [PMID: 32230844 PMCID: PMC7180455 DOI: 10.3390/s20071858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2019] [Revised: 03/12/2020] [Accepted: 03/13/2020] [Indexed: 06/10/2023]
Abstract
Human activity recognition (HAR) is a popular field of study. The outcomes of projects in this area have the potential to impact the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors such as accelerometers. The performance of these classifiers can be improved through an adequate training process. To improve the training process, multivariate outlier detection was used to improve the quality of the data in the training set and, subsequently, the performance of the classifier. The impact of the technique was evaluated with KNN and random forest (RF) classifiers. In the case of KNN, the performance of the classifier improved from 55.9% to 63.59%.
Collapse
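One plausible reading of a multivariate cleansing stage is per-class Mahalanobis-distance outlier removal before training, sketched below with a KNN classifier on toy data; the chi-square cutoff and single-stage design are assumptions, since the paper describes a two-stage approach.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.neighbors import KNeighborsClassifier

def mahalanobis_filter(X, y, q=0.975):
    """Per-class multivariate outlier removal: drop training points whose
    squared Mahalanobis distance to their class mean exceeds the
    chi-square quantile for the feature dimensionality."""
    keep = np.zeros(len(X), dtype=bool)
    thr = chi2.ppf(q, df=X.shape[1])
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        Xc = X[idx]
        mu = Xc.mean(axis=0)
        inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", Xc - mu, inv, Xc - mu)
        keep[idx[d2 <= thr]] = True
    return X[keep], y[keep]

rng = np.random.default_rng(10)
X = np.r_[rng.normal(0, 1, (95, 3)),
          rng.normal(0, 8, (5, 3)),       # noisy outliers inside class 0
          rng.normal(4, 1, (100, 3))]
y = np.r_[np.zeros(100), np.ones(100)].astype(int)
Xc, yc = mahalanobis_filter(X, y)
for name, (Xtr, ytr) in {"raw": (X, y), "cleansed": (Xc, yc)}.items():
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)
    print(name, "class-0 recall:", (knn.predict(X[:100]) == 0).mean())
```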
Affiliation(s)
- Dionicio Neira-Rodado
- Department of Industrial Agroindustrial and Operations Management GIAO, Universidad de la Costa, Barranquilla 080002, Colombia; (J.V.); (A.V.)
| | - Chris Nugent
- School of Computing, Ulster University, Shore Road, Newtownabbey, County Antrim BT37 0QB, Northern Ireland, UK; (C.N.); (I.C.)
| | - Ian Cleland
- School of Computing, Ulster University, Shore Road, Newtownabbey, County Antrim BT37 0QB, Northern Ireland, UK; (C.N.); (I.C.)
| | - Javier Velasquez
- Department of Industrial Agroindustrial and Operations Management GIAO, Universidad de la Costa, Barranquilla 080002, Colombia; (J.V.); (A.V.)
| | - Amelec Viloria
- Department of Industrial Agroindustrial and Operations Management GIAO, Universidad de la Costa, Barranquilla 080002, Colombia; (J.V.); (A.V.)
| |
Collapse
|
48
|
|
49
|
Taufique AMN, Minnehan B, Savakis A. Benchmarking Deep Trackers on Aerial Videos. SENSORS 2020; 20:s20020547. [PMID: 31963879 PMCID: PMC7014490 DOI: 10.3390/s20020547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/14/2019] [Revised: 01/10/2020] [Accepted: 01/10/2020] [Indexed: 12/03/2022]
Abstract
In recent years, deep learning-based visual object trackers have achieved state-of-the-art performance on several visual object tracking benchmarks. However, most tracking benchmarks focus on ground-level videos, whereas aerial tracking presents a new set of challenges. In this paper, we compare ten deep learning-based trackers on four aerial datasets. We choose top-performing trackers utilizing different approaches, specifically tracking by detection, discriminative correlation filters, Siamese networks, and reinforcement learning. In our experiments, we use a subset of the OTB2015 dataset with aerial-style videos, the UAV123 dataset without synthetic sequences, the UAV20L dataset, which contains 20 long sequences, and the DTB70 dataset as our benchmark datasets. We compare the advantages and disadvantages of the different trackers in the tracking situations encountered in aerial data. Our findings indicate that the trackers perform significantly worse on aerial datasets than on standard ground-level videos. We attribute this effect to smaller target sizes, camera motion, significant camera rotation with respect to the target, out-of-view movement, and clutter in the form of occlusions or similar-looking distractors near the tracked object.
Collapse
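Benchmarking trackers of this kind typically reduces to per-frame overlap scores. The sketch below computes the IoU-based success metric commonly used to rank trackers on such datasets; the exact evaluation protocol of the cited benchmarks may differ, and the toy trajectories are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter + 1e-9)

def success_auc(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Area under the success curve: fraction of frames whose IoU exceeds
    each overlap threshold, averaged over thresholds."""
    ious = np.array([iou(p, g) for p, g in zip(pred, gt)])
    return np.mean([(ious > t).mean() for t in thresholds])

gt = [np.array([100 + 2 * t, 50 + t, 40, 40]) for t in range(30)]
drift = [b + np.array([3 * t, 0, 0, 0]) for t, b in enumerate(gt)]  # drifts away
print(f"perfect tracker AUC {success_auc(gt, gt):.2f}, "
      f"drifting tracker AUC {success_auc(drift, gt):.2f}")
```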
|
50
|
Computer Vision Intelligent Approaches to Extract Human Pose and Its Activity from Image Sequences. ELECTRONICS 2020. [DOI: 10.3390/electronics9010159] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The purpose of this work is to develop computational intelligence models based on neural networks (NN), fuzzy models (FM), support vector machines (SVM), and long short-term memory networks (LSTM) to predict human pose and activity from image sequences, using computer vision approaches to gather the required features. To obtain the human pose semantics (the output classes) from a set of 3D points that describe the human body model (the input variables of the predictive model), prediction models were learned from the acquired data, for example, video images. Likewise, to predict the semantics of the atomic activities that compose an activity, again based on the human body model extracted at each video frame, prediction models were learned using LSTM networks. In both cases, the best learned models were implemented in an application to test the systems. The SVM model achieved 95.97% correct classification of the six different human poses tackled in this work during tests in situations different from those of the training phase. The implemented LSTM model achieved an overall accuracy of 88% under the same conditions. These results demonstrate the validity of both approaches for predicting human pose and activity from image sequences. Moreover, the system is capable of identifying the atomic activities and quantifying the time interval in which each activity takes place.
Collapse
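As a sketch of the pose-classification stage, the snippet below trains an RBF SVM on flattened 3D body points for two toy pose classes. The skeleton layout and synthetic poses are assumptions; in the work above the 3D points would come from a pose estimator on video frames, and the fuzzy, NN, and LSTM models are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def make_pose(kind, rng, n=200):
    """Toy 15-joint skeletons for two pose classes: 'standing' keeps the
    joints stacked vertically, 'sitting' compresses the lower half."""
    base = rng.normal(0, 0.05, size=(n, 15, 3))
    base[:, :, 1] += np.linspace(0, 1.7, 15)        # stack joints along y
    if kind == "sitting":
        base[:, :7, 1] *= 0.5                       # fold the lower joints
    return base.reshape(n, -1)                      # flatten to feature rows

rng = np.random.default_rng(11)
X = np.vstack([make_pose("standing", rng), make_pose("sitting", rng)])
y = np.r_[np.zeros(200), np.ones(200)].astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
svm = SVC(kernel="rbf").fit(Xtr, ytr)
print("pose accuracy:", (svm.predict(Xte) == yte).mean())
```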
|