1. Goldbraikh A, Shubi O, Rubin O, Pugh CM, Laufer S. MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics. Pattern Recognition 2024; 156:110778. [PMID: 39494221] [PMCID: PMC11526485] [DOI: 10.1016/j.patcog.2024.110778]
Abstract
Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and bidirectional LSTM- or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieved state-of-the-art performance. Code: https://github.com/AdamGoldbraikh/MS-TCRNet.
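As an illustration of the World Frame Rotation augmentation described above, the sketch below applies one shared random rotation to every 3D vector channel of a kinematic sequence, preserving the geometry within the sequence while varying its world orientation. The array layout, rotation axis, and angle range are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def world_frame_rotation(kin, max_angle_deg=15.0, rng=None):
    # kin: (T, 3*V) array -- T time steps, V three-dimensional vector
    # channels (e.g. positions, velocities). One random rotation about
    # the world z-axis is applied to every 3-vector in the sequence.
    rng = rng or np.random.default_rng()
    theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    T, D = kin.shape
    vecs = kin.reshape(T, D // 3, 3)    # group channels into 3-vectors
    return (vecs @ R.T).reshape(T, D)   # rotate each vector, restore layout
```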
Affiliation(s)
- Adam Goldbraikh
- Applied Mathematics Department at the Technion – Israel Institute of Technology, Haifa, 3200003, Israel
- Omer Shubi
- Faculty of Data and Decision Sciences at the Technion – Israel Institute of Technology, Haifa, 3200003, Israel
- Or Rubin
- Faculty of Data and Decision Sciences at the Technion – Israel Institute of Technology, Haifa, 3200003, Israel
- Carla M Pugh
- School of Medicine, Stanford University, Stanford, CA, USA
- Shlomi Laufer
- Faculty of Data and Decision Sciences at the Technion – Israel Institute of Technology, Haifa, 3200003, Israel
2. Chan S, Yuan H, Tong C, Acquah A, Schonfeldt A, Gershuny J, Doherty A. CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition. Sci Data 2024; 11:1135. [PMID: 39414802] [PMCID: PMC11484779] [DOI: 10.1038/s41597-024-03960-3]
Abstract
Existing activity tracker datasets for human activity recognition are typically obtained by having participants perform predefined activities in an enclosed environment under supervision. This results in small datasets with a limited number of activities and limited heterogeneity, lacking the mixed and nuanced movements normally found in free-living scenarios. As such, models trained on laboratory-style datasets may not generalise out of sample. To address this problem, we introduce a new dataset involving wrist-worn accelerometers, wearable cameras, and sleep diaries, enabling data collection for over 24 hours in a free-living setting. The result is CAPTURE-24, a large activity tracker dataset collected in the wild from 151 participants, amounting to 3883 hours of accelerometer data, of which 2562 hours are annotated. CAPTURE-24 is two to three orders of magnitude larger than existing publicly available datasets, which is critical to developing accurate human activity recognition models.
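A typical first step when working with a dataset like this is to cut the continuous accelerometer stream into fixed-length windows, each labelled by majority vote over the per-sample annotations. The sketch below assumes an (N, 3) sample array with one label per sample; the 100 Hz rate and 10-second windows are illustrative choices rather than anything prescribed by the dataset.

```python
import numpy as np

def sliding_windows(acc, labels, fs=100, win_s=10):
    # acc: (N, 3) tri-axial samples; labels: (N,) per-sample annotations.
    # Returns stacked windows and one majority-vote label per window.
    win = int(fs * win_s)
    X, y = [], []
    for start in range(0, len(acc) - win + 1, win):   # non-overlapping
        seg_labels = labels[start:start + win]
        vals, counts = np.unique(seg_labels, return_counts=True)
        X.append(acc[start:start + win])
        y.append(vals[np.argmax(counts)])             # majority vote
    return np.stack(X), np.array(y)
```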
Affiliation(s)
- Shing Chan
- Big Data Institute, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Hang Yuan
- Big Data Institute, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Catherine Tong
- Department of Computer Science, University of Oxford, Oxford, UK
- Aidan Acquah
- Big Data Institute, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Department of Engineering Science, University of Oxford, Oxford, UK
- Abram Schonfeldt
- Big Data Institute, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Aiden Doherty
- Big Data Institute, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
3. Muniasamy A. Revolutionizing health monitoring: Integrating transformer models with multi-head attention for precise human activity recognition using wearable devices. Technol Health Care 2024:THC241064. [PMID: 39269866] [DOI: 10.3233/thc-241064]
Abstract
BACKGROUND: A daily activity routine is vital for overall health and well-being, supporting physical and mental fitness. Consistent physical activity is linked to numerous benefits for the body, mind, and emotions, and plays a key role in maintaining a healthy lifestyle. Wearable devices have become essential in the realm of health and fitness, facilitating the monitoring of daily activities. While convolutional neural networks (CNNs) have proven effective, challenges remain in quickly adapting to a variety of activities. OBJECTIVE: This study aimed to develop a model for precise recognition of human activities by integrating transformer models with multi-head attention, using data from wearable devices. METHODS: The human activity recognition (HAR) algorithm uses deep learning to classify human activities from spectrogram data. It uses a pretrained convolutional neural network (CNN), MobileNetV2, to extract features; a dense residual transformer network (DRTN); and a multi-head multi-level attention architecture (MH-MLA) to capture time-related patterns. The model then blends information from both branches through an adaptive attention mechanism and applies a softmax function to produce classification probabilities over the activity classes. RESULTS: The integrated approach, combining a pretrained CNN with transformer models into a single system for recognizing human activities from spectrogram data, outperformed baseline methods, achieving accuracies of 92.81%, 97.98%, and 95.32% on the HARTH, KU-HAR, and HuGaDB datasets, respectively. This suggests that integrating diverse methodologies captures nuanced human activities well across datasets. The comparative analysis showed that the integrated system consistently performs better on dynamic human activity recognition datasets. CONCLUSION: Maintaining a routine of daily activities is crucial for overall health and well-being, and wearable devices have simplified the monitoring of daily routines. This research introduces an innovative approach to human activity recognition that combines a CNN with a dense residual transformer network (DRTN) and multi-head multi-level attention (MH-MLA) within the transformer architecture to enhance recognition capability.
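To make the described pipeline shape concrete (per-frame features, multi-head attention for temporal patterns, then a softmax over activity classes), here is a minimal sketch. The linear projection stands in for the MobileNetV2 feature extractor, and a single attention layer stands in for the DRTN and MH-MLA stack; all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttnHAR(nn.Module):
    def __init__(self, n_feats, n_classes, d_model=128, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)   # stand-in for CNN features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (B, T, n_feats) spectrogram frames
        h = self.proj(x)
        h, _ = self.attn(h, h, h)          # multi-head temporal self-attention
        h = h.mean(dim=1)                  # pool over time
        return self.head(h).softmax(-1)    # class probabilities, as in the abstract
```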
4. Mekruksavanich S, Phaphan W, Hnoohom N, Jitpattanakul A. Recognition of sports and daily activities through deep learning and convolutional block attention. PeerJ Comput Sci 2024; 10:e2100. [PMID: 38855220] [PMCID: PMC11157566] [DOI: 10.7717/peerj-cs.2100]
Abstract
Portable devices like accelerometers and physiological trackers capture movement and biometric data relevant to sports. This study uses data from wearable sensors to investigate deep learning techniques for recognizing human behaviors associated with sports and fitness. The proposed CNN-BiGRU-CBAM model, a hybrid architecture, combines convolutional neural networks (CNNs), bidirectional gated recurrent unit networks (BiGRUs), and convolutional block attention modules (CBAMs) for accurate activity recognition. The CNN layers extract spatial patterns, the BiGRU captures temporal context, and CBAM focuses on the informative BiGRU features, enabling precise activity pattern identification. The novelty lies in seamlessly integrating these components to learn spatial and temporal relationships while prioritizing significant features for activity detection. The model and baseline deep learning models were trained on the UCI-DSA dataset and evaluated with 5-fold cross-validation using multi-class classification accuracy, precision, recall, and F1-score. The CNN-BiGRU-CBAM model outperformed baseline models such as CNN, LSTM, BiLSTM, GRU, and BiGRU, achieving state-of-the-art results with 99.10% accuracy and F1-score across all activity classes. This enables accurate identification of sports and everyday activities using simple wearables and advanced deep learning techniques, facilitating athlete monitoring, technique feedback, and injury risk detection. The proposed model's design and thorough evaluation significantly advance human activity recognition for sports and fitness.
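The division of labour in this architecture (convolutions for spatial patterns, a BiGRU for temporal context, attention to reweight the BiGRU features) can be sketched roughly as follows. The gate here is a simple squeeze-style channel attention standing in for CBAM, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNBiGRUAttn(nn.Module):
    def __init__(self, n_ch, n_classes, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_ch, 64, 5, padding=2), nn.ReLU())
        self.gru = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2 * hidden), nn.Sigmoid())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (B, T, n_ch) raw sensor window
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # spatial patterns
        h, _ = self.gru(h)                               # temporal context
        h = h * self.gate(h.mean(dim=1, keepdim=True))   # channel attention gate
        return self.head(h.mean(dim=1))                  # class logits
```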
Affiliation(s)
- Sakorn Mekruksavanich
- Department of Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, Thailand
- Wikanda Phaphan
- Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Narit Hnoohom
- Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakhon Pathom, Thailand
- Anuchit Jitpattanakul
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Intelligent and Nonlinear Dynamic Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
5. Azadi B, Haslgrübler M, Anzengruber-Tanase B, Sopidis G, Ferscha A. Robust Feature Representation Using Multi-Task Learning for Human Activity Recognition. Sensors (Basel, Switzerland) 2024; 24:681. [PMID: 38276371] [PMCID: PMC10819053] [DOI: 10.3390/s24020681]
Abstract
Learning underlying patterns from sensory data is crucial in the Human Activity Recognition (HAR) task to avoid poor generalization when coping with unseen data. A key solution to this issue is representation learning, which becomes essential when input signals contain activities with similar patterns or when the patterns generated by different subjects for the same activity vary. To address these issues, we seek to increase generalization by learning the underlying factors of each sensor signal. We develop a novel multi-channel asymmetric auto-encoder that recreates input signals precisely and extracts indicative unsupervised features. Further, we investigate the role of various activation functions in signal reconstruction to ensure the model preserves the patterns of each activity in the output. Our main contribution is a multi-task learning model that enhances representation learning through layers shared between signal reconstruction and the HAR task, improving the model's robustness to users not included in the training phase. The proposed model learns shared features between the tasks that are, in effect, the underlying factors of each input signal. We validate our multi-task learning model on several publicly available HAR datasets (UCI-HAR, MHealth, PAMAP2, and USC-HAD) and an in-house alpine skiing dataset collected in the wild, achieving 99%, 99%, 95%, 88%, and 92% accuracy, respectively. Our proposed method shows consistent performance and good generalization on all the datasets compared to the state of the art.
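The core idea, shared encoder layers serving both signal reconstruction and activity classification, fits in a few lines. This is a minimal sketch under assumed layer sizes and an assumed loss weighting, not the paper's asymmetric auto-encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHAR(nn.Module):
    def __init__(self, n_ch, n_classes, latent=32):
        super().__init__()
        self.enc = nn.Sequential(                          # shared layers
            nn.Conv1d(n_ch, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, latent, 5, padding=2), nn.ReLU())
        self.dec = nn.Conv1d(latent, n_ch, 5, padding=2)   # reconstruction task
        self.cls = nn.Linear(latent, n_classes)            # HAR task

    def forward(self, x):                  # x: (B, n_ch, T)
        z = self.enc(x)
        return self.dec(z), self.cls(z.mean(dim=2))

def multitask_loss(model, x, y, alpha=0.5):
    # Joint objective: reconstruction keeps the underlying signal factors,
    # classification keeps them discriminative; alpha is a tuning choice.
    recon, logits = model(x)
    return alpha * F.mse_loss(recon, x) + (1 - alpha) * F.cross_entropy(logits, y)
```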
Affiliation(s)
- Behrooz Azadi
- Pro2Future GmbH, Altenberger Strasse 69, 4040 Linz, Austria
- Michael Haslgrübler
- Pro2Future GmbH, Altenberger Strasse 69, 4040 Linz, Austria
- Bernhard Anzengruber-Tanase
- Pro2Future GmbH, Altenberger Strasse 69, 4040 Linz, Austria
- Georgios Sopidis
- Pro2Future GmbH, Altenberger Strasse 69, 4040 Linz, Austria
- Alois Ferscha
- Institute of Pervasive Computing, Johannes Kepler University, Altenberger Straße 69, 4040 Linz, Austria
6. Vuong TH, Doan T, Takasu A. Deep Wavelet Convolutional Neural Networks for Multimodal Human Activity Recognition Using Wearable Inertial Sensors. Sensors (Basel, Switzerland) 2023; 23:9721. [PMID: 38139567] [PMCID: PMC10747357] [DOI: 10.3390/s23249721]
Abstract
Recent advances in wearable systems have made inertial sensors, such as accelerometers and gyroscopes, compact, lightweight, multimodal, low-cost, and highly accurate. Wearable inertial sensor-based multimodal human activity recognition (HAR) methods utilize the rich sensing data from embedded multimodal sensors to infer human activities. However, existing HAR approaches either rely on domain knowledge or fail to address the time-frequency dependencies of multimodal sensor signals. In this paper, we propose a novel method called deep wavelet convolutional neural networks (DWCNN) designed to learn features from the time-frequency domain and improve accuracy for multimodal HAR. DWCNN introduces a framework that combines continuous wavelet transforms (CWT) with enhanced deep convolutional neural networks (DCNN) to capture the dependencies of sensing signals in the time-frequency domain, thereby enhancing the feature representation ability for multiple wearable inertial sensor-based HAR tasks. Within the CWT, we further propose an algorithm to estimate the wavelet scale parameter. This helps enhance the performance of CWT when computing the time-frequency representation of the input signals. The output of the CWT then serves as input for the proposed DCNN, which consists of residual blocks for extracting features from different modalities and attention blocks for fusing these features of multimodal signals. We conducted extensive experiments on five benchmark HAR datasets: WISDM, UCI-HAR, Heterogeneous, PAMAP2, and UniMiB SHAR. The experimental results demonstrate the superior performance of the proposed model over existing competitors.
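To make the CWT front end concrete: each sensor channel is turned into a time-frequency image that the convolutional network then consumes. The sketch below uses PyWavelets with a Morlet wavelet and a simple geometric scale grid; it does not reproduce the paper's scale-estimation algorithm, so treat the scale choice as an assumption.

```python
import numpy as np
import pywt

def scalogram(sig, fs=50.0, n_scales=32):
    # sig: 1-D samples from one inertial channel at sampling rate fs.
    # Returns an (n_scales, len(sig)) magnitude image for a CNN input.
    scales = np.geomspace(1.0, fs / 2.0, n_scales)   # stand-in scale grid
    coef, _ = pywt.cwt(sig, scales, 'morl', sampling_period=1.0 / fs)
    return np.abs(coef)
```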
Affiliation(s)
- Thi Hong Vuong
- Department of Informatics, National Institute of Informatics, Tokyo 101-0003, Japan
- Tung Doan
- Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 11615, Vietnam
- Atsuhiro Takasu
- Department of Informatics, National Institute of Informatics, Tokyo 101-0003, Japan