1
Hartley T, Hicks Y, Davies JL, Cazzola D, Sheeran L. BACK-to-MOVE: Machine learning and computer vision model automating clinical classification of non-specific low back pain for personalised management. PLoS One 2024;19:e0302899. PMID: 38728282. PMCID: PMC11086851. DOI: 10.1371/journal.pone.0302899.
Abstract
BACKGROUND Low back pain (LBP) is a major global contributor to disability, with profound health and socio-economic implications. The predominant form is non-specific LBP (NSLBP), which lacks a treatable pathology. Active physical interventions tailored to individual needs and capabilities are crucial for its management. However, the intricate nature of NSLBP and the complexity of clinical classification systems, which require extensive clinical training, hinder access to customised treatment. Recent advancements in machine learning and computer vision show promise in characterising the altered movement patterns of NSLBP through wearable sensors and optical motion capture. This study aimed to develop and evaluate a machine learning model ('BACK-to-MOVE') for NSLBP classification, trained with expert clinical classification, spinal motion data from standard video, and patient-reported outcome measures (PROMs). METHODS Synchronised video and three-dimensional (3D) motion data were collected during forward spinal flexion from 83 NSLBP patients. Two physiotherapists independently classified them as motor control impairment (MCI) or movement impairment (MI), with conflicts resolved by a third expert. The convolutional neural network (CNN) architecture HigherHRNet was chosen for effective pose estimation from video data. The model was validated against 3D motion data (a subset of 62 patients) and trained on the freely available MS-COCO dataset for feature extraction. The BACK-to-MOVE classifier was then fine-tuned through feed-forward neural networks using labelled examples from the training dataset. Evaluation used 5-fold cross-validation to assess accuracy, specificity, sensitivity, and F1 measure. RESULTS The pose estimation model's mean square error of 0.35 degrees against 3D motion data demonstrated strong criterion validity. BACK-to-MOVE proficiently differentiated the MI and MCI classes, yielding 93.98% accuracy, 96.49% sensitivity (MI detection), 88.46% specificity (MCI detection), and an F1 measure of 0.957. Incorporating PROMs curtailed classifier performance (accuracy: 68.67%, sensitivity: 91.23%, specificity: 18.52%, F1: 0.800). CONCLUSION This study is the first to demonstrate automated clinical classification of NSLBP using computer vision and machine learning with standard video data, achieving accuracy comparable to expert consensus. Automated classification of NSLBP based on altered movement patterns video-recorded during routine clinical examination could expedite personalised NSLBP rehabilitation, circumventing existing healthcare constraints. This advancement holds significant promise for patients and healthcare services alike.
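The accuracy, sensitivity, specificity, and F1 figures reported above can all be derived from a binary confusion matrix. A minimal sketch of that computation (illustrative only, not the authors' code; treating MI as the positive class follows the abstract, while the function and variable names are our own):

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and F1 for a binary classifier.

    Mirrors the metrics reported for the BACK-to-MOVE classifier, with
    the positive class (1) standing in for MI and the negative class
    for MCI."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)      # positive-class (MI) detection rate
    specificity = tn / (tn + fp)      # negative-class (MCI) detection rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1
```

In a 5-fold cross-validation, these four values would be computed on each held-out fold and then averaged.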
Affiliation(s)
- Thomas Hartley
- School of Engineering, Cardiff University, Cardiff, United Kingdom
- Yulia Hicks
- School of Engineering, Cardiff University, Cardiff, United Kingdom
- Jennifer L. Davies
- School of Healthcare Sciences, Cardiff University, Cardiff, United Kingdom
- Biomechanics and Bioengineering Research Centre Versus Arthritis, Cardiff University, Cardiff, United Kingdom
- Dario Cazzola
- Department for Health, University of Bath, Bath, United Kingdom
- Centre for Health, Injury and Illness Prevention in Sport, University of Bath, Bath, United Kingdom
- Liba Sheeran
- School of Healthcare Sciences, Cardiff University, Cardiff, United Kingdom
- Biomechanics and Bioengineering Research Centre Versus Arthritis, Cardiff University, Cardiff, United Kingdom
2
El Kaid A, Baïna K. A Systematic Review of Recent Deep Learning Approaches for 3D Human Pose Estimation. J Imaging 2023;9:275. PMID: 38132693. PMCID: PMC10743718. DOI: 10.3390/jimaging9120275.
Abstract
Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview. Unlike many existing surveys that categorize approaches based on learning paradigms, our survey offers a fresh perspective, delving deeper into the subject. For image-based approaches, we not only follow existing categorizations but also introduce and compare significant 2D models. Additionally, we provide a comparative analysis of these methods, enhancing the understanding of image-based pose estimation techniques. In the realm of video-based approaches, we categorize them based on the types of models used to capture inter-frame information. Furthermore, in the context of multi-person pose estimation, our survey uniquely differentiates between approaches focusing on relative poses and those addressing absolute poses. Our survey aims to serve as a pivotal resource for researchers, highlighting state-of-the-art deep learning strategies and identifying promising directions for future exploration in 3D human pose estimation.
Affiliation(s)
- Amal El Kaid
- Alqualsadi Research Team, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat 10112, Morocco
3
Kulbacki M, Segen J, Chaczko Z, Rozenblit JW, Kulbacki M, Klempous R, Wojciechowski K. Intelligent Video Analytics for Human Action Recognition: The State of Knowledge. Sensors (Basel) 2023;23:4258. PMID: 37177461. PMCID: PMC10181781. DOI: 10.3390/s23094258.
Abstract
The paper presents a comprehensive overview of intelligent video analytics and human action recognition methods, covering the current state of knowledge in human activity recognition, including pose-based, tracking-based, spatio-temporal, and deep learning-based approaches such as visual transformers. We also discuss the challenges and limitations of these techniques and the potential of modern edge AI architectures to enable real-time human action recognition in resource-constrained environments.
Affiliation(s)
- Marek Kulbacki
- Polish-Japanese Academy of Information Technology, 02-008 Warsaw, Poland
- DIVE IN AI, 53-307 Wroclaw, Poland
- Jakub Segen
- Polish-Japanese Academy of Information Technology, 02-008 Warsaw, Poland
- DIVE IN AI, 53-307 Wroclaw, Poland
- Zenon Chaczko
- DIVE IN AI, 53-307 Wroclaw, Poland
- School of Electrical and Data Engineering, University of Technology Sydney, Ultimo 2007, Australia
- Jerzy W Rozenblit
- Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Ryszard Klempous
- Wrocław University of Science and Technology, 50-370 Wroclaw, Poland
4
Pal R, Adhikari D, Heyat MBB, Ullah I, You Z. Yoga Meets Intelligent Internet of Things: Recent Challenges and Future Directions. Bioengineering (Basel) 2023;10:459. PMID: 37106646. PMCID: PMC10135646. DOI: 10.3390/bioengineering10040459.
Abstract
The physical and mental health of people can be enhanced through yoga, an excellent form of exercise. As part of the breathing procedure, yoga involves stretching the body's organs. Guidance and monitoring are crucial to reap the full benefits of yoga, as incorrect postures can have multiple adverse effects, including physical hazards and stroke. The detection and monitoring of yoga postures are possible with the Intelligent Internet of Things (IIoT), the integration of intelligent approaches (machine learning) with the Internet of Things (IoT). Considering the increase in yoga practitioners in recent years, the integration of IIoT and yoga has led to the successful implementation of IIoT-based yoga training systems. This paper provides a comprehensive survey on integrating yoga with IIoT. It also discusses the multiple types of yoga and the procedure for yoga detection using IIoT, and highlights various applications of yoga, safety measures, open challenges, and future directions. This survey presents the latest developments and findings on yoga and its integration with IIoT.
Affiliation(s)
- Rishi Pal
- Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Deepak Adhikari
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China
- Md Belal Bin Heyat
- IoT Research Center, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Inam Ullah
- Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam 13120, Republic of Korea
- Zili You
- Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
5
Giulietti N, Caputo A, Chiariotti P, Castellini P. SwimmerNET: Underwater 2D Swimmer Pose Estimation Exploiting Fully Convolutional Neural Networks. Sensors (Basel) 2023;23:2364. PMID: 36850962. PMCID: PMC9966167. DOI: 10.3390/s23042364.
Abstract
Professional swimming coaches use videos to evaluate their athletes' performances. Specifically, the videos are manually analysed to observe the movements of all parts of the swimmer's body during the exercise and to give indications for improving swimming technique. This operation is time-consuming, laborious, and error-prone. In recent years, alternative technologies have been introduced in the literature, but they still have severe limitations that prevent their correct and effective use. The currently available techniques based on image analysis apply only to certain swimming styles; moreover, they are strongly influenced by disturbing elements (i.e., the presence of bubbles, splashes, and reflections), resulting in poor measurement accuracy. Wearable sensors (accelerometers or photoplethysmographic sensors) and optical markers, although they can guarantee high reliability and accuracy, disturb the performance of the athletes, who tend to dislike these solutions. In this work we introduce SwimmerNET, a new marker-less 2D swimmer pose estimation approach based on the combined use of computer vision algorithms and fully convolutional neural networks. Using a single 8-Mpixel wide-angle camera, the proposed system is able to estimate the pose of a swimmer during exercise while guaranteeing adequate measurement accuracy. The method has been successfully tested on several athletes (i.e., different physical characteristics and different swimming techniques), obtaining an average error and a standard deviation (worst-case scenario for the dataset analysed) of approximately 1 mm and 10 mm, respectively.
Affiliation(s)
- Nicola Giulietti
- Department of Mechanical Engineering, Politecnico di Milano, Via La Masa 1, 20156 Milan, Italy
- Alessia Caputo
- Department of Industrial Engineering and Mathematical Science, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
- Paolo Chiariotti
- Department of Mechanical Engineering, Politecnico di Milano, Via La Masa 1, 20156 Milan, Italy
- Paolo Castellini
- Department of Industrial Engineering and Mathematical Science, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131 Ancona, Italy
6
Bhatt A, Ganatra A. Weapon operating pose detection and suspicious human activity classification using skeleton graphs. Math Biosci Eng 2023;20:2669-2690. PMID: 36899552. DOI: 10.3934/mbe.2023125.
Abstract
The recent upsurge in violent protest and armed conflict in populous civil areas has raised momentous concern worldwide. The unrelenting strategy of law enforcement agencies focuses on thwarting the conspicuous impact of violent events, and increased surveillance using a widespread visual network supports state actors in maintaining vigilance. However, minute-by-minute, simultaneous monitoring of numerous surveillance feeds is workforce-intensive, idiosyncratic, and inefficient. Significant advancements in machine learning (ML) show potential for realising precise models to detect suspicious activities in a crowd, and existing pose estimation techniques have limitations in detecting weapon-operation activity. This paper proposes a comprehensive, customised human activity recognition approach using human body skeleton graphs. A VGG-19 backbone extracted 6600 body coordinates from the customised dataset. The methodology categorises human activities into eight classes experienced during violent clashes and triggers an alarm on specific activities, i.e., stone pelting or weapon handling, while walking, standing, and kneeling are considered regular activities. The end-to-end pipeline presents a robust model for multiple-human tracking, mapping a skeleton graph for each person in consecutive surveillance video frames, with improved categorisation of suspicious human activities, realising effective crowd management. An LSTM-RNN network trained on the customised dataset and combined with a Kalman filter attained 89.09% accuracy for real-time pose identification.
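The abstract pairs the LSTM classifier with a Kalman filter for pose tracking. As an illustration of that smoothing step only, here is a generic constant-velocity Kalman filter applied to one keypoint coordinate (a sketch under stated assumptions, not the authors' implementation; the noise parameters `q` and `r` are assumed values):

```python
import numpy as np

def kalman_smooth_1d(zs, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter for one keypoint coordinate.

    zs: noisy per-frame observations of e.g. a wrist x-coordinate.
    Returns the filtered positions. q and r are assumed process and
    measurement noise variances, not values from the paper."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[zs[0]], [0.0]])          # start at first obs, zero velocity
    P = np.eye(2)
    out = []
    for z in zs:
        # predict step
        x = F @ x
        P = F @ P @ F.T + Q
        # update step with the new observation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0, 0])
    return np.array(out)
```

Running each joint coordinate through such a filter suppresses per-frame detector jitter before the sequence is fed to a temporal classifier.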
Affiliation(s)
- Anant Bhatt
- Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Nadiad Petlad Road, Changa, Gujarat-388421, India
- Amit Ganatra
- Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Nadiad Petlad Road, Changa, Gujarat-388421, India
7
A Comprehensive Survey on Single-Person Pose Estimation in Social Robotics. Int J Soc Robot 2022. DOI: 10.1007/s12369-020-00739-5.
8
Kronenberg R, Kuflik T, Shimshoni I. Improving office workers' workspace using a self-adjusting computer screen. ACM Trans Interact Intell Syst 2022. DOI: 10.1145/3545993.
Abstract
With the rapid evolution of technology, computers and their users' workspaces have become an essential part of everyday life. Today, many people use computers both for work and for personal needs, spending long hours sitting at a desk in front of a computer screen and changing their pose slightly from time to time. This impacts people's health negatively, adversely affecting their musculoskeletal and ocular systems. To mitigate these risks, several different ergonomic solutions have been suggested. This study proposes, demonstrates, and evaluates a technological solution that automatically adjusts the computer screen's position and orientation to its user's current pose, using a simple RGB camera and a robotic arm. The automatic adjustment reduces the physical load on users and better fits their changing poses. The user's pose is extracted from images continuously acquired by the system's camera, the most suitable screen position is calculated according to the user's pose and ergonomic guidelines, and the robotic arm then adjusts the screen accordingly. The evaluation was done through a user study with 35 users, who rated both the idea and the prototype system itself highly.
9
Kim M, Lee S. Fusion Poser: 3D Human Pose Estimation Using Sparse IMUs and Head Trackers in Real Time. Sensors (Basel) 2022;22:4846. PMID: 35808342. PMCID: PMC9269439. DOI: 10.3390/s22134846.
Abstract
Motion capture using sparse inertial sensors is an approach for solving the occlusion and cost problems of vision-based methods; it is suitable for virtual reality (VR) applications and works in complex environments. However, VR applications need to track the location of the user in real-world space, which is hard to obtain using only inertial sensors. In this paper, we present Fusion Poser, which combines deep learning-based pose estimation and location tracking using six inertial measurement units and the head-tracking sensor provided by a head-mounted display. To estimate human poses, we propose a bidirectional recurrent neural network with a convolutional long short-term memory layer that achieves higher accuracy and stability by preserving spatio-temporal properties. To locate a user in real-world coordinates, our method integrates the estimated joint pose with the pose of the tracker. To train the model, we gathered public motion capture datasets with synthesised IMU measurement data, as well as creating a real-world dataset. In the evaluation, our method showed higher accuracy and more robust estimation performance, especially when the user adopted lower poses such as a squat or a bow.
10
Yoga Pose Estimation and Feedback Generation Using Deep Learning. Comput Intell Neurosci 2022;2022:4311350. PMID: 35371230. PMCID: PMC8970937. DOI: 10.1155/2022/4311350.
Abstract
Yoga is a 5000-year-old practice developed in ancient India by the Indus-Sarasvati civilization. The word yoga means deep association and union of mind and body. It is used to keep both in equilibrium through all the ups and downs of life by means of asana, meditation, and several other techniques. Nowadays, yoga has gained worldwide attention due to increased stress levels in the modern lifestyle, and there are numerous methods and resources for learning it. Yoga can be practised in yoga centres or through personal tutors, and can also be learned on one's own with the help of the Internet, books, recorded clips, etc. In fast-paced lifestyles, many people prefer self-learning because the abovementioned resources might not always be available. But in self-learning, one may not notice an incorrect pose, and incorrect posture can be harmful to one's health, resulting in acute pain and long-term chronic concerns. In this paper, deep learning-based techniques are developed to detect incorrect yoga posture. With this method, users can select the desired pose for practice and upload recorded videos of their practice of that pose. The user's pose is passed to trained models that output the abnormal angles detected between the target pose and the user's pose. With these outputs, the system advises the user on how to improve the pose by specifying where it is going wrong. The proposed method was compared to several state-of-the-art methods, achieving an accuracy of 0.9958 while requiring lower computational complexity.
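The feedback step described above, comparing joint angles between a target pose and the user's pose, can be sketched as follows (a simplified illustration; the joint names, the tolerance, and the use of 2D keypoints are our assumptions, not the paper's actual model):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by keypoints a-b-c,
    e.g. hip-knee-ankle for a knee angle."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def angle_feedback(user, reference, tolerance=10.0):
    """Report joints whose user angle deviates from the reference pose
    by more than `tolerance` degrees. Joint names are illustrative."""
    return {joint: round(user[joint] - reference[joint], 1)
            for joint in reference
            if abs(user[joint] - reference[joint]) > tolerance}
```

The returned dictionary tells the user which joints to correct and by how many degrees.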
11
Detection of Physical Strain and Fatigue in Industrial Environments Using Visual and Non-Visual Low-Cost Sensors. Technologies 2022. DOI: 10.3390/technologies10020042.
Abstract
The detection and prevention of workers' straining body postures and other stressful conditions within the work environment supports occupational safety and promotes well-being and sustainability at work. Methods developed towards this aim typically rely on combining highly ergonomic workplaces with expensive monitoring mechanisms, including wearable devices. In this work, we demonstrate how input from low-cost sensors, specifically passive camera sensors installed in a real manufacturing workplace and smartwatches worn by the workers, can provide useful feedback on the workers' condition and yield key indicators for the prevention of work-related musculoskeletal disorders (WMSD) and physical fatigue. To this end, we study the ability to assess the risk of physical strain online during work activities based on the classification of ergonomically sub-optimal working postures from visual information, the correlation and fusion of these estimates with synchronous worker heart-rate data, and the prediction of near-future heart rate using deep learning-based techniques. Moreover, a new multi-modal dataset of video and heart-rate data captured in a real manufacturing workplace during car-door assembly activities is introduced. The experimental results show the efficiency of the proposed approach, which exceeds a 70% classification rate (F1 score) using a set of over 300 annotated video clips of real line workers during work activities. In addition, a time-lagged correlation between the estimated ergonomic risk of physical strain and elevated heart rate was assessed using a larger dataset of synchronous visual and heart-rate data sequences. The statistical analysis revealed that imposing increased strain on body parts results in an increase in heart rate after 100-120 s. This finding is used to improve the short-term forecasting of a worker's cardiovascular activity for the next 10 to 30 s by fusing the heart-rate data with the estimated ergonomic risk, and ultimately to train better predictive models of worker fatigue.
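The 100-120 s lag between strain and heart-rate response suggests a simple lagged-correlation analysis: shift one series against the other and find the lag with the strongest correlation. A minimal sketch of the idea (generic time-series code, not the authors' statistical pipeline; the signals below are synthetic):

```python
import numpy as np

def lagged_correlation(risk, heart_rate, max_lag):
    """Pearson correlation between an ergonomic-risk series and heart
    rate shifted forward by 0..max_lag samples. Returns the lag (in
    samples) with the strongest correlation; the sampling period is
    assumed, not taken from the paper."""
    risk = np.asarray(risk, dtype=float)
    heart_rate = np.asarray(heart_rate, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        # correlate risk[t] with heart_rate[t + lag]
        r = np.corrcoef(risk[:len(risk) - lag] if lag else risk,
                        heart_rate[lag:])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

With a 1 Hz sampling rate, a best lag of 100-120 samples would correspond to the 100-120 s delay reported above.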
12
Park C, Lee HS, Kim WJ, Bae HB, Lee J, Lee S. An Efficient Approach Using Knowledge Distillation Methods to Stabilize Performance in a Lightweight Top-Down Posture Estimation Network. Sensors (Basel) 2021;21:7640. PMID: 34833717. PMCID: PMC8623800. DOI: 10.3390/s21227640.
Abstract
Multi-person pose estimation has been gaining considerable interest due to its use in several real-world applications, such as activity recognition, motion capture, and augmented reality. Although the improvement of the accuracy and speed of multi-person pose estimation techniques has been recently studied, limitations still exist in balancing these two aspects. In this paper, a novel knowledge distilled lightweight top-down pose network (KDLPN) is proposed that balances computational complexity and accuracy. For the first time in multi-person pose estimation, a network that reduces computational complexity by applying a "Pelee" structure and shuffles pixels in the dense upsampling convolution layer to reduce the number of channels is presented. Furthermore, to prevent performance degradation because of the reduced computational complexity, knowledge distillation is applied to establish the pose estimation network as a teacher network. The method performance is evaluated on the MSCOCO dataset. Experimental results demonstrate that our KDLPN network significantly reduces 95% of the parameters required by state-of-the-art methods with minimal performance degradation. Moreover, our method is compared with other pose estimation methods to substantiate the importance of computational complexity reduction and its effectiveness.
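Knowledge distillation, as used above, transfers the teacher network's outputs to the lightweight student. The paper distils a pose network; the sketch below shows only the generic logit form of the idea (Hinton-style soft targets; the temperature and mixing weight are assumed hyper-parameters, not values from the paper):

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with optional temperature t."""
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, y_true, t=4.0, alpha=0.7):
    """Cross-entropy against the teacher's temperature-softened
    distribution, mixed with the ordinary hard-label loss."""
    p_teacher = softmax(teacher_logits, t)
    p_student_soft = softmax(student_logits, t)
    soft_ce = -np.sum(p_teacher * np.log(p_student_soft))
    hard_ce = -np.log(softmax(student_logits)[y_true])
    # t**2 rescales the soft-target term so its gradients stay
    # comparable to the hard-label term as t grows
    return alpha * (t ** 2) * soft_ce + (1 - alpha) * hard_ce
```

A student that matches the teacher's output distribution incurs a lower loss than one that matches only the hard label, which is what stabilises the pruned network's training.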
Affiliation(s)
- Changhyun Park
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
- Hean Sung Lee
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
- Woo Jin Kim
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
- Han Byeol Bae
- Department of Artificial Intelligence Convergence, Kwangju Women’s University, Gwangju 62396, Korea
- Jaeho Lee
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
- Sangyoun Lee
- Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
- Correspondence: ; Tel.: +82-2-2123-5768
13
Deep Learning Methods for 3D Human Pose Estimation under Different Supervision Paradigms: A Survey. Electronics 2021. DOI: 10.3390/electronics10182267.
Abstract
The rise of deep learning has broadly promoted the practical application of artificial intelligence in production and daily life. In computer vision, many human-centred applications, such as video surveillance, human-computer interaction, and digital entertainment, rely heavily on accurate and efficient human pose estimation techniques. Inspired by the remarkable achievements of learning-based 2D human pose estimation, numerous research studies are devoted to 3D human pose estimation via deep learning. Against this backdrop, this paper provides an extensive survey of recent literature on deep learning methods for 3D human pose estimation, to display the development of these studies, track the latest research trends, and analyse the characteristics of the devised methods. The literature is reviewed along the general pipeline of 3D human pose estimation, which consists of human body modelling, learning-based pose estimation, and regularisation for refinement. Unlike existing reviews of the same topic, this paper focuses on deep learning-based methods. Learning-based pose estimation is discussed in two categories, single-person and multi-person, each further divided by data type into image-based and video-based methods. Moreover, given the significance of data for learning-based methods, this paper surveys 3D human pose estimation methods according to a taxonomy of supervision forms. Finally, the paper lists the current, widely used datasets and compares the performance of the reviewed methods. From this survey it can be concluded that each branch of 3D human pose estimation starts with fully supervised methods, and that there is still much room for multi-person pose estimation under other supervision paradigms, from both images and video. Despite the significant development of 3D human pose estimation via deep learning, the inherent ambiguity and occlusion problems remain challenging issues that need to be better addressed.
14
Sibley KG, Girges C, Hoque E, Foltynie T. Video-Based Analyses of Parkinson's Disease Severity: A Brief Review. J Parkinsons Dis 2021;11:S83-S93. PMID: 33682727. PMCID: PMC8385513. DOI: 10.3233/JPD-202402.
Abstract
Remote and objective assessment of the motor symptoms of Parkinson's disease is an area of great interest particularly since the COVID-19 crisis emerged. In this paper, we focus on a) the challenges of assessing motor severity via videos and b) the use of emerging video-based Artificial Intelligence (AI)/Machine Learning techniques to quantitate human movement and its potential utility in assessing motor severity in patients with Parkinson's disease. While we conclude that video-based assessment may be an accessible and useful way of monitoring motor severity of Parkinson's disease, the potential of video-based AI to diagnose and quantify disease severity in the clinical context is dependent on research with large, diverse samples, and further validation using carefully considered performance standards.
Affiliation(s)
- Krista G. Sibley
- Department of Clinical and Movement Neurosciences, Institute of Neurology, University College London, London, UK
- Christine Girges
- Department of Clinical and Movement Neurosciences, Institute of Neurology, University College London, London, UK
- Ehsan Hoque
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Thomas Foltynie
- Department of Clinical and Movement Neurosciences, Institute of Neurology, University College London, London, UK
15
Automatic vocal tract landmark localization from midsagittal MRI data. Sci Rep 2020;10:1468. PMID: 32001739. PMCID: PMC6992757. DOI: 10.1038/s41598-020-58103-6.
Abstract
The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analysing their variations is crucial for understanding speech production, diagnosing speech disorders, and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators, and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of deep learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical magnetic resonance images for 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the former methods, leading to an overall root mean square error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation code is also shared publicly on GitHub.
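The reported 3.6 pixels/0.36 cm figure is an overall root mean square error across landmarks and images. A minimal sketch of that computation (the 0.1 cm-per-pixel scale implied by the two quoted figures is an inference, and the array layout is our assumption):

```python
import numpy as np

def landmark_rmse(pred, true, cm_per_px=0.1):
    """Overall RMSE of predicted 2D landmarks, in pixels and cm.

    pred, true: arrays of shape (n_images, n_landmarks, 2).
    cm_per_px converts the pixel error to centimetres; 0.1 matches
    the 3.6 px / 0.36 cm figures quoted in the abstract."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    err = np.linalg.norm(pred - true, axis=-1)   # per-landmark distance
    rmse_px = float(np.sqrt(np.mean(err ** 2)))
    return rmse_px, rmse_px * cm_per_px
```

In a leave-one-out evaluation over speakers, this would be computed on each held-out speaker's images and aggregated.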
16
Garcia-Salguero M, Gonzalez-Jimenez J, Moreno FA. Human 3D Pose Estimation with a Tilting Camera for Social Mobile Robot Interaction. SENSORS 2019; 19:s19224943. [PMID: 31766197 PMCID: PMC6891307 DOI: 10.3390/s19224943] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/17/2019] [Revised: 11/07/2019] [Accepted: 11/11/2019] [Indexed: 11/16/2022]
Abstract
Human-Robot interaction represents a cornerstone of mobile robotics, especially within the field of social robots. In this context, user localization becomes of crucial importance for the interaction. This work investigates the capabilities of wide field-of-view RGB cameras to estimate the 3D position and orientation (i.e., the pose) of a user in the environment. For that, we employ a social robot endowed with a fish-eye camera hosted in a tilting head and develop two complementary approaches: (1) a fast method relying on a single image that estimates the user pose from the detection of their feet and does not require either the robot or the user to remain static during the reconstruction; and (2) a method that takes several views of the scene while the camera is being tilted and does not need the feet to be visible. Due to the particular setup of the tilting camera, special equations for 3D reconstruction have been developed. In both approaches, a CNN-based skeleton detector (OpenPose) is employed to identify humans within the image. A set of experiments with real data validates the two proposed methods, yielding results comparable to those of commercial RGB-D cameras while surpassing them in terms of scene coverage (wider FoV and longer range) and robustness to lighting conditions.
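In the simplest pinhole-camera case, the single-image feet-based method amounts to intersecting the viewing ray through the feet pixel with the ground plane. The sketch below illustrates only that geometric idea; the paper's fish-eye camera requires its own projection model, and all parameter values here are hypothetical:

```python
import math

def user_position_from_feet(u, v, f, cx, cy, cam_height, tilt_rad):
    """Intersect the viewing ray through the feet pixel (u, v) with the
    ground plane. Pinhole camera: focal length f (px), principal point
    (cx, cy), mounted cam_height metres above the ground and pitched
    down by tilt_rad. Returns (lateral, forward) offset in metres."""
    x_c = (u - cx) / f            # normalised image coordinates
    y_c = (v - cy) / f            # image y grows downwards
    denom = y_c * math.cos(tilt_rad) + math.sin(tilt_rad)
    if denom <= 0:                # ray is at or above the horizon: no ground hit
        return None
    s = cam_height / denom        # distance along the ray to the ground plane
    x_w = s * x_c                                              # lateral (m)
    z_w = s * (math.cos(tilt_rad) - y_c * math.sin(tilt_rad))  # forward (m)
    return x_w, z_w

# Feet seen at the image centre, camera 1.2 m high, tilted 30 degrees down:
pos = user_position_from_feet(320, 240, 500, 320, 240, 1.2, math.radians(30))
```

For the centre pixel the forward distance reduces to cam_height / tan(tilt), which is a quick sanity check on the geometry.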
17

18
Abstract
Human pose estimation is a fundamental but challenging task in computer vision. The estimation of human pose mainly depends on the global information of the keypoint type and the local information of the keypoint location. However, the uniformity of the cascading process makes it difficult for the stacked networks to form a differentiation and collaboration mechanism. To solve these problems, this paper introduces a new human pose estimation framework called the Multi-Scale Collaborative (MSC) network. The pre-processing network forms feature maps of different sizes and dispatches them to various locations of the stacked network, with small-scale features reaching the front-end stacking network and large-scale features reaching the back-end stacking network. A new loss function is proposed for the MSC network: different keypoints have different loss-weight coefficients at different scales, and these coefficients are dynamically adjusted from the top hourglass network to the bottom hourglass network. Experimental results show that the proposed method is competitive with state-of-the-art methods on the MPII and LSP challenge leaderboards.
Affiliation(s)
- Chunsheng Guo
- School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, P. R. China
- Jialuo Zhou
- School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, P. R. China
- Wenlong Du
- School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, P. R. China
- Xuguang Zhang
- School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, P. R. China
19
Jin R, Jiang J, Qi Y, Lin D, Song T. Drone Detection and Pose Estimation Using Relational Graph Networks. SENSORS 2019; 19:s19061479. [PMID: 30917607 PMCID: PMC6471270 DOI: 10.3390/s19061479] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/09/2019] [Revised: 03/13/2019] [Accepted: 03/22/2019] [Indexed: 12/02/2022]
Abstract
With the upsurge in the use of Unmanned Aerial Vehicles (UAVs), drone detection and pose estimation using optical sensors have become important research subjects in cooperative flight and low-altitude security. Existing technology only obtains the position of the target UAV based on object detection methods. To achieve better adaptability and enhanced cooperative performance, the attitude information of the target drone becomes a key message for understanding its state and intention, e.g., the acceleration of quadrotors. At present, most object 6D pose estimation algorithms depend on accurate pose annotations or a 3D target model, which requires substantial human effort and is difficult to apply to non-cooperative targets. To overcome these problems, this paper proposes a quadrotor 6D pose estimation algorithm based on keypoint detection (requiring only keypoint annotations), a relational graph network and the perspective-n-point (PnP) algorithm, which achieves state-of-the-art performance in both simulated and real scenarios. In addition, the ability of the relational graph network to infer the keypoints of the four motors was evaluated. Accuracy and speed were improved significantly compared with the state-of-the-art keypoint detection algorithm.
Affiliation(s)
- Ren Jin
- Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China.
- Jiaqi Jiang
- Multi-UAV GNC Laboratory, School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China.
- Yuhua Qi
- Multi-UAV GNC Laboratory, School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China.
- Defu Lin
- Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China.
- Tao Song
- Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China.
20
Song Y, Kim I. Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals. SENSORS 2019; 19:s19051085. [PMID: 30832433 PMCID: PMC6427216 DOI: 10.3390/s19051085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Revised: 02/23/2019] [Accepted: 02/25/2019] [Indexed: 12/02/2022]
Abstract
This paper proposes a novel deep neural network model for solving the spatio-temporal action detection problem by localizing all multiple-action regions and classifying the corresponding actions in an untrimmed video. The proposed model uses a spatio-temporal region proposal method to effectively detect multiple-action regions. First, in the temporal region proposal, anchor boxes are generated by targeting regions expected to contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to effectively detect the temporal regions of the respective actions occurring asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Further, coarse-level features contain comprehensive information about the whole video and have been frequently used in conventional action-detection studies; however, they cannot provide detailed information about each person performing an action. To overcome this limitation, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Various experiments conducted using the LIRIS-HARL and UCF-101 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
Affiliation(s)
- Yeongtaek Song
- Department of Computer Science, Graduate School, Kyonggi University, 154-42 Gwanggyosan-ro Yeongtong-gu, Suwon-si 16227, Korea.
- Incheol Kim
- Department of Computer Science, Kyonggi University, 154-42 Gwanggyosan-ro Yeongtong-gu, Suwon-si 16227, Korea.
21
Improved Convolutional Pose Machines for Human Pose Estimation Using Image Sensor Data. SENSORS 2019; 19:s19030718. [PMID: 30744191 PMCID: PMC6386920 DOI: 10.3390/s19030718] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Revised: 01/31/2019] [Accepted: 02/05/2019] [Indexed: 11/17/2022]
Abstract
In recent years, increasing amounts of human data have come from image sensors. In this paper, a novel approach combining convolutional pose machines (CPMs) with GoogLeNet is proposed for human pose estimation using image sensor data. The first stage of the CPMs directly generates a response map of each key point of the human skeleton from images, into which we introduce several layers from GoogLeNet. On the one hand, the improved model uses deeper network layers and more complex network structures to enhance low-level feature extraction. On the other hand, the improved model applies a fine-tuning strategy, which benefits estimation accuracy. Moreover, we introduce the inception structure to greatly reduce the number of model parameters, which shortens the convergence time significantly. Extensive experiments on several datasets show that the improved model outperforms most mainstream models in accuracy and training time: prediction efficiency is improved by a factor of 1.023 compared with the CPMs, while training time is reduced by a factor of 3.414. This paper presents a new idea for future research.
22
Abstract
According to UNESCO, cultural heritage does not only include monuments and collections of objects, but also traditions or living expressions inherited from our ancestors and passed on to our descendants. Folk dances represent part of this cultural heritage, and their preservation for the next generations is of major importance. Digitization and visualization of folk dances form an increasingly active research area in computer science. In parallel with rapidly advancing technologies, new ways of learning folk dances are being explored, making it possible to digitize and visualize assorted folk dances for learning purposes using different equipment. Along with challenges and limitations, solutions that can assist the learning process and provide the user with meaningful feedback are proposed. This paper presents an overview of the techniques used for recording dance moves, reviews the different ways of visualizing dances and giving feedback to the user as well as methods of performance evaluation, and covers advances in the digitization and visualization of folk dances from 2000 to 2018.
23
Capturing Complex 3D Human Motions with Kernelized Low-Rank Representation from Monocular RGB Camera. SENSORS 2017; 17:s17092019. [PMID: 28869514 PMCID: PMC5620964 DOI: 10.3390/s17092019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2017] [Revised: 08/23/2017] [Accepted: 08/24/2017] [Indexed: 11/19/2022]
Abstract
Recovering 3D structures from a monocular image sequence is an inherently ambiguous problem that has attracted considerable attention from several research communities. To resolve the ambiguities, a variety of additional priors, such as a low-rank shape basis, have been proposed. In this paper, we make two contributions. First, we introduce the assumption that 3D structures lie on a union of nonlinear subspaces. Based on this assumption, we propose a Non-Rigid Structure from Motion (NRSfM) method with kernelized low-rank representation. Specifically, we utilize the soft-inextensibility constraint to accurately recover 3D human motions. Second, we extend this NRSfM method to the marker-less 3D human pose estimation problem by combining it with Convolutional Neural Network (CNN) based 2D human joint detectors. To evaluate the performance of our methods, we apply the marker-based method to several sequences from the Utrecht Multi-Person Motion (UMPM) benchmark and CMU MoCap datasets, and the marker-less method to the Human3.6M dataset. The experiments demonstrate that the kernelized low-rank representation is better suited to modeling complex deformations, and the method consequently yields more accurate reconstructions. Benefiting from the CNN-based detector, the marker-less approach can be applied to more real-life applications.
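The low-rank prior at the heart of such NRSfM methods can be illustrated in its simplest rank-1 form with a pure-Python power iteration; this is a generic sketch of the idea, not the paper's kernelized formulation, and the example matrix is hypothetical:

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return sum(v * v for v in x) ** 0.5

def rank1_approx(A, iters=100):
    """Best rank-1 approximation of A: power iteration on A^T A finds the
    top right singular vector v, from which sigma and u follow. This is
    the simplest instance of a low-rank shape prior."""
    v = [1.0] * len(A[0])
    At = transpose(A)
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        v = [x / n for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return [[sigma * ui * vj for vj in v] for ui in u]

# A is exactly rank 1 (an outer product), so the approximation recovers it:
A = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
A1 = rank1_approx(A)
```

In NRSfM the same idea is applied to the stacked shape matrix: constraining its (kernelized) rank restricts the recovered 3D shapes to a low-dimensional deformation space.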