1
Butler RM, Frassini E, Vijfvinkel TS, van Riel S, Bachvarov C, Constandse J, van der Elst M, van den Dobbelsteen JJ, Hendriks BHW. Benchmarking 2D human pose estimators and trackers for workflow analysis in the cardiac catheterization laboratory. Med Eng Phys 2025;136:104289. PMID: 39979009. DOI: 10.1016/j.medengphy.2025.104289.
Abstract
Workflow insights can improve efficiency and safety in the Cardiac Catheterization Laboratory (Cath Lab). As manual analysis is labor-intensive, we aim for automation through camera monitoring. Literature shows that human poses are indicative of activities and therefore of workflow. As a first exploration, we evaluate how marker-less multi-human pose estimators perform in the Cath Lab. We annotated poses in 2040 frames from ten multi-view coronary angiogram (CAG) recordings. The pose estimators AlphaPose, OpenPifPaf and OpenPose were run on the footage. Detection and tracking were evaluated separately for the Head, Arms, and Legs with Average Precision (AP), head-guided Percentage of Correct Keypoints (PCKh), Association Accuracy (AA), and Higher-Order Tracking Accuracy (HOTA). We give qualitative examples of results for situations common in the Cath Lab, such as reflections in the monitor or occlusion of personnel. AlphaPose performed best on most mean Full-pose metrics, with an AP from 0.56 to 0.82, AA from 0.55 to 0.71, and HOTA from 0.58 to 0.73. On PCKh, OpenPifPaf scored highest, from 0.53 to 0.64. Arms were detected best, followed by Legs and the Head, with detection most accurate from the views with the least occlusion. During tracking in the Cath Lab, AlphaPose tended to swap identities and OpenPifPaf merged different individuals. The results suggest that AlphaPose yields the most accurate confidence scores and limb detections, and OpenPifPaf the more accurate keypoint locations in the Cath Lab. Occlusions and reflections complicate pose tracking. The AP of up to 0.82 suggests that AlphaPose is a suitable pose detector for workflow analysis in the Cath Lab, whereas its HOTA of up to 0.73 calls for another tracking solution.
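The head-guided PCKh metric used above counts a predicted keypoint as correct when it lies within a fraction of the person's head size of the ground truth. A minimal sketch, assuming the conventional threshold alpha = 0.5 and NaN-marked unlabelled joints (the paper's exact matching protocol is not given in the abstract):

```python
import numpy as np

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Head-normalised Percentage of Correct Keypoints (PCKh).

    pred, gt: (N, K, 2) arrays of predicted and ground-truth keypoints.
    head_sizes: (N,) per-person head segment lengths for normalisation.
    A keypoint counts as correct when its error is below alpha * head size.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (N, K) pixel errors
    thresh = alpha * head_sizes[:, None]         # per-person threshold
    visible = ~np.isnan(gt).any(axis=-1)         # ignore unlabelled joints
    correct = (dists <= thresh) & visible
    return correct.sum() / max(visible.sum(), 1)
```

Restricting the visibility mask to keypoint subsets would yield per-part scores such as the Head, Arms, and Legs splits reported above.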
Affiliation(s)
- Rick M Butler
- Delft University of Technology, Delft, the Netherlands.
- Maarten van der Elst
- Delft University of Technology, Delft, the Netherlands; Reinier de Graaf Gasthuis, Delft, the Netherlands
- Benno H W Hendriks
- Delft University of Technology, Delft, the Netherlands; Philips Healthcare, Best, the Netherlands
2
Butler RM, Vijfvinkel TS, Frassini E, van Riel S, Bachvarov C, Constandse J, van der Elst M, van den Dobbelsteen JJ, Hendriks BHW. 2D human pose tracking in the cardiac catheterisation laboratory with BYTE. Med Eng Phys 2025;135:104270. PMID: 39922649. DOI: 10.1016/j.medengphy.2024.104270.
Abstract
Workflow insights can enable safety and efficiency improvements in the Cardiac Catheterisation Laboratory (Cath Lab). Human pose tracklets from video footage can provide a source of workflow information. However, occlusions and visual similarity between personnel make the Cath Lab a challenging environment for the re-identification of individuals. We propose a human pose tracker that addresses these problems specifically, and test it on recordings of real coronary angiograms. This tracker uses no visual information for re-identification, and instead employs object keypoint similarity between detections and predictions from a third-order motion model. Algorithm performance is measured on Cath Lab footage using Higher-Order Tracking Accuracy (HOTA). To evaluate stability, this is done separately for five surgical steps of the procedure. We achieve up to 0.71 HOTA, whereas tested state-of-the-art pose trackers score up to 0.65 on the same dataset. We observe that the pose tracker's HOTA varies by up to 10 percentage points (pp) between workflow phases, where tested state-of-the-art trackers show differences of up to 23 pp. In addition, the tracker achieves up to 22.5 frames per second, which is 9 frames per second faster than the current state-of-the-art on our setup in the Cath Lab. The fast and consistent short-term performance of the provided algorithm makes it suitable for workflow analysis in the Cath Lab and opens the door to real-time use-cases. Our code is publicly available at https://github.com/RM-8vt13r/PoseBYTE.
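The tracker's association signal is object keypoint similarity (OKS) between detections and motion-model predictions rather than appearance features. A minimal sketch of both ingredients, assuming COCO-style per-keypoint falloff constants and a constant-jerk reading of the third-order model (the exact state and update equations live in the linked PoseBYTE repository):

```python
import numpy as np

def oks(det, pred, scale2, kappas, visible):
    """Object keypoint similarity between a detected and a predicted pose.

    det, pred: (K, 2) keypoint arrays; scale2: object area (scale squared);
    kappas: (K,) per-keypoint falloff constants; visible: (K,) bool mask.
    """
    d2 = ((det - pred) ** 2).sum(axis=-1)
    sim = np.exp(-d2 / (2.0 * scale2 * kappas ** 2))
    return float(sim[visible].mean()) if visible.any() else 0.0

def predict_third_order(x, v, a, j, dt=1.0):
    """One-step keypoint prediction from a constant-jerk (third-order) model:
    x' = x + v*dt + a*dt^2/2 + j*dt^3/6, applied per keypoint coordinate."""
    return x + v * dt + a * dt ** 2 / 2.0 + j * dt ** 3 / 6.0
```

A BYTE-style association would then match high- and low-confidence detections to tracks in two passes, with 1 - OKS as the matching cost.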
Affiliation(s)
- Rick M Butler
- Delft University of Technology, Delft, the Netherlands.
- Maarten van der Elst
- Delft University of Technology, Delft, the Netherlands; Reinier de Graaf Gasthuis, Delft, the Netherlands
- Benno H W Hendriks
- Delft University of Technology, Delft, the Netherlands; Philips Healthcare, Best, the Netherlands
3
Crouzet A, Lopez N, Riss Yaw B, Lepelletier Y, Demange L. The Millennia-Long Development of Drugs Associated with the 80-Year-Old Artificial Intelligence Story: The Therapeutic Big Bang? Molecules 2024;29:2716. PMID: 38930784. PMCID: PMC11206022. DOI: 10.3390/molecules29122716.
Abstract
The journey of drug discovery (DD) has evolved from ancient practices to modern technology-driven approaches, with Artificial Intelligence (AI) emerging as a pivotal force in streamlining and accelerating the process. Despite the vital importance of DD, it faces challenges such as high costs and lengthy timelines. This review examines the historical progression and current market of DD alongside the development and integration of AI technologies. We analyse the challenges encountered in applying AI to DD, focusing on drug design and protein-protein interactions. The discussion is enriched by models illustrating the application of AI in DD. Three case studies demonstrate the successful application of AI in DD, including the discovery of a novel class of antibiotics and a small-molecule inhibitor that has progressed to phase II clinical trials. These cases underscore the potential of AI to identify new drug candidates and optimise the development process. The convergence of DD and AI embodies a transformative shift in the field, offering a path to overcome traditional obstacles. By leveraging AI, the future of DD promises enhanced efficiency and novel breakthroughs, heralding a new era of medical innovation, even though there is still a long way to go.
Affiliation(s)
- Aurore Crouzet
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- Nicolas Lopez
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- ENOES, 62 Rue de Miromesnil, 75008 Paris, France
- Unité Mixte de Recherche «Institut de Physique Théorique (IPhT)» CEA-CNRS, UMR 3681, Bat 774, Route de l’Orme des Merisiers, 91191 St Aubin-Gif-sur-Yvette, France
- Benjamin Riss Yaw
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
- Yves Lepelletier
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- Université Paris Cité, Imagine Institute, 24 Boulevard Montparnasse, 75015 Paris, France
- INSERM UMR 1163, Laboratory of Cellular and Molecular Basis of Normal Hematopoiesis and Hematological Disorders: Therapeutical Implications, 24 Boulevard Montparnasse, 75015 Paris, France
- Luc Demange
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
4
Javeed M, Abdelhaq M, Algarni A, Jalal A. Biosensor-Based Multimodal Deep Human Locomotion Decoding via Internet of Healthcare Things. Micromachines 2023;14:2204. PMID: 38138373. PMCID: PMC10745656. DOI: 10.3390/mi14122204.
Abstract
Multiple Internet of Healthcare Things (IoHT)-based devices have been utilized as sensing methodologies for human locomotion decoding to aid in applications related to e-healthcare. Different measurement conditions affect daily routine monitoring, including the sensor type, wearing style, data retrieval method, and processing model. Several models in this domain combine techniques for pre-processing, descriptor extraction and reduction, and classification of data captured from multiple sensors. However, such models, built on multi-subject data with heterogeneous techniques, may degrade the accuracy of locomotion decoding. Therefore, this study proposes a deep neural network model that not only applies state-of-the-art quaternion-based filtration to motion and ambient data, along with background subtraction and skeleton modeling for video-based data, but also learns important descriptors from novel graph-based representations and Gaussian Markov random-field mechanisms. Due to the non-linear nature of the data, these descriptors are further used to extract a codebook via the Gaussian mixture regression model. The codebook is then provided to a recurrent neural network to classify the activities for the locomotion-decoding system. We validate the proposed model on two publicly available datasets, HWU-USP and LARa. The proposed model improves significantly over previous systems, achieving 82.22% and 82.50% on the HWU-USP and LARa datasets, respectively. The proposed IoHT-based locomotion-decoding model is useful for unobtrusive human activity recognition over extended periods in e-healthcare facilities.
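The codebook step above compresses frame-level descriptors into a compact vocabulary before sequence classification. The paper uses Gaussian mixture regression; as a simplified stand-in, this sketch fits a plain Gaussian mixture and uses its per-frame posteriors as soft code assignments for a downstream RNN (the component count and covariance type are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_codebook(descriptors, n_codes=64, seed=0):
    """Fit a GMM over frame-level descriptors (N_frames, D);
    the learned components act as the codebook."""
    gmm = GaussianMixture(n_components=n_codes, covariance_type="diag",
                          random_state=seed)
    gmm.fit(descriptors)
    return gmm

def encode_sequence(gmm, seq):
    """Encode one activity sequence (T, D) as per-frame soft code
    assignments (T, n_codes), ready for an LSTM/RNN classifier."""
    return gmm.predict_proba(seq)
```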
Affiliation(s)
- Madiha Javeed
- Department of Computer Science, Air University, Islamabad 44000, Pakistan;
- Maha Abdelhaq
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Asaad Algarni
- Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha 91911, Saudi Arabia;
- Ahmad Jalal
- Department of Computer Science, Air University, Islamabad 44000, Pakistan;
5
Yin R, Yang B, Huang Z, Zhang X. DSA-Net: Infrared and Visible Image Fusion via Dual-Stream Asymmetric Network. Sensors (Basel) 2023;23:7097. PMID: 37631634. PMCID: PMC10459630. DOI: 10.3390/s23167097.
Abstract
Infrared and visible image fusion technologies are used to characterize the same scene using diverse modalities. However, most existing deep learning-based fusion methods are designed as symmetric networks, which ignore the differences between the two modalities and lose source image information during feature extraction. In this paper, we propose a new fusion framework tailored to the different characteristics of infrared and visible images. Specifically, we design a dual-stream asymmetric network with two different feature extraction networks to extract infrared and visible feature maps, respectively. The transformer architecture is introduced in the infrared feature extraction branch, which forces the network to focus on the local features of infrared images while still obtaining their contextual information. The visible feature extraction branch uses residual dense blocks to fully extract the rich background and texture detail information of visible images. In this way, the network can provide better infrared targets and visible details for the fused image. Experimental results on multiple datasets indicate that DSA-Net outperforms state-of-the-art methods in both qualitative and quantitative evaluations. In addition, we apply the fusion results to a target detection task, which indirectly demonstrates the fusion performance of our method.
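The asymmetric design pairs a different encoder with each modality before fusion. A minimal PyTorch sketch of that idea, assuming single-channel inputs, one transformer encoder layer standing in for the infrared branch, and a small convolutional stack standing in for the residual dense blocks (channel widths, depths, and the fusion head are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Dual-stream asymmetric sketch: distinct encoders per modality,
    concatenation fusion, single-channel fused output."""
    def __init__(self, ch=32):
        super().__init__()
        # Infrared branch: conv stem + self-attention over spatial tokens.
        self.ir_stem = nn.Conv2d(1, ch, 3, padding=1)
        self.ir_attn = nn.TransformerEncoderLayer(d_model=ch, nhead=4,
                                                  batch_first=True)
        # Visible branch: small conv stack standing in for residual dense blocks.
        self.vis = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, ir, vis):
        f_ir = self.ir_stem(ir)                      # (B, C, H, W)
        b, c, h, w = f_ir.shape
        tokens = f_ir.flatten(2).transpose(1, 2)     # (B, H*W, C)
        f_ir = self.ir_attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        f_vis = self.vis(vis)
        return torch.sigmoid(self.fuse(torch.cat([f_ir, f_vis], dim=1)))
```

Note that self-attention over all H*W positions is quadratic in image size, so a practical implementation would attend over patches or windows rather than full-resolution feature maps.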
Affiliation(s)
- Bin Yang
- College of Electrical Engineering, University of South China, Hengyang 421001, China; (R.Y.); (Z.H.); (X.Z.)
6
Jang Y, Jeong I, Younesi Heravi M, Sarkar S, Shin H, Ahn Y. Multi-Camera-Based Human Activity Recognition for Human-Robot Collaboration in Construction. Sensors (Basel) 2023;23:6997. PMID: 37571779. PMCID: PMC10422633. DOI: 10.3390/s23156997.
Abstract
As the use of construction robots continues to increase, ensuring safety and productivity while working alongside human workers becomes crucial. To prevent collisions, robots must recognize human behavior in close proximity. However, single or RGB-depth cameras have limitations, such as detection failure, sensor malfunction, occlusions, unconstrained lighting, and motion blur. Therefore, this study proposes a multiple-camera approach for human activity recognition during human-robot collaborative activities in construction. The proposed approach employs a particle filter to estimate the 3D human pose by fusing 2D joint locations extracted from multiple cameras, and applies a long short-term memory (LSTM) network to recognize ten activities associated with human-robot collaboration tasks in construction. The study compared the performance of human activity recognition models using one, two, three, and four cameras. Results showed that using multiple cameras enhances recognition performance, providing a more accurate and reliable means of identifying and differentiating between activities. The results of this study are expected to contribute to the advancement of human activity recognition and its use in human-robot collaboration in construction.
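The particle filter fuses per-camera 2D joint detections into a 3D estimate by weighting 3D hypotheses by their reprojection error. A minimal sketch for a single joint, assuming known 3x4 camera projection matrices and a Gaussian pixel-noise model (the motion model, noise scales, and resampling scheme here are illustrative assumptions, not the paper's):

```python
import numpy as np

def particle_filter_step(particles, weights, obs_2d, proj_mats,
                         sigma=10.0, diffusion=0.01):
    """One predict/update/resample cycle for one 3D joint.

    particles: (P, 3) candidate 3D positions; weights: (P,) current weights;
    obs_2d: per-camera (2,) pixel detections (None if the joint was missed);
    proj_mats: per-camera 3x4 projection matrices; sigma: pixel-noise std.
    """
    # Predict: diffuse particles with random-walk motion noise.
    particles = particles + np.random.normal(0.0, diffusion, particles.shape)
    # Update: reweight by reprojection error in every camera that saw the joint.
    for P_mat, z in zip(proj_mats, obs_2d):
        if z is None:  # occlusion, blur, or detection failure in this view
            continue
        homog = np.hstack([particles, np.ones((len(particles), 1))])  # (P, 4)
        uvw = homog @ P_mat.T                                         # (P, 3)
        uv = uvw[:, :2] / uvw[:, 2:3]                                 # pixels
        err2 = ((uv - z) ** 2).sum(axis=1)
        weights = weights * np.exp(-err2 / (2.0 * sigma ** 2))
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Running one such filter per joint and taking the particle mean yields the 3D pose sequence that an LSTM could then classify into activities, in line with the pipeline described above.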
Affiliation(s)
- Youjin Jang
- Department of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58108, USA; (M.Y.H.); (S.S.)
- Inbae Jeong
- Department of Mechanical Engineering, North Dakota State University, Fargo, ND 58108, USA;
- Moein Younesi Heravi
- Department of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58108, USA; (M.Y.H.); (S.S.)
- Sajib Sarkar
- Department of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58108, USA; (M.Y.H.); (S.S.)
- Hyunkyu Shin
- Sustainable Smart City Convergence Educational Research Center, Hanyang University ERICA, Ansan 15588, Republic of Korea;
- Yonghan Ahn
- Department of Architectural Engineering, Hanyang University ERICA, Ansan 15588, Republic of Korea;