1
Li H, Yang J, Xuan Z, Qu M, Wang Y, Feng C. A spatio-temporal graph convolutional network for ultrasound echocardiographic landmark detection. Med Image Anal 2024;97:103272. PMID: 39024972. DOI: 10.1016/j.media.2024.103272.
Abstract
Landmark detection is a crucial task in medical image analysis, with applications across various fields. However, current methods struggle to accurately locate landmarks in medical images whose tissue boundaries are blurred by low image quality. In echocardiography in particular, sparse annotations make it challenging to predict landmarks with positional stability and temporal consistency. In this paper, we propose a spatio-temporal graph convolutional network tailored for echocardiographic landmark detection. We sample landmark labels from the left ventricular endocardium and pre-calculate their correlations to establish structural priors. Our approach uses a graph convolutional neural network that learns the interrelationships among landmarks, significantly enhancing landmark accuracy within ambiguous tissue contexts. Additionally, we integrate gated recurrent units to capture the temporal consistency of landmarks across consecutive frames, improving the model's resilience to unlabeled data. Validation on three echocardiography datasets demonstrates that our method is more accurate than alternative landmark detection models.
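The pipeline the abstract outlines (structural priors from precomputed landmark correlations, graph convolution over landmark nodes, gated recurrent units across frames) can be sketched compactly. The PyTorch layer below is a hypothetical illustration of that pattern, not the authors' released architecture; the adjacency normalization, dimensions, and residual refinement are all assumptions.

```python
# Minimal sketch (not the authors' code): graph convolution over landmark
# nodes with a fixed structural-prior adjacency, then a GRU across frames.
import torch
import torch.nn as nn

class SpatioTemporalLandmarkNet(nn.Module):
    def __init__(self, adj, in_dim=2, hid_dim=64):
        super().__init__()
        # adj: (K, K) precomputed landmark-correlation matrix (structural prior)
        self.register_buffer("adj", adj / adj.sum(dim=1, keepdim=True))
        self.gcn = nn.Linear(in_dim, hid_dim)       # shared node transform
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 2)           # per-landmark (x, y) offset

    def forward(self, x):
        # x: (B, T, K, 2) rough landmark coordinates over T frames
        B, T, K, _ = x.shape
        h = torch.relu(self.adj @ self.gcn(x))       # spatial message passing
        h = h.permute(0, 2, 1, 3).reshape(B * K, T, -1)
        h, _ = self.gru(h)                           # temporal consistency
        out = self.head(h).reshape(B, K, T, 2).permute(0, 2, 1, 3)
        return x + out                               # residual refinement

adj = torch.rand(21, 21)                             # hypothetical 21 landmarks
model = SpatioTemporalLandmarkNet(adj)
coords = model(torch.rand(4, 16, 21, 2))             # B=4 clips, T=16 frames
print(coords.shape)                                  # torch.Size([4, 16, 21, 2])
```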
Affiliation(s)
- Honghe Li, Jinzhu Yang, Zhanfeng Xuan, Mingjun Qu, Chaolu Feng: Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, China
- Yonghuai Wang: Department of Cardiovascular Ultrasound, The First Hospital of China Medical University, China
2
Tripathi SC, Garg R. Consistent movement of viewers' facial keypoints while watching emotionally evocative videos. PLoS One 2024;19:e0302705. PMID: 38758739. PMCID: PMC11101037. DOI: 10.1371/journal.pone.0302705.
Abstract
Neuropsychological research aims to unravel how diverse individuals' brains exhibit similar functionality when exposed to the same stimuli. The evocation of consistent responses when different subjects watch the same emotionally evocative stimulus has been observed through modalities such as fMRI, EEG, physiological signals, and facial expressions. We refer to the quantification of these shared, consistent signals across subjects at each time instant as Consistent Response Measurement (CRM). CRM is widely explored through fMRI, and occasionally with EEG, physiological signals, and facial expressions, using metrics such as Inter-Subject Correlation (ISC). However, fMRI tools are expensive and constrained, while EEG and physiological signals are prone to facial artifacts and environmental conditions (such as temperature, humidity, and the health condition of subjects). In this research, facial expression videos are used as a cost-effective and flexible alternative for CRM that is minimally affected by external conditions. Employing computer vision-based automated facial keypoint tracking, we introduce a new ISC-like metric, the Average t-statistic. Unlike existing facial expression-based methodologies that measure the CRM of secondary indicators such as inferred emotions or keypoint- and ICA-based features, the Average t-statistic is closely associated with the direct measurement of consistent facial muscle movement using the Facial Action Coding System (FACS). This is evidenced in the DISFA dataset, where the time series of the Average t-statistic correlates highly (R2 = 0.78) with AU consistency, a metric that directly measures facial muscle movement through FACS coding of video frames. The simplicity of recording facial expressions with the automated Average t-statistic expands the applications of CRM, such as measuring engagement in online learning or customer interactions, and detecting outliers in healthcare conditions such as stroke, autism, and depression. To promote further research, we have made the code repository publicly available.
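The paper's exact formula for the Average t-statistic is not reproduced in this abstract, so the following is only a plausible sketch of the idea: at each time instant, test across subjects whether keypoint movement deviates consistently from each subject's own baseline, then average the t-values over keypoints. The displacement definition and the per-subject centering below are assumptions; the authors' repository holds the reference implementation.

```python
# Hedged sketch of a per-instant "Average t-statistic" for consistent
# facial-keypoint movement across viewers (the paper's exact definition
# may differ).
import numpy as np
from scipy import stats

def average_t_statistic(keypoints):
    # keypoints: (S subjects, T frames, K keypoints, 2) tracked coordinates
    disp = np.linalg.norm(np.diff(keypoints, axis=1), axis=-1)  # (S, T-1, K)
    # Remove each subject's baseline movement, then t-test across subjects
    # at each instant and keypoint: large |t| means subjects consistently
    # move that keypoint more (or less) than their own average.
    centered = disp - disp.mean(axis=1, keepdims=True)
    t_vals = stats.ttest_1samp(centered, popmean=0.0, axis=0).statistic  # (T-1, K)
    return np.nanmean(t_vals, axis=1)                           # average over keypoints

rng = np.random.default_rng(0)
kp = rng.normal(size=(20, 300, 68, 2)).cumsum(axis=1)  # 20 subjects, 68 keypoints
print(average_t_statistic(kp).shape)                   # (299,)
```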
Affiliation(s)
- Shivansh Chandra Tripathi: Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Rahul Garg: Department of Computer Science and Engineering; Amar Nath and Shashi Khosla School of Information Technology; and National Resource Centre for Value Education in Engineering, Indian Institute of Technology Delhi, New Delhi, India
3
Xia J, Xu M, Zhang H, Zhang J, Huang W, Cao H, Wen S. Robust Face Alignment via Inherent Relation Learning and Uncertainty Estimation. IEEE Trans Pattern Anal Mach Intell 2023;45:10358-10375. PMID: 37030840. DOI: 10.1109/tpami.2023.3260926.
Abstract
Humans tend to locate heavily occluded facial landmarks by their position relative to easily identified landmarks. This cue, which we define as the landmark inherent relation, is ignored by most existing methods. In this paper, we present the Dynamic Sparse Local Patch Transformer (DSLPT), a novel face alignment framework for inherent relation learning and uncertainty estimation. Unlike most existing methods, which regress facial landmarks directly from global features, the DSLPT first generates a rough representation of each landmark from a local patch cropped from the feature map and then adaptively aggregates these representations by a case-dependent inherent relation. Finally, the DSLPT predicts the coordinates and uncertainty of each landmark by regressing their probability distributions from the output features. Moreover, we introduce a coarse-to-fine framework that incorporates the DSLPT for improved results. In this framework, the position and size of each patch are determined by the probability distribution of the corresponding landmark predicted in the previous stage. The dynamic patches ensure a fine-grained landmark representation for inherent relation learning, so a rough prediction can gradually converge to the target facial landmarks. We integrate the coarse-to-fine model into an end-to-end training pipeline and carry out experiments on mainstream benchmarks. The results demonstrate that the DSLPT achieves state-of-the-art performance with much lower computational complexity. The code and models are available at https://github.com/Jiahao-UTS/DSLPT.
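The dynamic-patch idea lends itself to a short sketch: crop a local patch around each current landmark estimate, with the patch size driven by the landmark's predicted uncertainty, so later stages see finer-grained evidence. The code below is a hypothetical illustration using grid_sample; the actual DSLPT patch mechanism (see the linked repository) may differ in sampling and normalization.

```python
# Minimal sketch (assumptions, not the DSLPT release): crop a local patch
# around each landmark estimate, scaled by that landmark's uncertainty,
# producing per-landmark tokens for a relation transformer.
import torch
import torch.nn.functional as F

def crop_dynamic_patches(feat, landmarks, sigma, patch=7):
    # feat: (B, C, H, W); landmarks: (B, K, 2) in [-1, 1]; sigma: (B, K)
    B, C, H, W = feat.shape
    K = landmarks.shape[1]
    lin = torch.linspace(-1.0, 1.0, patch, device=feat.device)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")
    base = torch.stack([gx, gy], dim=-1)                  # (P, P, 2) unit grid
    # Scale each patch by the landmark's uncertainty, then recenter on it.
    grid = landmarks[:, :, None, None, :] + \
           sigma[:, :, None, None, None] * base           # (B, K, P, P, 2)
    patches = F.grid_sample(
        feat.repeat_interleave(K, dim=0),                 # (B*K, C, H, W)
        grid.reshape(B * K, patch, patch, 2),
        align_corners=False)
    return patches.reshape(B, K, C, patch, patch)

feat = torch.rand(2, 32, 64, 64)
lms = torch.rand(2, 98, 2) * 2 - 1          # 98 landmarks in [-1, 1]
patches = crop_dynamic_patches(feat, lms, sigma=torch.full((2, 98), 0.1))
print(patches.shape)                        # torch.Size([2, 98, 32, 7, 7])
```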
4
EchoEFNet: Multi-task deep learning network for automatic calculation of left ventricular ejection fraction in 2D echocardiography. Comput Biol Med 2023;156:106705. PMID: 36863190. DOI: 10.1016/j.compbiomed.2023.106705.
Abstract
Left ventricular ejection fraction (LVEF) is essential for evaluating left ventricular systolic function. However, its clinical calculation requires the physician to interactively segment the left ventricle and locate the mitral annulus and apical landmarks, a process that is poorly reproducible and error-prone. In this study, we propose EchoEFNet, a multi-task deep learning network. The network uses ResNet50 with dilated convolutions as its backbone to extract high-dimensional features while preserving spatial information. Its branch networks use our multi-scale feature fusion decoder to segment the left ventricle and detect landmarks simultaneously. The LVEF is then calculated automatically using the biplane Simpson's method. The model was evaluated on the public CAMUS dataset and the private CMUEcho dataset. Experimental results show that EchoEFNet outperforms other deep learning methods on geometrical metrics and the percentage of correct keypoints. The correlation between predicted and true LVEF was 0.854 on CAMUS and 0.916 on CMUEcho.
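The biplane Simpson (method-of-disks) calculation that the network automates is standard and worth spelling out: the ventricle is sliced into N disks whose orthogonal diameters come from the segmented apical four- and two-chamber contours, volumes are computed at end-diastole and end-systole, and LVEF is their relative difference. The 20-disk discretization is conventional; the example diameters below are illustrative only.

```python
# Biplane Simpson (method-of-disks) LVEF computation in minimal form.
import math

def simpson_biplane_volume(a, b, length):
    """a, b: per-disk diameters (cm) from apical 4- and 2-chamber views;
    length: long-axis length (cm); returns volume in mL."""
    n = len(a)
    assert len(b) == n
    return (math.pi / 4.0) * (length / n) * sum(ai * bi for ai, bi in zip(a, b))

def ejection_fraction(edv, esv):
    # EF (%) = (end-diastolic volume - end-systolic volume) / EDV
    return 100.0 * (edv - esv) / edv

# Hypothetical diameters for the standard 20-disk discretization:
a4c_ed, a2c_ed = [4.0] * 20, [3.8] * 20
a4c_es, a2c_es = [3.0] * 20, [2.9] * 20
edv = simpson_biplane_volume(a4c_ed, a2c_ed, length=8.5)
esv = simpson_biplane_volume(a4c_es, a2c_es, length=7.8)
print(f"LVEF = {ejection_fraction(edv, esv):.1f}%")
```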
5
Wang Y, Zhou W, Zhou J. 2DHeadPose: A simple and effective annotation method for the head pose in RGB images and its dataset. Neural Netw 2023;160:50-62. PMID: 36621170. DOI: 10.1016/j.neunet.2022.12.021.
Abstract
Head pose estimation is an essential task in computer vision that predicts the Euler angles of the head in an image. In recent years, CNN-based methods for head pose estimation have achieved excellent performance. Their training relies on RGB images with facial landmark labels or on depth images from RGBD cameras. However, labeling facial landmarks is difficult for large-angle head poses in RGB images, and RGBD cameras are unsuitable for outdoor scenes. We propose a simple and effective annotation method for head pose in RGB images. The method uses a 3D virtual human head to simulate the head pose in the RGB image; the Euler angles can then be calculated from the change in coordinates of the 3D virtual head. Using this annotation method, we create the 2DHeadPose dataset, which covers a rich set of attributes, dimensions, and angles. Finally, we propose Gaussian label smoothing to suppress annotation noise and reflect inter-class relationships, and establish a baseline approach that uses it. Experiments demonstrate that our annotation method, dataset, and Gaussian label smoothing are very effective, and our baseline approach surpasses most current state-of-the-art methods. The annotation tool, dataset, and source code are publicly available at https://github.com/youngnuaa/2DHeadPose.
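Gaussian label smoothing for binned angle prediction has a common generic form: replace the one-hot bin target with a normalized Gaussian centered on the true angle, so near-miss bins retain probability mass and inter-class similarity shows up in the loss. The sketch below shows that generic form; the paper's exact bin layout and sigma are not given here and are assumed.

```python
# A common form of Gaussian label smoothing for binned angle regression
# (a sketch of the idea the abstract names, not the paper's exact variant).
import numpy as np

def gaussian_soft_label(true_angle, bins, sigma=3.0):
    # bins: bin-center angles in degrees, e.g. -99..99 in 3-degree steps
    logits = -0.5 * ((bins - true_angle) / sigma) ** 2
    probs = np.exp(logits)
    return probs / probs.sum()   # soft target: nearby bins share probability

bins = np.arange(-99, 102, 3, dtype=float)
target = gaussian_soft_label(true_angle=31.0, bins=bins)
# Train with cross-entropy against this target instead of a one-hot label,
# so near-miss predictions are penalized less than distant ones.
print(bins[target.argmax()], target.max().round(3))
```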
Affiliation(s)
- Yang Wang, Wanlin Zhou, Jiakai Zhou: College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
6
Deep Recurrent Regression with a Heatmap Coupling Module for Facial Landmarks Detection. Cognit Comput 2022. DOI: 10.1007/s12559-022-10065-9.
7
Jafari MH, Luong C, Tsang M, Gu AN, Van Woudenberg N, Rohling R, Tsang T, Abolmaesumi P. U-LanD: Uncertainty-Driven Video Landmark Detection. IEEE Trans Med Imaging 2022;41:793-804. PMID: 34705639. DOI: 10.1109/tmi.2021.3123547.
Abstract
This paper presents U-LanD, a framework for the automatic detection of landmarks on key frames of video that leverages the uncertainty of landmark prediction. We tackle a specifically challenging problem in which training labels are noisy and highly sparse. U-LanD builds on a pivotal observation: a deep Bayesian landmark detector trained solely on key video frames has significantly lower predictive uncertainty on those frames than on other frames. We use this observation as an unsupervised signal to automatically recognize the key frames on which we detect landmarks. As a test bed for our framework, we use ultrasound imaging videos of the heart, where sparse and noisy clinical labels are available for only a single frame in each video. Using data from 4,493 patients, we demonstrate that U-LanD outperforms the state-of-the-art non-Bayesian counterpart by a notable absolute margin of 42% in R2 score, with almost no overhead added to the model size.
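The pivotal observation translates directly into a selection rule: run a Bayesian (or approximately Bayesian) detector several times per frame, score each frame by its predictive variance, and keep the low-uncertainty frames as key frames. The sketch below uses Monte Carlo dropout as the Bayesian approximation, which is an assumption about the detector; the toy model and quantile threshold are likewise illustrative.

```python
# Sketch of the key-frame rule, with Monte Carlo dropout standing in for
# the Bayesian detector (the paper's exact model differs): frames where
# predictive variance is low are treated as key frames.
import torch

@torch.no_grad()
def frame_uncertainty(model, frames, n_samples=20):
    model.train()                       # keep dropout active at test time
    preds = torch.stack([model(frames) for _ in range(n_samples)])
    return preds.var(dim=0).mean(dim=(-1, -2))   # per-frame scalar variance

def select_key_frames(model, frames, quantile=0.1):
    u = frame_uncertainty(model, frames)          # (T,)
    return (u <= torch.quantile(u, quantile)).nonzero(as_tuple=True)[0]

# Hypothetical detector mapping each frame to K=21 landmark coordinates:
model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Dropout(0.2),
    torch.nn.Linear(64 * 64, 2 * 21), torch.nn.Unflatten(1, (21, 2)))
frames = torch.rand(100, 1, 64, 64)               # a 100-frame clip
print(select_key_frames(model, frames))           # indices of key frames
```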
8
Behzad M, Vo N, Li X, Zhao G. Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.023.
9
Lin C, Zhu B, Wang Q, Liao R, Qian C, Lu J, Zhou J. Structure-Coherent Deep Feature Learning for Robust Face Alignment. IEEE Trans Image Process 2021;30:5313-5326. PMID: 34038362. DOI: 10.1109/tip.2021.3082319.
Abstract
In this paper, we propose a structure-coherent deep feature learning method for face alignment. Unlike most existing face alignment methods, which overlook facial structure cues, we explicitly exploit the relations among facial landmarks to make the detector robust to hard cases such as occlusion and large pose. Specifically, we leverage a landmark-graph relational network to enforce the structural relationships among landmarks. We treat the facial landmarks as structural graph nodes and carefully design the neighborhood to pass features among the most related nodes. Our method dynamically adapts the weights of node neighborhoods to eliminate distracting information from noisy nodes, such as occluded landmark points. Moreover, unlike most previous works, which only penalize the absolute positions of landmarks during training, we propose a relative location loss that supervises the relative locations of landmarks. This relative location supervision further regularizes the facial structure. Our approach considers the interactions among facial landmarks and can be easily implemented on top of any convolutional backbone to boost performance. Extensive experiments on three popular benchmarks, including WFLW, COFW, and 300W, demonstrate the effectiveness of the proposed method. In particular, due to explicit structure modeling, our approach is especially robust to challenging cases, resulting in impressively low failure rates on the COFW and WFLW datasets. The model and code are publicly available at https://github.com/BeierZhu/Sturcture-Coherency-Face-Alignment.
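A relative location loss of the kind the abstract describes can be written in a few lines: besides the absolute landmark error, penalize the error in pairwise landmark offsets so the predicted configuration keeps the facial structure coherent. The dense all-pairs form and the weighting below are assumptions; the paper's edge set may be sparser.

```python
# Hedged sketch of a "relative location" loss: supervise pairwise landmark
# offsets in addition to absolute positions (exact edge set and weighting
# in the paper are not reproduced here).
import torch

def relative_location_loss(pred, target):
    # pred, target: (B, K, 2) landmark coordinates
    d_pred = pred[:, :, None, :] - pred[:, None, :, :]      # (B, K, K, 2) offsets
    d_true = target[:, :, None, :] - target[:, None, :, :]
    return (d_pred - d_true).abs().mean()

def total_loss(pred, target, lam=0.5):
    absolute = (pred - target).abs().mean()                 # L1 on positions
    return absolute + lam * relative_location_loss(pred, target)

pred, target = torch.rand(8, 98, 2), torch.rand(8, 98, 2)
print(total_loss(pred, target))
```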