Xu C, Zhou J, Cai W, Jiang Y, Li Y, Liu Y. Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments.
SENSORS 2020;
20:s20216360. [PMID:
33171831 PMCID:
PMC7664645 DOI:
10.3390/s20216360]
[Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 11/04/2020] [Accepted: 11/05/2020] [Indexed: 11/20/2022]
Abstract
Three-dimensional hand detection from a single RGB-D image is an important technology which supports many useful applications. Practically, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as light changes. To tackle this problem, we propose a 3D hand detection approach which improves the robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we represent a challenging RGB-D hand detection dataset collected in unconstrained environments. Different from previous works which primarily rely on either the RGB or D channel, we adaptively fuse the RGB-D channels for hand detection. Specifically, evaluation results show that the D-channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves the hand detection accuracy from 69.1 to 74.1 comparing to one of the most state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable in unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method significantly drops to 48.9, and in back-light conditions, the accuracy of the D-based method dramatically drops to 28.3. Compared with these methods, our RGB-D fusion based approach is much more robust without accuracy degrading, and our detection results are 62.5 and 65.9, respectively, in these two extreme lighting conditions for accuracy.
Collapse