1. Ding B, Xie J, Nie J, Wu Y, Cao J. C2BG-Net: Cross-modality and cross-scale balance network with global semantics for multi-modal 3D object detection. Neural Netw 2024;179:106535. PMID: 39047336. DOI: 10.1016/j.neunet.2024.106535.
Abstract
Multi-modal 3D object detection is instrumental in identifying and localizing objects within 3D space. It combines RGB images from cameras with point-cloud data from LiDAR sensors and serves as a fundamental technology for autonomous driving. Current methods commonly employ simple element-wise additions or multiplications to aggregate the multi-modal features extracted from point clouds and images. While these methods improve detection accuracy, such basic operations make it difficult to balance the relative importance of the two modalities and can introduce noise and irrelevant information during feature aggregation. Additionally, the multi-level features extracted from images exhibit imbalanced receptive fields. To tackle these challenges, we propose two networks: a cross-modality balance network (CMN) and a cross-scale balance network (CSN). CMN incorporates cross-modality attention mechanisms and introduces an auxiliary 2D detection head to balance the importance of the two modalities. CSN leverages cross-scale attention mechanisms to mitigate the gap in receptive fields between different image levels. We further introduce a novel Local with Global Voxel Attention Encoder (LGVAE) that captures global semantics by encoding more comprehensive point-level information into voxel-level features. We perform comprehensive experiments on three challenging public benchmarks: KITTI, Dense and nuScenes. The results consistently show improvements across multiple 3D object detection frameworks, confirming the effectiveness and versatility of the proposed method. Notably, our approach achieves an absolute gain of 3.1% over the MVXNet baseline on the challenging Hard set of the Dense test set.
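The abstract contrasts plain element-wise addition or multiplication with attention-based fusion that balances the two modalities. As an illustration only (the paper's actual CMN design is not given here, and all layer names and shapes below are assumptions), a minimal cross-modality gate in PyTorch could learn per-pixel weights before aggregating image and LiDAR features:

```python
import torch
import torch.nn as nn

class CrossModalityGate(nn.Module):
    """Toy cross-modality fusion: learn per-pixel weights that balance
    image features against LiDAR features before aggregation.
    Illustrative sketch only, not the CMN from the paper."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a [0, 1] gate from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([img_feat, lidar_feat], dim=1))
        # Weighted sum instead of a plain element-wise addition.
        return w * img_feat + (1.0 - w) * lidar_feat

# Example: fuse 64-channel, spatially aligned image and LiDAR feature maps.
fuse = CrossModalityGate(channels=64)
img = torch.randn(2, 64, 100, 100)
pts = torch.randn(2, 64, 100, 100)
out = fuse(img, pts)  # shape (2, 64, 100, 100)
```

The gate collapses to plain addition when w is constant at 0.5, which is why a learned weighting is a strict generalization of the element-wise baselines mentioned in the abstract.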
Affiliation(s)
- Bonan Ding
- School of Big Data and Software Engineering, Chongqing University, Chongqing, 400044, China
- Jin Xie
- School of Big Data and Software Engineering, Chongqing University, Chongqing, 400044, China
- Jing Nie
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Yulong Wu
- School of Big Data and Software Engineering, Chongqing University, Chongqing, 400044, China
- Jiale Cao
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
2. Shao Y, Tan A, Wang B, Yan T, Sun Z, Zhang Y, Liu J. MS23D: A 3D object detection method using multi-scale semantic feature points to construct 3D feature layer. Neural Netw 2024;179:106623. PMID: 39154419. DOI: 10.1016/j.neunet.2024.106623.
Abstract
LiDAR point clouds can effectively depict the motion and posture of objects in three-dimensional space. Many studies perform 3D object detection by voxelizing point clouds. However, in autonomous driving scenarios, the sparsity and hollowness of point clouds create difficulties for voxel-based methods: sparsity makes it challenging to describe the geometric features of objects, while hollowness hinders the aggregation of 3D features. We propose a two-stage 3D object detection framework called MS23D. (1) We construct the 3D feature layer from voxel feature points drawn from multiple branches, yielding a relatively compact 3D feature layer with rich semantic features. We also propose a distance-weighted sampling method that reduces the loss of foreground points caused by downsampling, allowing the 3D feature layer to retain more foreground points. (2) To address the hollowness of point clouds, we predict offsets between deep-level feature points and the object's centroid, moving them as close as possible to the centroid so that these semantically rich feature points can be aggregated. Shallow-level feature points are retained on the object's surface to describe its geometric features. We validate the effectiveness of our approach on both the KITTI and ONCE datasets.
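The abstract mentions distance-weighted sampling that keeps more foreground points during downsampling but does not spell out the weighting. One plausible reading, sketched below in NumPy, biases the sampling probability toward distant (and therefore sparser) points; the exact weighting function, the function name and the example sizes are assumptions, not the paper's definition.

```python
import numpy as np

def distance_weighted_sample(points: np.ndarray, num_samples: int, rng=None) -> np.ndarray:
    """Downsample an (N, 3+) point cloud, biasing selection toward distant
    points, which are sparser and easier to lose with uniform sampling.
    Illustrative only; the weighting used in MS23D may differ."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.linalg.norm(points[:, :3], axis=1)      # range from the sensor origin
    weights = dist / (dist.sum() + 1e-8)              # farther point -> higher keep probability
    idx = rng.choice(len(points), size=num_samples, replace=False, p=weights)
    return points[idx]

# Example: keep 4096 of 20000 points (x, y, z, intensity).
cloud = np.random.randn(20000, 4).astype(np.float32) * 30.0
kept = distance_weighted_sample(cloud, 4096)
```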
Affiliation(s)
- Yongxin Shao
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Aihong Tan
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Binrui Wang
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Tianhong Yan
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Zhetao Sun
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Yiyang Zhang
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
- Jiaxin Liu
- The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China
3. Wang J, Qi Y. Multi-level feature fusion and joint refinement for simultaneous object pose estimation and camera localization. Neural Netw 2024;174:106238. PMID: 38508048. DOI: 10.1016/j.neunet.2024.106238.
Abstract
Object pose estimation and camera localization are critical in various applications. However, achieving algorithm universality, i.e., category-level pose estimation and scene-independent camera localization, remains challenging for both techniques. Although the two tasks are closely related through spatial geometric constraints, they require distinct feature extraction. This paper presents a unified RGB-D-based framework that simultaneously performs category-level object pose estimation and scene-independent camera localization. The framework consists of a pose estimation branch (SLO-ObjNet), a localization branch (SLO-LocNet), a pose confidence calculation process and an object-level optimization. We first obtain initial camera and object results from SLO-LocNet and SLO-ObjNet, in which three-level feature fusion modules and a dedicated loss function achieve feature sharing between the two tasks. A confidence calculation process then determines the accuracy of the obtained object poses. Finally, an object-level Bundle Adjustment (BA) optimization further improves the precision of both tasks; the BA establishes relationships among feature points, objects, and cameras through camera-point, camera-object, and object-point terms. We evaluate the approach on localization and pose estimation datasets including REAL275, CAMERA25, LineMOD, YCB-Video, 7 Scenes, ScanNet and TUM RGB-D. The results show that it outperforms existing methods in both estimation and localization accuracy. Furthermore, SLO-LocNet and SLO-ObjNet trained on ScanNet and tested on 7 Scenes and TUM RGB-D demonstrate the method's universality. We also highlight the positive contributions of the fusion modules, loss function, confidence process and BA to the overall performance.
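The object-level BA couples cameras, objects and feature points through camera-point, camera-object and object-point terms. The abstract does not give the exact cost, but a generic objective of this shape (all symbols below are assumed notation, not the paper's) would be:

```latex
\min_{\{T_c\},\,\{T_o\},\,\{p_i\}}
  \sum_{c,i} \rho\!\left(\lVert \pi(T_c^{-1} p_i) - u_{ci} \rVert^2\right)                                   % camera--point (reprojection)
+ \sum_{c,o} \rho\!\left(\lVert \log\!\big(\hat{T}_{co}^{-1}\, T_c^{-1} T_o\big)^{\vee} \rVert^2\right)       % camera--object (relative pose)
+ \sum_{o,i} \rho\!\left(\lVert T_o^{-1} p_i - \hat{q}_{oi} \rVert^2\right)                                   % object--point (model consistency)
```

Here $T_c$ and $T_o$ denote camera and object poses, $p_i$ the 3D feature points, $u_{ci}$ their image observations, $\hat{T}_{co}$ a per-frame object-pose measurement, $\hat{q}_{oi}$ a point's coordinate in the object frame, $\pi(\cdot)$ the camera projection, and $\rho$ a robust kernel.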
Affiliation(s)
- Junyi Wang
- School of Computer Science and Technology, Shandong University, Qingdao, China; Qingdao Research Institute of Beihang University, Qingdao, China.
- Yue Qi
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; Qingdao Research Institute of Beihang University, Qingdao, China.
4. Zhou W, Zheng F, Zhao Y, Pang Y, Yi J. MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification. Neural Netw 2024;172:106141. PMID: 38301340. DOI: 10.1016/j.neunet.2024.106141.
Abstract
Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variances among subcategories within the same category. To solve this problem, this paper proposes a novel multiscale dilated convolutional neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through an element-wise maximum over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger-receptive-field features from the global mixed feature map and enhance context information among the views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with a different dilation rate. In addition, a prediction module with label smoothing, comprising a 3 × 3 convolution and adaptive average pooling, is proposed to classify the features. The performance of our method is validated experimentally on the ModelNet10, ModelNet40 and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.
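Only the element-wise maximum over 12 view feature maps and the use of attention plus different dilation rates are taken from the abstract; the block internals, channel counts and dilation values in the PyTorch sketch below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """One attention-dilated-style block: simple channel attention followed
    by a dilated 3x3 convolution. The real ADB design is not specified here."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.attn = nn.Sequential(                      # channel attention (assumed form)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.attn(x)
        return torch.relu(self.conv(x))

def aggregate_views(view_feats: torch.Tensor) -> torch.Tensor:
    """Element-wise maximum over the view axis: (B, V, C, H, W) -> (B, C, H, W)."""
    return view_feats.max(dim=1).values

# Example: 12 rendered views, 256-channel backbone features, four dilation rates.
views = torch.randn(2, 12, 256, 14, 14)
mixed = aggregate_views(views)
blocks = nn.Sequential(*[DilatedBlock(256, d) for d in (1, 2, 4, 8)])
out = blocks(mixed)   # (2, 256, 14, 14)
```

The increasing dilation rates enlarge the receptive field without extra downsampling, which is the motivation the abstract gives for the ADM.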
Affiliation(s)
- Wei Zhou
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Fujian Zheng
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China; College of Optoelectronic Engineering, Chongqing University, Chongqing 400030, PR China.
- Yiheng Zhao
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Yiran Pang
- Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL 33431, United States of America.
- Jun Yi
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
5. Wang H, Chen T, Ji X, Qian F, Ma Y, Wang S. LiDAR-camera-system-based unsupervised and weakly supervised 3D object detection. J Opt Soc Am A Opt Image Sci Vis 2023;40:1849-1860. PMID: 37855540. DOI: 10.1364/josaa.494980.
Abstract
LiDAR-camera systems are becoming an important part of 3D object detection for autonomous driving. Due to limits on time and resources, only a few critical frames of the synchronized camera data and acquired LiDAR points may be annotated, yet a large amount of unannotated data remains in practical applications. We therefore propose a LiDAR-camera-system-based unsupervised and weakly supervised (LCUW) network as a novel 3D object-detection method. For unannotated data, we propose an independent learning mode, an unsupervised data preprocessing module. For detection tasks with high accuracy requirements, we propose an Accompany Construction mode, a weakly supervised data preprocessing module that requires only a small amount of annotated data; high-quality training data are then generated from the remaining unlabeled data. We also propose a full aggregation bridge block in the feature-extraction part, which uses a stepwise fusion and deepening-representation strategy to improve accuracy. Comparative, ablation, and runtime experiments show that the proposed method performs well while advancing the application of LiDAR-camera systems.
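The full aggregation bridge block is described only as a stepwise fusion and deepening-representation strategy. A generic progressive fusion of multi-scale feature maps is sketched below for illustration; every module name, shape and ordering here is an assumption rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepwiseFusion(nn.Module):
    """Fuse multi-scale feature maps one step at a time, refining the running
    representation after each merge (illustrative sketch only)."""

    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        self.refine = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
             for _ in range(num_scales - 1)]
        )

    def forward(self, feats):
        # feats are ordered coarse -> fine; upsample and merge step by step.
        x = feats[0]
        for conv, f in zip(self.refine, feats[1:]):
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear", align_corners=False)
            x = torch.relu(conv(x + f))
        return x

# Example: three feature scales with 128 channels each.
fusion = StepwiseFusion(channels=128, num_scales=3)
feats = [torch.randn(1, 128, 25, 25),
         torch.randn(1, 128, 50, 50),
         torch.randn(1, 128, 100, 100)]
out = fusion(feats)   # (1, 128, 100, 100)
```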