Li P, Sun Z, Duan G, Wang D, Meng Q, Sun Y. DMU-Net: A Dual-Stream Multi-Scale U-Net Network Using Multi-Dimensional Spatial Information for Urban Building Extraction.
SENSORS (BASEL, SWITZERLAND) 2023;
23:1991. [PMID:
36850587 PMCID:
PMC9963264 DOI:
10.3390/s23041991]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 02/03/2023] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
Automatically extracting urban buildings from remote sensing images has essential application value, such as urban planning and management. Gaofen-7 (GF-7) provides multi-perspective and multispectral satellite images, which can obtain three-dimensional spatial information. Previous studies on building extraction often ignored information outside the red-green-blue (RGB) bands. To utilize the multi-dimensional spatial information of GF-7, we propose a dual-stream multi-scale network (DMU-Net) for urban building extraction. DMU-Net is based on U-Net, and the encoder is designed as the dual-stream CNN structure, which inputs RGB images, near-infrared (NIR), and normalized digital surface model (nDSM) fusion images, respectively. In addition, the improved FPN (IFPN) structure is integrated into the decoder. It enables DMU-Net to fuse different band features and multi-scale features of images effectively. This new method is tested with the study area within the Fourth Ring Road in Beijing, and the conclusions are as follows: (1) Our network achieves an overall accuracy (OA) of 96.16% and an intersection-over-union (IoU) of 84.49% for the GF-7 self-annotated building dataset, outperforms other state-of-the-art (SOTA) models. (2) Three-dimensional information significantly improved the accuracy of building extraction. Compared with RGB and RGB + NIR, the IoU increased by 7.61% and 3.19% after using nDSM data, respectively. (3) DMU-Net is superior to SMU-Net, DU-Net, and IEU-Net. The IoU is improved by 0.74%, 0.55%, and 1.65%, respectively, indicating the superiority of the dual-stream CNN structure and the IFPN structure.
Collapse