1. Xu K, Huang H, Deng P, Li Y. Deep Feature Aggregation Framework Driven by Graph Convolutional Network for Scene Classification in Remote Sensing. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5751-5765. [PMID: 33857002] [DOI: 10.1109/tnnls.2021.3071369] [Citation(s) in RCA: 16]
Abstract
Scene classification of high spatial resolution (HSR) images can provide data support for many practical applications, such as land planning and utilization, and has been a crucial research topic in the remote sensing (RS) community. Recently, deep learning methods driven by massive data, especially convolutional neural networks (CNNs), have shown an impressive ability to learn features for HSR scene classification. Although traditional CNNs achieve good classification results, it is difficult for them to effectively capture potential context relationships. Graphs have a powerful capacity to represent the relevance of data, and graph-based deep learning methods can spontaneously learn intrinsic attributes contained in RS images. Inspired by these facts, we develop a deep feature aggregation framework driven by a graph convolutional network (DFAGCN) for HSR scene classification. First, an off-the-shelf CNN pretrained on ImageNet is employed to obtain multilayer features. Second, a graph convolutional network-based model is introduced to effectively reveal patch-to-patch correlations of convolutional feature maps, from which more refined features can be harvested. Finally, a weighted concatenation method integrates multiple features (i.e., multilayer convolutional features and fully connected features) by introducing three weighting coefficients, and a linear classifier then predicts the semantic classes of query images. Experimental results on the UCM, AID, RSSCN7, and NWPU-RESISC45 datasets demonstrate that the proposed DFAGCN framework obtains more competitive performance than some state-of-the-art scene classification methods in terms of overall accuracies (OAs).
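The weighted-concatenation step lends itself to a short sketch. The following is a minimal illustration of fusing multilayer convolutional descriptors and fully connected features with three weighting coefficients before a linear classifier; the branch dimensions, fixed weights, and class count are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class WeightedConcatFusion(nn.Module):
    """Weighted concatenation of multiple feature branches (sketch)."""

    def __init__(self, dims, num_classes, weights=(0.4, 0.3, 0.3)):
        super().__init__()
        self.weights = weights                               # three weighting coefficients
        self.classifier = nn.Linear(sum(dims), num_classes)  # linear classifier

    def forward(self, feats):
        # feats: three (B, dim_i) vectors, e.g. two refined convolutional
        # descriptors and one fully connected descriptor
        fused = torch.cat([w * f for w, f in zip(self.weights, feats)], dim=1)
        return self.classifier(fused)

# usage sketch: 45 classes as in NWPU-RESISC45
fusion = WeightedConcatFusion(dims=(512, 512, 4096), num_classes=45)
logits = fusion([torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 4096)])
```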
2. Extracting Wetland Type Information with a Deep Convolutional Neural Network. Computational Intelligence and Neuroscience 2022; 2022:5303872. [PMID: 35634072] [PMCID: PMC9132632] [DOI: 10.1155/2022/5303872] [Citation(s) in RCA: 0]
Abstract
Wetlands have important ecological value. Wetland remote sensing is essential for timely and accurate analysis of current wetland conditions and of dynamic changes in wetland resources, but the boundaries between wetland types in high-resolution remote sensing images are often indistinct, and high classification accuracy and time efficiency are difficult to guarantee simultaneously. Extracting wetland type information from high-spatial-resolution remote sensing images has therefore been a bottleneck for wetland development research and change detection. This paper proposes an automatic and efficient method for extracting wetland type information. First, object-oriented multiscale segmentation is used to finely segment the high-resolution remote sensing images; the deep convolutional neural network AlexNet is then used to automatically classify the wetland image types. The method is verified in a case study with field-measured data, and the classification results are compared with those of traditional classification methods. The results show that the proposed method extracts different wetland types from high-resolution remote sensing images more accurately and efficiently than the traditional methods. It should be helpful in extending and applying wetland remote sensing technology and will provide technical support for the protection, development, and utilization of wetland resources.
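As a rough sketch of the second stage, the snippet below scores a crop around one segmented object with a torchvision AlexNet whose final layer is resized for wetland types; the five-class head and the preprocessing are assumptions, and in practice the network would first be fine-tuned on labeled wetland samples.

```python
import torch
from torchvision import models, transforms

# Pretrained AlexNet with its last layer resized (five wetland types assumed).
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.classifier[6] = torch.nn.Linear(4096, 5)
alexnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def classify_segment(crop):
    # crop: a PIL image cut around one object from the multiscale segmentation
    x = preprocess(crop).unsqueeze(0)
    with torch.no_grad():
        return alexnet(x).argmax(dim=1).item()
```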
3. Hyperspectral Image Classification Based on 3D Coordination Attention Mechanism Network. Remote Sensing 2022. [DOI: 10.3390/rs14030608] [Citation(s) in RCA: 7]
Abstract
In recent years, deep learning methods have been widely used in hyperspectral image classification tasks because of their powerful feature extraction ability. However, the features extracted by classical deep learning methods have limited discrimination ability, resulting in unsatisfactory classification performance. In addition, because hyperspectral images (HSIs) offer limited data samples, achieving high classification performance under limited samples is also a research hotspot. To address these problems, this paper proposes a deep learning framework named the three-dimensional coordination attention mechanism network (3DCAMNet), built around a three-dimensional coordination attention mechanism (3DCAM). This attention mechanism captures not only the long-distance spatial dependence of HSIs along the vertical and horizontal directions but also the differences in importance between spectral bands. To extract the spectral and spatial information of HSIs more fully, a convolution module based on a convolutional neural network (CNN) is adopted, and a linear module is introduced after the convolution module to extract finer high-level features. To verify the effectiveness of 3DCAMNet, a series of experiments was carried out on five datasets: Indian Pines (IP), Pavia University (UP), Kennedy Space Center (KSC), Salinas Valley (SV), and University of Houston (HT). The OAs obtained by the proposed method on the five datasets were 95.81%, 97.01%, 99.01%, 97.48%, and 97.69%, respectively, which are 3.71%, 9.56%, 0.67%, 2.89%, and 0.11% higher than those of the state-of-the-art A2S2K-ResNet. The experimental results show that, compared with some state-of-the-art methods, 3DCAMNet not only achieves higher classification performance but is also more robust.
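The directional-pooling idea behind the coordination attention can be sketched compactly. The module below, adapted from 2-D coordinate attention, pools along the height and width axes to capture long-range spatial dependence and squeezes over the spectral axis to weight bands; its internals are assumptions, not the published 3DCAM definition.

```python
import torch
import torch.nn as nn

class CoordSpectralAttention(nn.Module):
    """Directional spatial attention plus spectral band weighting (sketch)."""

    def __init__(self, bands, reduction=8):
        super().__init__()
        self.h_fc = nn.Conv2d(bands, bands, 1)   # height-direction attention
        self.w_fc = nn.Conv2d(bands, bands, 1)   # width-direction attention
        self.band_fc = nn.Sequential(            # spectral importance weights
            nn.Linear(bands, bands // reduction), nn.ReLU(),
            nn.Linear(bands // reduction, bands), nn.Sigmoid())

    def forward(self, x):                        # x: (B, bands, H, W)
        a_h = torch.sigmoid(self.h_fc(x.mean(dim=3, keepdim=True)))  # (B,C,H,1)
        a_w = torch.sigmoid(self.w_fc(x.mean(dim=2, keepdim=True)))  # (B,C,1,W)
        a_c = self.band_fc(x.mean(dim=(2, 3)))[:, :, None, None]     # (B,C,1,1)
        return x * a_h * a_w * a_c
```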
4. Chen SB, Wei QS, Wang WZ, Tang J, Luo B, Wang ZY. Remote Sensing Scene Classification via Multi-Branch Local Attention Network. IEEE Transactions on Image Processing 2021; 31:99-109. [PMID: 34793302] [DOI: 10.1109/tip.2021.3127851] [Citation(s) in RCA: 7]
Abstract
Remote sensing scene classification (RSSC) has become a research hotspot and plays a very important role in remote sensing image interpretation. With the recent development of convolutional neural networks (CNNs), significant breakthroughs have been made in classifying remote sensing scenes. However, many objects form complex and diverse scenes through spatial combination and association, which makes classification difficult, and the feature representations extracted by CNNs often remain insufficiently discriminative, mainly because of the similarity of inter-class images and the diversity of intra-class images. In this paper, we propose a remote sensing image scene classification method based on a Multi-Branch Local Attention Network (MBLANet), in which a Convolutional Local Attention Module (CLAM) is embedded into all down-sampling blocks and residual blocks of a ResNet backbone. CLAM contains two submodules, the Convolutional Channel Attention Module (CCAM) and the Local Spatial Attention Module (LSAM), placed in parallel to obtain both channel and spatial attention, which helps emphasize the main target in a complex background and improves feature representation. Extensive experiments on three benchmark datasets show that our method outperforms state-of-the-art methods.
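A minimal sketch of the parallel channel/spatial arrangement described above follows; the reduction ratio and the 7x7 spatial kernel are illustrative guesses rather than the published CCAM/LSAM internals.

```python
import torch.nn as nn

class ParallelChannelSpatialAttention(nn.Module):
    """Channel and spatial attention branches applied in parallel (sketch)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(            # channel-attention branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(            # local spatial-attention branch
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        # both attentions are computed from x and applied multiplicatively
        return x * self.channel(x) * self.spatial(x)
```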
5. Remote Sensing Image Scene Classification Based on Global Self-Attention Module. Remote Sensing 2021. [DOI: 10.3390/rs13224542] [Citation(s) in RCA: 5]
Abstract
The complexity of scene images makes research on remote sensing image scene classification challenging. With the wide application of deep learning in recent years, many remote sensing scene classification methods using convolutional neural networks (CNNs) have emerged. Current CNNs usually produce global information by passing the depth features extracted by the convolutional layers through fully connected layers; however, the global information obtained this way is not comprehensive. This paper proposes an improved remote sensing image scene classification method based on a global self-attention module to address this problem. The global information is derived from the depth features extracted by the CNN. To better express the semantic information of the remote sensing image, a multi-head self-attention module is introduced for global information augmentation, while a local perception unit improves the self-attention module's ability to represent local objects. The proposed method's effectiveness is validated through comparative experiments with various training ratios and different scales on public datasets (UC Merced, AID, and NWPU-RESISC45). The precision of the proposed model is significantly improved compared with other remote sensing image scene classification methods.
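The global-plus-local design can be sketched as follows, with the local perception unit approximated by a depthwise convolution (an assumption) ahead of a multi-head self-attention layer over the flattened feature map.

```python
import torch
import torch.nn as nn

class GlobalSelfAttentionHead(nn.Module):
    """Multi-head self-attention over flattened CNN features (sketch)."""

    def __init__(self, channels=512, heads=8):   # channels must divide by heads
        super().__init__()
        # local perception approximated by a depthwise convolution (assumption)
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                    # x: (B, C, H, W) depth features
        x = x + self.local(x)                # inject local object cues
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)    # global dependencies between tokens
        return out.mean(dim=1)               # pooled global descriptor
```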
6. HA-Net: A Lake Water Body Extraction Network Based on Hybrid-Scale Attention and Transfer Learning. Remote Sensing 2021. [DOI: 10.3390/rs13204121] [Citation(s) in RCA: 4]
Abstract
Because remote sensing images contain large amounts of noise and complex spatial backgrounds, improving the accuracy of semantic segmentation has become a hot topic. Lake water body extraction is crucial for disaster detection, resource utilization, and carbon-cycle research, and the area of lakes on the Tibetan Plateau has been constantly changing due to the movement of the Earth's crust. Most convolutional neural networks used for remote sensing imagery classify pixels from single-layer features, ignoring the correlation of such features across layers. In this paper, a two-branch encoder is presented: a multiscale structure that combines the features of ResNet-34 with a feature pyramid network. Adaptive weights are then distributed to global information using a hybrid-scale attention block. Finally, PixelShuffle is used to recover the resolution of the feature maps, and a densely connected block is used to refine the boundary of the lake water body. We also transfer the best weights saved on the Google dataset to the Landsat-8 dataset to confirm that the proposed method is robust. We validate the superiority of the Hybrid-scale Attention Network (HA-Net) on two datasets that we created from Google and Landsat-8 remote sensing imagery. (1) On the Google dataset, HA-Net achieves the best performance on all five evaluation metrics, with a Mean Intersection over Union (MIoU) of 97.38%, improving on DeepLab V3+ by 1.04% while reducing training time by about 100 s per epoch. Moreover, the overall accuracy (OA), Recall, True Water Rate (TWR), and False Water Rate (FWR) of HA-Net are 98.88%, 98.03%, 98.24%, and 1.76%, respectively. (2) On the Landsat-8 dataset, HA-Net achieves the best overall accuracy and improves the TWR by 2.93% compared with Pre_PSPNet, proving it more robust than other advanced models.
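The resolution-recovery step maps naturally onto PyTorch's PixelShuffle. The fragment below is a minimal sketch of that decoder stage; the channel count and upscaling factor are placeholders.

```python
import torch
import torch.nn as nn

# Expand channels by r^2 with a conv, then let PixelShuffle rearrange
# (C*r^2, H, W) into (C, r*H, r*W); here r = 2 and C = 256 are placeholders.
upsample = nn.Sequential(
    nn.Conv2d(256, 256 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),
    nn.ReLU(),
)

out = upsample(torch.randn(1, 256, 32, 32))   # -> (1, 256, 64, 64)
```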
7. Nanni L, Ghidoni S, Brahnam S. Deep Features for Training Support Vector Machines. J Imaging 2021; 7:177. [PMID: 34564103] [PMCID: PMC8470768] [DOI: 10.3390/jimaging7090177] [Citation(s) in RCA: 4]
Abstract
Features play a crucial role in computer vision. Initially designed to detect salient elements via handcrafted algorithms, features are now often learned from different layers of convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines (SVMs) that are then combined by sum rule. Several dimensionality reduction techniques were tested to reduce the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system, based on applying a discrete cosine transform (DCT) separately to each channel, is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image datasets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark image virus dataset.
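The core pipeline, a per-channel DCT over inner-layer activations feeding SVMs that are fused by sum rule, can be sketched briefly; the activation shape, the low-frequency cutoff, and the use of SVC probabilities are assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.svm import SVC

def dct_descriptor(feat, keep=8):
    # feat: (C, H, W) activation from one inner CNN layer; keep only the
    # low-frequency DCT coefficients of each channel
    return np.concatenate(
        [dctn(ch, norm='ortho')[:keep, :keep].ravel() for ch in feat])

def sum_rule(svms, X_list):
    # fuse SVMs trained on different layers by averaging class probabilities
    return np.mean([svm.predict_proba(X) for svm, X in zip(svms, X_list)], axis=0)

svm = SVC(probability=True)   # one such SVM per feature set, fused afterwards
```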
Affiliation(s)
- Loris Nanni: Department of Information Engineering (DEI), University of Padova, 35131 Padova, Italy
- Stefano Ghidoni: Department of Information Engineering (DEI), University of Padova, 35131 Padova, Italy
- Sheryl Brahnam: Information Technology and Cybersecurity (ITC), Missouri State University, 901 S National, Springfield, MO 65804, USA
8. Landscape Similarity Analysis Using Texture Encoded Deep-Learning Features on Unclassified Remote Sensing Imagery. Remote Sensing 2021. [DOI: 10.3390/rs13030492] [Citation(s) in RCA: 4]
Abstract
Convolutional neural networks (CNNs) are known for their ability to learn shape and texture descriptors useful for object detection, pattern recognition, and classification problems. Deeper-layer CNN filters generally learn global image information vital for whole-scene or object discrimination. In landscape pattern comparison, however, the dense localized information encoded in shallow layers can carry discriminative information for characterizing changes across local image regions, but it is often lost in the deeper, non-spatial fully connected layers. Such localized features hold potential for identifying, as well as characterizing, process-pattern change across space and time. In this paper, we propose a simple yet effective texture-based CNN (Tex-CNN) via a feature concatenation framework that captures and learns texture descriptors. The traditional CNN architecture was adopted as a baseline for assessing the performance of Tex-CNN. We utilized 75% and 25% of the image data for model training and validation, respectively. To test the models' generalization, we used separate imagery from the Aerial Imagery Dataset (AID) and Sentinel-2 for model development and independent validation. The classical CNN and the Tex-CNN classification accuracies on the AID were 91.67% and 96.33%, respectively; Tex-CNN was on par with or outperformed state-of-the-art methods. Independent validation on Sentinel-2 data performed well for most scene types but had difficulty discriminating farm scenes, likely due to geometric generalization of discriminative features at the coarser scale. In both datasets, Tex-CNN outperformed the classical CNN architecture. Using Tex-CNN, gradient-based spatial attention maps (feature maps) that contain discriminative pattern information are extracted and subsequently employed for mapping landscape similarity. To enhance the discriminative capacity of the feature maps, we further perform spatial filtering using PCA and select the eigenmaps with the top eigenvalues. We show that CNN feature maps provide descriptors capable of characterizing and quantifying landscape (dis)similarity. Using histogram-of-oriented-gradient vectors from the feature maps and computing their Earth Mover's Distances (EMDs), our method effectively identified similar landscape types, with over 60% of target-reference scene comparisons showing a small EMD (e.g., 0.01), while different landscape types tended to show a large EMD (e.g., 0.05) on the benchmark AID. We hope this proposal will inspire further research into the use of CNN layer feature maps in landscape similarity assessment, as well as in change detection.
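The similarity measurement at the end of the pipeline can be sketched with standard tools: HOG vectors are computed over a feature map, binned into a histogram, and compared with the 1-D Earth Mover's Distance; the HOG parameters and bin count below are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from scipy.stats import wasserstein_distance

def hog_histogram(feature_map, bins=9):
    # feature_map: one 2-D (attention) feature map
    vec = hog(feature_map, orientations=bins, pixels_per_cell=(8, 8),
              cells_per_block=(1, 1), feature_vector=True)
    hist, _ = np.histogram(vec, bins=bins, range=(0, 1), density=True)
    return hist

def landscape_emd(map_a, map_b, bins=9):
    # EMD between the two HOG histograms over a common bin support
    support = np.arange(bins)
    return wasserstein_distance(support, support,
                                u_weights=hog_histogram(map_a, bins),
                                v_weights=hog_histogram(map_b, bins))
```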
9. Xu K, Huang H, Deng P, Shi G. Two-stream feature aggregation deep neural network for scene classification of remote sensing images. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.06.011] [Citation(s) in RCA: 14]
10. Remote Sensing Scene Classification and Explanation Using RSSCNet and LIME. Applied Sciences-Basel 2020. [DOI: 10.3390/app10186151] [Citation(s) in RCA: 14]
Abstract
Classification of remote sensing imagery is needed in disaster investigation, traffic control, and land-use resource management, and how to classify such imagery quickly and accurately has become a popular research topic. However, training classifiers with large, deep neural network models in the hope of obtaining good classification results is often very time-consuming. In this study, a new CNN (convolutional neural network) architecture, RSSCNet (remote sensing scene classification network), with high generalization capability was designed. Moreover, a two-stage cyclical learning rate policy and a no-freezing transfer learning method were developed to speed up model training and enhance accuracy. In addition, the manifold learning t-SNE (t-distributed stochastic neighbor embedding) algorithm was used to verify the effectiveness of the proposed model, and the LIME (local interpretable model-agnostic explanations) algorithm was applied to improve the results in cases where the model made wrong predictions. Comparing the results on three publicly available datasets with those obtained in previous studies, the experiments show that the model and methods proposed in this paper achieve better scene classification more quickly and more efficiently.
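A two-stage cyclical schedule of the kind described can be sketched with PyTorch's CyclicLR; the learning-rate ranges, step sizes, and stage boundary below are assumptions, not the published policy.

```python
import torch

model = torch.nn.Linear(10, 2)   # stand-in for the RSSCNet backbone
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# stage 1: wide exploratory cycle; stage 2: narrower fine-tuning cycle
stage1 = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=1e-4, max_lr=1e-2,
                                           step_size_up=500)
stage2 = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=1e-5, max_lr=1e-3,
                                           step_size_up=500)

for step in range(2000):
    # ... forward pass, loss.backward(), opt.step() would go here ...
    (stage1 if step < 1000 else stage2).step()
```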
11. Multi-deep features fusion for high-resolution remote sensing image scene classification. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05071-7] [Citation(s) in RCA: 5]
12. Residual Dense Network Based on Channel-Spatial Attention for the Scene Classification of a High-Resolution Remote Sensing Image. Remote Sensing 2020. [DOI: 10.3390/rs12111887] [Citation(s) in RCA: 21]
Abstract
Scene classification of remote sensing images, an important task in understanding their content, has been widely used in various fields. A high-resolution remote sensing scene in particular contains rich information and complex content. Because the scene content of a remote sensing image is closely tied to spatial relationships, the design of a feature extraction network that fully mines the spatial information in a high-resolution remote sensing image directly determines classification quality. In recent years, convolutional neural networks (CNNs) have achieved excellent performance in remote sensing image classification; the residual dense network (RDN), one of the representative CNNs, shows particularly strong feature learning ability because it fully utilizes the information of all convolutional layers. We therefore design an RDN based on channel-spatial attention for scene classification of high-resolution remote sensing images. First, multi-layer convolutional features are fused with residual dense blocks. Then, a channel-spatial attention module is added to obtain a more effective feature representation. Finally, a softmax classifier is applied to classify the scene, after a data augmentation strategy is adopted to meet the training requirements of the network parameters. Five experiments are conducted on the UC Merced Land-Use Dataset (UCM) and the Aerial Image Dataset (AID), and the competitive results demonstrate that our method extracts more effective features and is more conducive to scene classification.
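The residual dense block at the heart of an RDN is easy to sketch: each convolution sees the concatenation of all earlier outputs, and a 1x1 convolution fuses them before a local residual connection. The growth rate and depth below are illustrative; a channel-spatial attention module like the one sketched under entry 4 would follow each block.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense connections inside a block, closed by a local residual (sketch)."""

    def __init__(self, channels, growth=32, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(layers))
        self.fuse = nn.Conv2d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # every conv sees the concatenation of all earlier feature maps
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))   # local residual fusion
```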
13. An Improved Boundary-Aware Perceptual Loss for Building Extraction from VHR Images. Remote Sensing 2020. [DOI: 10.3390/rs12071195] [Citation(s) in RCA: 11]
Abstract
With the development of deep learning technology, an enormous number of convolutional neural network (CNN) models have been proposed for the challenging task of building extraction from very high-resolution (VHR) remote sensing images. However, searching for better CNN architectures is time-consuming, and the robustness of a new CNN model cannot be guaranteed. In this paper, an improved boundary-aware perceptual (BP) loss is proposed to enhance the building extraction ability of CNN models. The proposed BP loss consists of a loss network and transfer loss functions, and it is used in two stages. In the training stage, the loss network learns structural information by circularly transferring between the building mask and the corresponding building boundary. In the refining stage, the learned structural information is embedded into building extraction models via the transfer loss functions, without additional parameters or postprocessing. We verify the effectiveness and efficiency of the proposed BP loss on both the challenging WHU aerial dataset and the INRIA dataset. Substantial performance improvements are observed within two representative CNN architectures widely used for pixel-wise labelling tasks: PSPNet and UNet. With BP loss, UNet with ResNet101 achieves IoU (intersection over union) scores of 90.78% and 76.62% on the WHU aerial dataset and the INRIA dataset, respectively, which are 1.47% and 1.04% higher than the same model trained with only the cross-entropy loss. Similar improvements (0.64% on the WHU aerial dataset and 1.69% on the INRIA dataset) are also observed for PSPNet, which strongly supports the robustness of the proposed BP loss.
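In spirit, the refining stage adds a feature-space term from a frozen loss network to the usual cross-entropy. The sketch below assumes a generic loss_net module and weighting factor; it is not the exact BP-loss formulation.

```python
import torch
import torch.nn.functional as F

def bp_style_loss(logits, target_mask, loss_net, lam=0.1):
    # logits: (B, 2, H, W) building/background scores;
    # target_mask: (B, H, W) ground-truth labels in {0, 1}
    ce = F.cross_entropy(logits, target_mask)
    prob = logits.softmax(dim=1)[:, 1:2]              # predicted building map
    with torch.no_grad():                             # loss network stays frozen
        feat_t = loss_net(target_mask.unsqueeze(1).float())
    feat_p = loss_net(prob)                           # structural features
    return ce + lam * F.mse_loss(feat_p, feat_t)      # transfer loss term
```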
14. An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification. Sensors 2020; 20:1999. [PMID: 32252483] [PMCID: PMC7181261] [DOI: 10.3390/s20071999] [Citation(s) in RCA: 35]
Abstract
Classifying remote sensing images is vital for interpreting image content. Present remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network MobileNetV2 is used to extract deep, abstract image features. The features are then transformed into two feature sets with two different convolutional layers, and the transformed features are combined by a Hadamard product to obtain an enhanced bilinear feature. Finally, the bilinear feature, after pooling and normalization, is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-the-art methods, the proposed method has fewer parameters and calculations while achieving higher accuracy. Including feature fusion with bilinear pooling greatly improves the performance and accuracy of remote sensing scene classification, and the approach could be applied to any remote sensing image classification task.
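The bilinear step is compact enough to sketch: two 1x1 convolutions transform the backbone feature, their Hadamard product is pooled, and the result is normalized. The signed square root and the channel sizes below are assumptions borrowed from the bilinear-CNN literature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HadamardBilinear(nn.Module):
    """Bilinear feature from two transformed copies of one feature map."""

    def __init__(self, in_ch=1280, mid_ch=512):      # 1280 = MobileNetV2 output
        super().__init__()
        self.proj_a = nn.Conv2d(in_ch, mid_ch, 1)
        self.proj_b = nn.Conv2d(in_ch, mid_ch, 1)

    def forward(self, x):                            # x: (B, C, H, W)
        z = self.proj_a(x) * self.proj_b(x)          # Hadamard product
        z = z.mean(dim=(2, 3))                       # global pooling
        z = torch.sign(z) * torch.sqrt(z.abs() + 1e-8)  # signed square root
        return F.normalize(z, dim=1)                 # L2 normalization
```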
15. Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images. Remote Sensing 2019. [DOI: 10.3390/rs11171996] [Citation(s) in RCA: 21]
Abstract
Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI thanks to the advancement of graphics processing units (GPUs). However, CNNs tend to extract features from whole images rather than from discriminative regions. A visual attention mechanism can force a CNN to focus on discriminative regions, but it may suffer from intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework with three parts: attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), multiplicative fusion of deep features, and a center-based cross-entropy loss function. First, we make the attention maps generated by Grad-CAM an explicit input in order to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and the attention maps are fused by multiplicative fusion, which considers both the improved ability to distinguish scenes of repeated texture and the salient regions. Finally, the center-based cross-entropy loss function, which combines the cross-entropy and center loss functions, is used to backpropagate the fused features so as to reduce the effect of intra-class diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets to show its scene classification performance. The experiments confirm that the proposed method outperforms most competitive scene classification methods, with an average overall accuracy of 94% under different training ratios.
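The center-based cross-entropy can be sketched as cross-entropy plus a center loss that pulls fused features toward learnable class centers; the weighting factor lambda is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterCrossEntropy(nn.Module):
    """Cross-entropy plus a center loss on the fused features (sketch)."""

    def __init__(self, num_classes, feat_dim, lam=0.01):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, logits, feats, labels):
        ce = F.cross_entropy(logits, labels)
        # pull each fused feature toward its learnable class center
        center = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
        return ce + self.lam * center
```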