1. Chen X, Zhu G, Liu M, Chen Z. Few-shot remote sensing image scene classification based on multiscale covariance metric network (MCMNet). Neural Netw 2023;163:132-145. PMID: 37044028. DOI: 10.1016/j.neunet.2023.04.002.
Abstract
Few-shot learning (FSL) is a paradigm that mimics the fast learning ability of human beings: it learns the feature differences between two groups of small-scale samples that share a common label space, while the label spaces of the training set and the test set do not overlap. In this way, it can quickly identify the categories of unseen images in the test set. The approach is widely used in image scene recognition and is expected to overcome the scarcity of annotated samples in remote sensing (RS). However, most existing FSL methods embed images into Euclidean space and measure the similarity between features at the last layer of a deep network by Euclidean distance, which makes it difficult to capture the inter-class similarity and intra-class difference of RS images. In this paper, we propose a multiscale covariance metric network (MCMNet) for remote sensing scene classification (RSSC). Taking Conv64F as the backbone, we map the features of layers 1, 2, and 4 of the network to a manifold space by constructing regional covariance matrices, forming a covariance network at different scales. For the features of each layer, we introduce a center in the manifold space as the prototype of each category. We simultaneously measure the similarity to the three prototypes on the manifold space at different scales, forming three loss functions, and optimize the whole network with an episodic training strategy. We conducted comparative experiments on three public datasets. The results show that the classification accuracy (CA) of the proposed method is 1.35% to 2.36% higher than that of the best competing method, demonstrating that MCMNet outperforms other methods.
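To make the covariance-metric idea above concrete, the following is a minimal PyTorch sketch of a regional covariance descriptor and a log-Euclidean distance between covariance matrices; the shapes, names, and the specific log-Euclidean metric are illustrative assumptions rather than the authors' implementation.

```python
# Sketch: regional covariance descriptor and log-Euclidean distance between
# SPD matrices, one plausible reading of the manifold-space metric described
# in the abstract. Shapes and names are illustrative assumptions.
import torch

def regional_covariance(feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """feat: (C, H, W) feature map from one backbone layer.
    Returns a (C, C) covariance matrix regularized to be positive definite."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                # each spatial position is a sample
    x = x - x.mean(dim=1, keepdim=True)       # center per channel
    cov = x @ x.t() / (h * w - 1)             # channel-by-channel covariance
    return cov + eps * torch.eye(c)           # regularize for the matrix log

def log_euclidean_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Distance between two SPD matrices: ||log(A) - log(B)||_F."""
    def spd_log(m):
        eigval, eigvec = torch.linalg.eigh(m)
        return eigvec @ torch.diag(eigval.clamp_min(1e-8).log()) @ eigvec.t()
    return torch.linalg.norm(spd_log(a) - spd_log(b))

# Toy usage: compare a query feature map against a class prototype covariance.
query = regional_covariance(torch.randn(64, 21, 21))
proto = regional_covariance(torch.randn(64, 21, 21))
print(log_euclidean_distance(query, proto))
```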
2. Liang C, Serge A, Zhang X, Wang H, Wang W. Assessment of street forest characteristics in four African cities using Google Street View measurement: Potentials and implications. Environ Res 2023;221:115261. PMID: 36657594. DOI: 10.1016/j.envres.2023.115261.
Abstract
Accurate information on urban forests, including tree sizes, health state, community structures, and spatial distribution, is still limited in African cities. Using a Google Street View (GSV)-based tree-size measuring method developed by our team, this paper evaluates street trees in four African metropolitan cities from GSV data. The study compiled a large dataset of 46,016 street trees at 3454 sites in Kampala, Nairobi, Bloemfontein, and Johannesburg. The data include tree size (diameter at breast height, DBH; tree height, TH; under-branch height, UBH; canopy size), tree floristic composition (apical dominance type; broadleaf, conifer, or palm leaf; flowering or not), tree health (leaf color, dieback, dead trees, and bracket-supporting percentage), streetside development (lane number, roadside shops, parked vehicles, and pedestrian density), and geolocation (latitude, longitude). These data can be spatially visualized with ArcGIS, and the large dataset supports reliable maps at the street-view level. Descriptive statistics showed that all four cities were dominated by broad-leaved, apically dominant, and flowering trees, with a low level of unhealthy leaves and a tiny percentage of dead trees. An arbor-shrub-herb vegetation structure dominated all four cities. Kampala had the most slender trees (DBH = 23 cm, TH = 8.4 m), while Nairobi and Johannesburg had the thickest trees (DBH = 38 cm, TH = 8.5-8.6 m). Bare land rates were lowest in Bloemfontein (23%) and highest in Nairobi (33%). Principal component analysis and Pearson correlations showed that these tree variations were closely associated with street development and local land use configuration. Comparing these results with urban tree data from other regions of the world, we found that trees in African cities are generally large but occur at lower density (number of trees within a 100-m street segment). Our findings emphasize that GSV data are suitable for urban forest monitoring in Africa, and the database is helpful for urban landscape planning and management.
Affiliations
- Chentao Liang: Key Laboratory of Forest Plant Ecology (MOE), Heilongjiang Provincial Key Laboratory of Ecological Utilization of Forestry-based Active Substances, College of Chemistry, Chemical Engineering and Resource Utilization, Northeast Forestry University, Harbin, 150040, China
- Angali Serge: Key Laboratory of Forest Plant Ecology (MOE), Heilongjiang Provincial Key Laboratory of Ecological Utilization of Forestry-based Active Substances, College of Chemistry, Chemical Engineering and Resource Utilization, Northeast Forestry University, Harbin, 150040, China
- Xu Zhang: Key Laboratory of Forest Plant Ecology (MOE), Heilongjiang Provincial Key Laboratory of Ecological Utilization of Forestry-based Active Substances, College of Chemistry, Chemical Engineering and Resource Utilization, Northeast Forestry University, Harbin, 150040, China
- Huimei Wang: State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, 311300, Zhejiang, China
- Wenjie Wang: Key Laboratory of Forest Plant Ecology (MOE), Heilongjiang Provincial Key Laboratory of Ecological Utilization of Forestry-based Active Substances, College of Chemistry, Chemical Engineering and Resource Utilization, Northeast Forestry University, Harbin, 150040, China; Urban Forests and Wetlands Group, Northeast Institute of Geography and Agroecology, Changchun, 130102, China; State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, 311300, Zhejiang, China
3. Self-supervised learning for remote sensing scene classification under the few shot scenario. Sci Rep 2023;13:433. PMID: 36624136. PMCID: PMC9829684. DOI: 10.1038/s41598-022-27313-5.
Abstract
Scene classification is a crucial research problem in remote sensing (RS) that has recently attracted many researchers. It presents several challenges: the complexity of remote sensing scenes, class overlap (a scene may contain objects that belong to other classes), and the difficulty of obtaining sufficient labeled scenes. Deep learning (DL) solutions, and in particular convolutional neural networks (CNNs), are now the state of the art in RS scene classification; however, CNN models need huge amounts of annotated data, which can be costly and time-consuming to obtain. On the other hand, it is relatively easy to acquire large amounts of unlabeled images. Recently, self-supervised learning (SSL) has been proposed as a method that can learn from unlabeled images, potentially reducing the need for labeling. In this work, we propose a deep SSL method, called RS-FewShotSSL, for RS scene classification under the few-shot scenario, when we only have a few (less than 20) labeled scenes per class. Under this scenario, typical DL solutions that fine-tune CNN models pre-trained on the ImageNet dataset fail dramatically. In the SSL paradigm, a DL model is pre-trained from scratch during the pretext task using large amounts of unlabeled scenes. Then, during the main or so-called downstream task, the model is fine-tuned on the labeled scenes. Our proposed RS-FewShotSSL solution is composed of an online network and a target network, both using the EfficientNet-B3 CNN model as a feature encoder backbone. During the pretext task, RS-FewShotSSL learns discriminative features from the unlabeled images using cross-view contrastive learning. Different views are generated from each image using geometric transformations and passed to the online and target networks. The whole model is then optimized by minimizing the cross-view distance between the online and target networks. To address the limited computation resources available to us, the proposed method uses a novel DL architecture that can be trained using both high-resolution and low-resolution images. During the pretext task, RS-FewShotSSL is trained using low-resolution images, thereby allowing for larger batch sizes, which significantly boosts the performance of the proposed pipeline on the task of RS classification. In the downstream task, the target network is discarded and the online network is fine-tuned using the few labeled shots or scenes, with smaller batches of both high-resolution and low-resolution images. This architecture allows RS-FewShotSSL to benefit from both large batch sizes and full image sizes, thereby learning effectively from large amounts of unlabeled data. We tested RS-FewShotSSL on three public RS datasets, and it demonstrated a significant improvement over other state-of-the-art methods such as SimCLR, MoCo, BYOL, and IDSSL.
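The cross-view online/target objective described above is in the same family as BYOL; the sketch below illustrates that general mechanism with a momentum-updated target network. The tiny encoder, projector sizes, and update rate are illustrative assumptions, not the paper's EfficientNet-B3 configuration.

```python
# Sketch of a BYOL-style cross-view objective with an online and a target
# network, in the spirit of the pretext task described in the abstract.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(dim_in, dim_hidden, dim_out):
    return nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.BatchNorm1d(dim_hidden),
                         nn.ReLU(inplace=True), nn.Linear(dim_hidden, dim_out))

class CrossViewSSL(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, proj_dim: int = 256):
        super().__init__()
        self.online_encoder = nn.Sequential(encoder, mlp(feat_dim, 1024, proj_dim))
        self.online_predictor = mlp(proj_dim, 1024, proj_dim)
        self.target_encoder = copy.deepcopy(self.online_encoder)
        for p in self.target_encoder.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def update_target(self, momentum: float = 0.99):
        # The target network is an exponential moving average of the online one.
        for po, pt in zip(self.online_encoder.parameters(),
                          self.target_encoder.parameters()):
            pt.data = momentum * pt.data + (1.0 - momentum) * po.data

    def forward(self, view1, view2):
        # Cross-view distance: the online prediction of one view should match
        # the target projection of the other view, and vice versa.
        def loss_one_way(x_online, x_target):
            p = F.normalize(self.online_predictor(self.online_encoder(x_online)), dim=-1)
            with torch.no_grad():
                z = F.normalize(self.target_encoder(x_target), dim=-1)
            return (2 - 2 * (p * z).sum(dim=-1)).mean()
        return loss_one_way(view1, view2) + loss_one_way(view2, view1)

# Toy usage with a tiny CNN encoder and two low-resolution augmented views.
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = CrossViewSSL(encoder, feat_dim=32)
loss = model(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
loss.backward()
model.update_target()
```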
4. A Lightweight Self-Supervised Representation Learning Algorithm for Scene Classification in Spaceborne SAR and Optical Images. Remote Sensing 2022. DOI: 10.3390/rs14132956.
Abstract
Despite the increasing amount of spaceborne synthetic aperture radar (SAR) images and optical images, only a small fraction of annotated data can be used directly for scene classification tasks based on convolutional neural networks (CNNs). In this situation, self-supervised learning methods can improve scene classification accuracy by learning representations from extensive unlabeled data. However, existing self-supervised scene classification algorithms are hard to deploy on satellites because of their high computation consumption. To address this challenge, we propose a simple yet effective self-supervised representation learning (Lite-SRL) algorithm for the scene classification task. First, we design a lightweight contrastive learning structure for Lite-SRL: a stochastic augmentation strategy is applied to obtain augmented views from unlabeled spaceborne images, and Lite-SRL maximizes the similarity of the augmented views to learn valuable representations. Then, we adopt the stop-gradient operation so that Lite-SRL's training process does not rely on large queues or negative samples, which reduces the computation consumption. Furthermore, to deploy Lite-SRL on low-power on-board computing platforms, we propose a distributed hybrid parallelism (DHP) framework and a computation workload balancing (CWB) module for Lite-SRL. Experiments on representative datasets, including OpenSARUrban, WHU-SAR6, NWPU-Resisc45, and AID, demonstrate that Lite-SRL improves scene classification accuracy under limited annotated data and generalizes to both SAR and optical images. Meanwhile, compared with six state-of-the-art self-supervised algorithms, Lite-SRL has clear advantages in overall accuracy, number of parameters, memory consumption, and training latency. Finally, to evaluate the proposed work's on-board operational capability, we deploy Lite-SRL on the low-power computing platform NVIDIA Jetson TX2.
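The stop-gradient idea mentioned above can be illustrated with a SimSiam-style objective that needs neither negative samples nor a memory queue; the sketch below is a generic illustration under that assumption, not the Lite-SRL architecture itself, and the encoder and dimensions are placeholders.

```python
# Minimal sketch of a stop-gradient contrastive objective (SimSiam-style),
# showing how training can avoid negatives and large queues.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StopGradSSL(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder
        self.projector = nn.Linear(feat_dim, proj_dim)
        self.predictor = nn.Sequential(nn.Linear(proj_dim, proj_dim), nn.ReLU(),
                                       nn.Linear(proj_dim, proj_dim))

    def forward(self, view1, view2):
        z1 = self.projector(self.encoder(view1))
        z2 = self.projector(self.encoder(view2))
        p1, p2 = self.predictor(z1), self.predictor(z2)

        def d(p, z):
            # Stop-gradient on the target branch prevents representational
            # collapse without requiring negative samples.
            z = z.detach()
            return -F.cosine_similarity(p, z, dim=-1).mean()

        return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

# Toy usage with a tiny encoder and two augmented views of the same batch.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = StopGradSSL(encoder, feat_dim=16)
loss = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
loss.backward()
```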
5. Geographic Scene Understanding of High-Spatial-Resolution Remote Sensing Images: Methodological Trends and Current Challenges. Applied Sciences (Basel) 2022. DOI: 10.3390/app12126000.
Abstract
As one of the primary means of Earth observation, high-spatial-resolution remote sensing images can describe the geometry, texture, and structure of objects in detail. Recognizing the semantic information of objects, analyzing the semantic relationships between objects, and then understanding the more abstract geographic scenes in high-spatial-resolution remote sensing images have become research hotspots. Based on the basic connotation of geographic scene understanding of high-spatial-resolution remote sensing images, this paper first summarizes the key issues in geographic scene understanding, such as multiple semantic hierarchies, complex spatial structures, and limited labeled samples. Then, recent achievements in the processing strategies and techniques of geographic scene understanding are reviewed at three levels: visual semantics, object semantics, and concept semantics. On this basis, the new challenges in geographic scene understanding of high-spatial-resolution remote sensing images are analyzed, and future research directions are proposed.
6. Two-Stream Swin Transformer with Differentiable Sobel Operator for Remote Sensing Image Classification. Remote Sensing 2022. DOI: 10.3390/rs14061507.
Abstract
Remote sensing (RS) image classification has attracted much attention recently and is widely used in various fields. Unlike natural images, RS image scenes consist of complex backgrounds and stochastically arranged objects, making it difficult for networks to focus on the target objects in the scene. However, conventional classification methods do not treat remote sensing images in any special way. In this paper, we propose a two-stream Swin Transformer network (TSTNet) to address these issues. TSTNet consists of two streams (an original stream and an edge stream) that use both the deep features of the original images and those of the edges to make predictions. The Swin Transformer is used as the backbone of each stream, given its strong performance. In addition, a differentiable edge Sobel operator module (DESOM) is included in the edge stream, which learns the parameters of the Sobel operator adaptively and provides more robust edge information that suppresses background noise. Experimental results on three publicly available remote sensing datasets show that our TSTNet achieves superior performance over state-of-the-art (SOTA) methods.
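One plausible reading of a differentiable Sobel module is a depthwise convolution whose weights are initialized with the classic Sobel kernels and left trainable; the sketch below illustrates that reading, with names and details that are assumptions rather than the published DESOM.

```python
# Sketch of a learnable Sobel edge module: the classic Sobel kernels initialize
# a depthwise convolution whose weights remain trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableSobel(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        gy = gx.t()
        # One horizontal and one vertical kernel per input channel (depthwise).
        weight = torch.stack([gx, gy]).repeat(channels, 1, 1).unsqueeze(1)
        self.weight = nn.Parameter(weight)   # trainable Sobel parameters
        self.channels = channels

    def forward(self, x):
        edges = F.conv2d(x, self.weight, padding=1, groups=self.channels)
        gx, gy = edges[:, 0::2], edges[:, 1::2]
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # per-channel edge magnitude

# Toy usage: edge maps for a small RGB batch.
edge_map = LearnableSobel(channels=3)(torch.randn(2, 3, 224, 224))
print(edge_map.shape)  # (2, 3, 224, 224)
```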
7.
Abstract
Remote sensing scene classification remains challenging due to the complexity and variety of scenes. With the development of attention-based methods, convolutional neural networks (CNNs) have achieved competitive performance in remote sensing scene classification tasks. As an important attention-based model, the Transformer has achieved great success in natural language processing and has recently been applied to computer vision tasks. However, most existing methods divide the original image into multiple patches and encode the patches as the input of the Transformer, which limits the model's ability to learn the overall features of the image. In this paper, we propose a new remote sensing scene classification method, the Remote Sensing Transformer (TRS), a powerful "pure CNNs → Convolution + Transformer → pure Transformers" structure. First, we integrate self-attention into ResNet in a novel way, using our proposed Multi-Head Self-Attention layer instead of the 3 × 3 spatial convolutions in the bottleneck. Then we stack multiple pure Transformer encoders to further improve representation learning, relying entirely on attention. Finally, we use a linear classifier for classification. We train our model on four public remote sensing scene datasets: UC-Merced, AID, NWPU-RESISC45, and OPTIMAL-31. The experimental results show that TRS exceeds state-of-the-art methods and achieves higher accuracy.
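As an illustration of replacing the 3 × 3 spatial convolution in a ResNet bottleneck with multi-head self-attention, a minimal sketch could look as follows; the use of nn.MultiheadAttention and the chosen dimensions are assumptions, not the TRS implementation.

```python
# Sketch of a ResNet-style bottleneck where the 3x3 spatial convolution is
# replaced by multi-head self-attention over the spatial positions.
import torch
import torch.nn as nn

class MHSABottleneck(nn.Module):
    def __init__(self, channels: int, bottleneck: int = 64, heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.attn = nn.MultiheadAttention(bottleneck, heads, batch_first=True)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        y = self.reduce(x)                                # 1x1 conv, as in ResNet
        tokens = y.flatten(2).transpose(1, 2)             # (B, H*W, bottleneck)
        tokens, _ = self.attn(tokens, tokens, tokens)     # replaces the 3x3 conv
        y = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return torch.relu(self.norm(self.expand(y)) + x)  # residual connection

# Toy usage on a mid-level feature map.
out = MHSABottleneck(channels=256)(torch.randn(2, 256, 14, 14))
print(out.shape)  # (2, 256, 14, 14)
```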
8. UFS-LSTM: unsupervised feature selection with long short-term memory network for remote sensing scene classification. Evolutionary Intelligence 2021. DOI: 10.1007/s12065-021-00660-4.
9. Unsupervised Domain Adaption for High-Resolution Coastal Land Cover Mapping with Category-Space Constrained Adversarial Network. Remote Sensing 2021. DOI: 10.3390/rs13081493.
Abstract
Coastal land cover mapping (CLCM) across image domains is a fundamental and challenging segmentation task. Although adversarial domain adaptation methods have been proposed to address this issue, they usually implement distribution alignment via a global discriminator while ignoring the data structure. Additionally, the low inter-class variance and intricate spatial details of coastal objects may lead to poor representations. Therefore, this paper proposes a category-space constrained adversarial method for category-level adaptive CLCM. Focusing on the underlying category information, we introduce a category-level adversarial framework to align semantic features. We present two strategies to extract category-wise domain labels for the source and target domains, where the latter is driven by self-supervised learning. Meanwhile, we generalize the lightweight adaptation module to multiple levels across a robust baseline, aiming to fine-tune the features at different spatial scales. Furthermore, self-supervised learning is also leveraged as an improvement strategy to refine the result during segmentation training. We examine our method on two converse adaptation tasks and compare it with other state-of-the-art models. The visualization results and evaluation metrics demonstrate that the proposed method achieves excellent performance in domain-adaptive CLCM with high-resolution remotely sensed images.
10. Unsupervised Adversarial Domain Adaptation with Error-Correcting Boundaries and Feature Adaption Metric for Remote-Sensing Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs13071270.
Abstract
Unsupervised domain adaptation (UDA) based on adversarial learning for remote-sensing scene classification has become a research hotspot because of the need to alleviate the lack of annotated training data. Existing methods train classifiers according to their ability to distinguish features from the source or target domains. However, they suffer from two limitations: (1) the classifier is trained on source samples and forms a source-domain-specific boundary that ignores features from the target domain, and (2) semantically meaningful features are built merely from the adversary of a generator and a discriminator, which ignores the selection of domain-invariant features. These issues limit the distribution-matching performance between the source and target domains, since each domain has its own distinctive characteristics. To resolve these issues, we propose a framework with error-correcting boundaries and a feature adaptation metric. Specifically, we design an error-correcting boundaries mechanism to build target-domain-specific classifier boundaries via multiple classifiers and an error-correcting discrepancy loss, which significantly improves the discrimination of target samples and reduces their classification uncertainty. Then, we employ a feature adaptation metric structure to enhance the adaptation of ambiguous features via shallow layers of the backbone convolutional neural network and an alignment loss, which automatically learns domain-invariant features. The experimental results on four public datasets show that the proposed method outperforms other UDA methods for remote-sensing scene classification.
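The boundary-refinement idea via multiple classifiers can be illustrated with a generic classifier-discrepancy term, in the spirit of maximum-classifier-discrepancy methods; the sketch below is a simplified stand-in and does not reproduce the paper's error-correcting discrepancy loss.

```python
# Sketch of a multi-classifier discrepancy term: two classifiers sharing a
# feature extractor should agree on target-domain samples, and their
# disagreement is used to refine the decision boundaries.
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_extractor = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())
clf1 = nn.Linear(32, 10)   # two classifiers over the shared features
clf2 = nn.Linear(32, 10)

def discrepancy(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Mean L1 distance between the two classifiers' class probabilities."""
    return (F.softmax(p1, dim=1) - F.softmax(p2, dim=1)).abs().mean()

source_x, source_y = torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,))
target_x = torch.randn(8, 3, 64, 64)

f_s, f_t = feature_extractor(source_x), feature_extractor(target_x)
# Supervised loss on labeled source samples keeps the classifiers accurate.
cls_loss = F.cross_entropy(clf1(f_s), source_y) + F.cross_entropy(clf2(f_s), source_y)
# Discrepancy on unlabeled target samples is minimized w.r.t. the feature
# extractor (and maximized w.r.t. the classifiers in an adversarial step).
disc_loss = discrepancy(clf1(f_t), clf2(f_t))
(cls_loss + disc_loss).backward()
```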
11. RS-SSKD: Self-Supervision Equipped with Knowledge Distillation for Few-Shot Remote Sensing Scene Classification. Sensors 2021;21(5):1566. PMID: 33668138. PMCID: PMC7956409. DOI: 10.3390/s21051566.
Abstract
As a growing number of instruments generate more and more airborne and satellite images, the bottleneck in remote sensing (RS) scene classification has shifted from data limits toward a lack of ground truth samples. There are still many challenges when facing unknown environments, especially those with insufficient training data. Few-shot classification offers a different picture under the umbrella of meta-learning: mining rich knowledge from only a few samples is possible. In this work, we propose a method named RS-SSKD for few-shot RS scene classification, from the perspective of generating powerful representations for the downstream meta-learner. First, we propose a novel two-branch network that takes three pairs of original-transformed images as inputs and incorporates Class Activation Maps (CAMs) to drive the network to mine the most relevant category-specific regions. This strategy ensures that the network generates discriminative embeddings. Second, we add a round of self-knowledge distillation to prevent overfitting and boost performance. Our experiments show that the proposed method surpasses current state-of-the-art approaches on two challenging RS scene datasets: NWPU-RESISC45 and RSD46-WHU. Finally, we conduct various ablation experiments to investigate the effect of each component of the proposed method and analyze the training time of state-of-the-art methods and ours.
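The Class Activation Map ingredient referenced above is a standard construction; the sketch below shows a CAM computed from the final convolutional feature map and the classifier weights. The tiny backbone and class count are illustrative assumptions, and the two-branch network and distillation round are not reproduced.

```python
# Sketch of a Class Activation Map (CAM): a class-weighted sum of the final
# convolutional feature maps, upsampled to the input resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
classifier = nn.Linear(128, 45)   # e.g. 45 classes as in NWPU-RESISC45

def class_activation_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    feat = backbone(image)                               # (B, 128, h, w)
    w = classifier.weight[class_idx]                     # (128,) class weights
    cam = torch.einsum("bchw,c->bhw", feat, w)           # weighted sum of maps
    cam = F.relu(cam)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                         mode="bilinear", align_corners=False).squeeze(1)

# Toy usage: the map highlights the category-specific region for class 3.
cam = class_activation_map(torch.randn(1, 3, 224, 224), class_idx=3)
print(cam.shape)  # (1, 224, 224)
```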
12. A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs13030433.
Abstract
Remote sensing images contain complex backgrounds and multi-scale objects, which pose a challenging task for scene classification. The performance is highly dependent on the capacity of the scene representation as well as the discriminability of the classifier. Although multiple models have better properties than a single model in these respects, the fusion strategy for these models is a key factor in maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy is used to fuse the features of the CNNs in a fine-grained, multi-scale manner. In this way, the resulting feature representation of the scene is enhanced. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets, including the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method in comparison with state-of-the-art methods.
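A loss that encourages small intra-class diversity and large inter-class distances can be sketched with learnable class centers, in the spirit of a center-loss formulation; the margin and exact form below are assumptions, not the paper's loss.

```python
# Sketch of a compactness/separation loss: pull features toward their class
# center and push different class centers at least `margin` apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompactnessSeparationLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, margin: float = 1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Intra-class term: small distance to the center of the own class.
        intra = (features - self.centers[labels]).pow(2).sum(dim=1).mean()
        # Inter-class term: penalize center pairs closer than the margin.
        dists = torch.cdist(self.centers, self.centers)
        off_diag = ~torch.eye(len(self.centers), dtype=torch.bool)
        inter = F.relu(self.margin - dists[off_diag]).mean()
        return intra + inter

# Toy usage on a batch of fused scene features.
loss_fn = CompactnessSeparationLoss(num_classes=19, feat_dim=128)  # e.g. WHU-RS19
loss = loss_fn(torch.randn(16, 128), torch.randint(0, 19, (16,)))
loss.backward()
```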
13. Dias Abreu G, Pires LF, Campos LCD, Goliatt L. Multitask Learning for Predicting Natural Flows: A Case Study at Paraiba do Sul River. Progress in Artificial Intelligence 2021. DOI: 10.1007/978-3-030-86230-5_13.
14.
Abstract
Distracted driving behavior has become a leading cause of vehicle crashes. This paper proposes a data augmentation method for distracted driving detection based on the driving operation area. First, the class activation mapping method is used to reveal the key feature areas for driving behavior analysis, and the driving operation areas are then detected with a Faster R-CNN detection model for data augmentation. Finally, a convolutional neural network classification model is implemented and evaluated on both the original dataset and the driving operation area dataset. The classification achieves 96.97% accuracy on the distracted driving dataset. The results show the necessity of driving operation area extraction in the preprocessing stage, which effectively removes redundant information from the images and yields a higher classification accuracy. The method can be used to monitor drivers in real application scenarios and identify dangerous driving behaviors, helping to provide early warnings of unsafe driving and avoid accidents.
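The preprocessing step described above, cropping a detected driving-operation region before classification, can be sketched as follows; the detector is torchvision's Faster R-CNN, assumed here to have been fine-tuned on operation-area boxes, and all names and thresholds are illustrative.

```python
# Sketch of the augmentation step: a detector proposes the driving-operation
# region, and the crop is used alongside the full frame for classifier training.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(num_classes=2)  # background + operation area
detector.eval()

def crop_operation_area(image: torch.Tensor, score_thresh: float = 0.5):
    """image: (3, H, W) float tensor in [0, 1]. Returns the highest-scoring
    operation-area crop, or the full image if nothing is detected."""
    with torch.no_grad():
        pred = detector([image])[0]
    keep = pred["scores"] > score_thresh
    if keep.sum() == 0:
        return image
    x1, y1, x2, y2 = pred["boxes"][keep][0].round().int().tolist()
    return image[:, y1:y2, x1:x2]

# Toy usage: keep both the original frame and the operation-area crop.
frame = torch.rand(3, 480, 640)
augmented = [frame, crop_operation_area(frame)]
```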