1
|
Abstract
Recognizing objects in images requires complex skills that involve knowledge about the context and the ability to identify the borders of the objects. In computer vision, this task is called semantic segmentation and it pertains to the classification of each pixel in an image. The task is of main importance in many real-life scenarios: in autonomous vehicles, it allows the identification of objects surrounding the vehicle; in medical diagnosis, it improves the ability of early detecting of dangerous pathologies and thus mitigates the risk of serious consequences. In this work, we propose a new ensemble method able to solve the semantic segmentation task. The model is based on convolutional neural networks (CNNs) and transformers. An ensemble uses many different models whose predictions are aggregated to form the output of the ensemble system. The performance and quality of the ensemble prediction are strongly connected with some factors; one of the most important is the diversity among individual models. In our approach, this is enforced by adopting different loss functions and testing different data augmentations. We developed the proposed method by combining DeepLabV3+, HarDNet-MSEG, and Pyramid Vision Transformers. The developed solution was then assessed through an extensive empirical evaluation in five different scenarios: polyp detection, skin detection, leukocytes recognition, environmental microorganism detection, and butterfly recognition. The model provides state-of-the-art results.
Collapse
|
2
|
A Block Shuffle Network with Superpixel Optimization for Landsat Image Semantic Segmentation. REMOTE SENSING 2022. [DOI: 10.3390/rs14061432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
In recent years, with the development of deep learning in remotely sensed big data, semantic segmentation has been widely used in large-scale landcover classification. Landsat imagery has the advantages of wide coverage, easy acquisition, and good quality. However, there are two significant challenges for the semantic segmentation of mid-resolution remote sensing images: the insufficient feature extraction capability of deep convolutional neural network (DCNN); low edge contour accuracy. In this paper, we propose a block shuffle module to enhance the feature extraction capability of DCNN, a differentiable superpixel branch to optimize the feature of small objects and the accuracy of edge contours, and a self-boosting method to fuse semantic information and edge contour information to further optimize the fine-grained edge contour. We label three sets of Landsat landcover classification datasets, and achieved an overall accuracy of 86.3%, 83.2%, and 73.4% on the three datasets, respectively. Compared with other mainstream semantic segmentation networks, our proposed block shuffle network achieves state-of-the-art performance, and has good generalization ability.
Collapse
|
3
|
Unsupervised Deep Learning for Landslide Detection from Multispectral Sentinel-2 Imagery. REMOTE SENSING 2021. [DOI: 10.3390/rs13224698] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
This paper proposes a new approach based on an unsupervised deep learning (DL) model for landslide detection. Recently, supervised DL models using convolutional neural networks (CNN) have been widely studied for landslide detection. Even though these models provide robust performance and reliable results, they depend highly on a large labeled dataset for their training step. As an alternative, in this paper, we developed an unsupervised learning model by employing a convolutional auto-encoder (CAE) to deal with the problem of limited labeled data for training. The CAE was used to learn and extract the abstract and high-level features without using training data. To assess the performance of the proposed approach, we used Sentinel-2 imagery and a digital elevation model (DEM) to map landslides in three different case studies in India, China, and Taiwan. Using minimum noise fraction (MNF) transformation, we reduced the multispectral dimension to three features containing more than 80% of scene information. Next, these features were stacked with slope data and NDVI as inputs to the CAE model. The Huber reconstruction loss was used to evaluate the inputs. We achieved reconstruction losses ranging from 0.10 to 0.147 for the MNF features, slope, and NDVI stack for all three study areas. The mini-batch K-means clustering method was used to cluster the features into two to five classes. To evaluate the impact of deep features on landslide detection, we first clustered a stack of MNF features, slope, and NDVI, then the same ones plus with the deep features. For all cases, clustering based on deep features provided the highest precision, recall, F1-score, and mean intersection over the union in landslide detection.
Collapse
|
4
|
High-Resolution Boundary Refined Convolutional Neural Network for Automatic Agricultural Greenhouses Extraction from GaoFen-2 Satellite Imageries. REMOTE SENSING 2021. [DOI: 10.3390/rs13214237] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Agricultural greenhouses (AGs) are an important component of modern facility agriculture, and accurately mapping and dynamically monitoring their distribution are necessary for agricultural scientific management and planning. Semantic segmentation can be adopted for AG extraction from remote sensing images. However, the feature maps obtained by traditional deep convolutional neural network (DCNN)-based segmentation algorithms blur spatial details and insufficient attention is usually paid to contextual representation. Meanwhile, the maintenance of the original morphological characteristics, especially the boundaries, is still a challenge for precise identification of AGs. To alleviate these problems, this paper proposes a novel network called high-resolution boundary refined network (HBRNet). In this method, we design a new backbone with multiple paths based on HRNetV2 aiming to preserve high spatial resolution and improve feature extraction capability, in which the Pyramid Cross Channel Attention (PCCA) module is embedded to residual blocks to strengthen the interaction of multiscale information. Moreover, the Spatial Enhancement (SE) module is employed to integrate the contextual information of different scales. In addition, we introduce the Spatial Gradient Variation (SGV) unit in the Boundary Refined (BR) module to couple the segmentation task and boundary learning task, so that they can share latent high-level semantics and interact with each other, and combine this with the joint loss to refine the boundary. In our study, GaoFen-2 remote sensing images in Shouguang City, Shandong Province, China are selected to make the AG dataset. The experimental results show that HBRNet demonstrates a significant improvement in segmentation performance up to an IoU score of 94.89%, implying that this approach has advantages and potential for precise identification of AGs.
Collapse
|
5
|
Dual Attention Feature Fusion and Adaptive Context for Accurate Segmentation of Very High-Resolution Remote Sensing Images. REMOTE SENSING 2021. [DOI: 10.3390/rs13183715] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Land cover classification of high-resolution remote sensing images aims to obtain pixel-level land cover understanding, which is often modeled as semantic segmentation of remote sensing images. In recent years, convolutional network (CNN)-based land cover classification methods have achieved great advancement. However, previous methods fail to generate fine segmentation results, especially for the object boundary pixels. In order to obtain boundary-preserving predictions, we first propose to incorporate spatially adapting contextual cues. In this way, objects with similar appearance can be effectively distinguished with the extracted global contextual cues, which are very helpful to identify pixels near object boundaries. On this basis, low-level spatial details and high-level semantic cues are effectively fused with the help of our proposed dual attention mechanism. Concretely, when fusing multi-level features, we utilize the dual attention feature fusion module based on both spatial and channel attention mechanisms to relieve the influence of the large gap, and further improve the segmentation accuracy of pixels near object boundaries. Extensive experiments were carried out on the ISPRS 2D Semantic Labeling Vaihingen data and GaoFen-2 data to demonstrate the effectiveness of our proposed method. Our method achieves better performance compared with other state-of-the-art methods.
Collapse
|
6
|
A Novel Hybrid Approach Based on Deep CNN Features to Detect Knee Osteoarthritis. SENSORS 2021; 21:s21186189. [PMID: 34577402 PMCID: PMC8471198 DOI: 10.3390/s21186189] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 11/17/2022]
Abstract
In the recent era, various diseases have severely affected the lifestyle of individuals, especially adults. Among these, bone diseases, including Knee Osteoarthritis (KOA), have a great impact on quality of life. KOA is a knee joint problem mainly produced due to decreased Articular Cartilage between femur and tibia bones, producing severe joint pain, effusion, joint movement constraints and gait anomalies. To address these issues, this study presents a novel KOA detection at early stages using deep learning-based feature extraction and classification. Firstly, the input X-ray images are preprocessed, and then the Region of Interest (ROI) is extracted through segmentation. Secondly, features are extracted from preprocessed X-ray images containing knee joint space width using hybrid feature descriptors such as Convolutional Neural Network (CNN) through Local Binary Patterns (LBP) and CNN using Histogram of oriented gradient (HOG). Low-level features are computed by HOG, while texture features are computed employing the LBP descriptor. Lastly, multi-class classifiers, that is, Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN), are used for the classification of KOA according to the Kellgren-Lawrence (KL) system. The Kellgren-Lawrence system consists of Grade I, Grade II, Grade III, and Grade IV. Experimental evaluation is performed on various combinations of the proposed framework. The experimental results show that the HOG features descriptor provides approximately 97% accuracy for the early detection and classification of KOA for all four grades of KL.
Collapse
|
7
|
Multi-Resolution Supervision Network with an Adaptive Weighted Loss for Desert Segmentation. REMOTE SENSING 2021. [DOI: 10.3390/rs13112054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Desert segmentation of remote sensing images is the basis of analysis of desert area. Desert images are usually characterized by large image size, large-scale change, and irregular location distribution of surface objects. The multi-scale fusion method is widely used in the existing deep learning segmentation models to solve the above problems. Based on the idea of multi-scale feature extraction, this paper took the segmentation results of each scale as an independent optimization task and proposed a multi-resolution supervision network (MrsSeg) to further improve the desert segmentation result. Due to the different optimization difficulty of each branch task, we also proposed an auxiliary adaptive weighted loss function (AWL) to automatically optimize the training process. MrsSeg first used a lightweight backbone to extract different-resolution features, then adopted a multi-resolution fusion module to fuse the local information and global information, and finally, a multi-level fusion decoder was used to aggregate and merge the features at different levels to get the desert segmentation result. In this method, each branch loss was treated as an independent task, AWL was proposed to calculate and adjust the weight of each branch. By giving priority to the easy tasks, the improved loss function could effectively improve the convergence speed of the model and the desert segmentation result. The experimental results showed that MrsSeg-AWL effectively improved the learning ability of the model and has faster convergence speed, lower parameter complexity, and more accurate segmentation results.
Collapse
|
8
|
LaeNet: A Novel Lightweight Multitask CNN for Automatically Extracting Lake Area and Shoreline from Remote Sensing Images. REMOTE SENSING 2020. [DOI: 10.3390/rs13010056] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Variations of lake area and shoreline can indicate hydrological and climatic changes effectively. Accordingly, how to automatically and simultaneously extract lake area and shoreline from remote sensing images attracts our attention. In this paper, we formulate lake area and shoreline extraction as a multitask learning problem. Different from existing models that take the deep and complex network architecture as the backbone to extract feature maps, we present LaeNet—a novel end-to-end lightweight multitask fully CNN with no-downsampling to automatically extract lake area and shoreline from remote sensing images. Landsat-8 images over Selenco and the vicinity in the Tibetan Plateau are utilized to train and evaluate our model. Experimental results over the testing image patches achieve an Accuracy of 0.9962, Precision of 0.9912, Recall of 0.9982, F1-score of 0.9941, and mIoU of 0.9879, which align with the mainstream semantic segmentation models (UNet, DeepLabV3+, etc.) or even better. Especially, the running time of each epoch and the size of our model are only 6 s and 0.047 megabytes, which achieve a significant reduction compared to the other models. Finally, we conducted fieldwork to collect the in-situ shoreline position for one typical part of lake Selenco, in order to further evaluate the performance of our model. The validation indicates high accuracy in our results (DRMSE: 30.84 m, DMAE: 22.49 m, DSTD: 21.11 m), only about one pixel deviation for Landsat-8 images. LaeNet can be expanded potentially to the tasks of area segmentation and edge extraction in other application fields.
Collapse
|
9
|
Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review—Part II: Applications. REMOTE SENSING 2020. [DOI: 10.3390/rs12183053] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
In Earth observation (EO), large-scale land-surface dynamics are traditionally analyzed by investigating aggregated classes. The increase in data with a very high spatial resolution enables investigations on a fine-grained feature level which can help us to better understand the dynamics of land surfaces by taking object dynamics into account. To extract fine-grained features and objects, the most popular deep-learning model for image analysis is commonly used: the convolutional neural network (CNN). In this review, we provide a comprehensive overview of the impact of deep learning on EO applications by reviewing 429 studies on image segmentation and object detection with CNNs. We extensively examine the spatial distribution of study sites, employed sensors, used datasets and CNN architectures, and give a thorough overview of applications in EO which used CNNs. Our main finding is that CNNs are in an advanced transition phase from computer vision to EO. Upon this, we argue that in the near future, investigations which analyze object dynamics with CNNs will have a significant impact on EO research. With a focus on EO applications in this Part II, we complete the methodological review provided in Part I.
Collapse
|
10
|
Multi-Scale Residual Deep Network for Semantic Segmentation of Buildings with Regularizer of Shape Representation. REMOTE SENSING 2020. [DOI: 10.3390/rs12182932] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
It is challenging for semantic segmentation of buildings based on high-resolution remote sensing images, given high variability of appearance and complicated backgrounds of the buildings and their images. In this communication, we proposed an ensemble multi-scale residual deep learning method with the regularizer of shape representation for semantic segmentation of buildings. Based on the U-Net architecture using residual connections and multi-scale ASPP (atrous spatial pyramid pooling) modules, our method introduced the regularizer of shape representation and ensemble learning of multi-scale models to enhance model training and reduce over-fitting. In our method, the shape representation was coded in an antoencoder that was used to encode and reconstruct the shape characteristics of the buildings. In prediction, we consider multi-scale trained models for different resolution inputs and side effects to obtain an optimal semantic segmentation. With the high-resolution image of the Changshan, an island county in China, we used two-thirds of the study region image to train the model and the remaining one-third for the independent test. We obtained the accuracy of 0.98–0.99, mean intersection over union (MIoU) of 0.91–0.93 and Jaccard coefficient of 0.89–0.92 in validation. In the independent test, our method achieved state-of-the-art performance (MIoU: 0.83; Jaccard index: 0.81). By comparing with the existing representative methods on four different data sets, the proposed method consistently improved the learning process and generalization. The study shows important contributions of ensemble learning of multi-scale residual models and regularizer of shape representation to semantic segmentation of buildings.
Collapse
|
11
|
Performance Analysis of Deep Convolutional Autoencoders with Different Patch Sizes for Change Detection from Burnt Areas. REMOTE SENSING 2020. [DOI: 10.3390/rs12162576] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Fire is one of the primary sources of damages to natural environments globally. Estimates show that approximately 4 million km2 of land burns yearly. Studies have shown that such estimates often underestimate the real extent of burnt land, which highlights the need to find better, state-of-the-art methods to detect and classify these areas. This study aimed to analyze the use of deep convolutional Autoencoders in the classification of burnt areas, considering different sample patch sizes. A simple Autoencoder and the U-Net and ResUnet architectures were evaluated. We collected Landsat 8 OLI+ data from three scenes in four consecutive dates to detect the changes specifically in the form of burnt land. The data were sampled according to four different sampling strategies to evaluate possible performance changes related to sampling window sizes. The training stage used two scenes, while the validation stage used the remaining scene. The ground truth change mask was created using the Normalized Burn Ratio (NBR) spectral index through a thresholding approach. The classifications were evaluated according to the F1 index, Kappa index, and mean Intersection over Union (mIoU) value. Results have shown that the U-Net and ResUnet architectures offered the best classifications with average F1, Kappa, and mIoU values of approximately 0.96, representing excellent classification results. We have also verified that a sampling window size of 256 by 256 pixels offered the best results.
Collapse
|
12
|
Identification of Salt Deposits on Seismic Images Using Deep Learning Method for Semantic Segmentation. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2020. [DOI: 10.3390/ijgi9010024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Several areas of Earth that are rich in oil and natural gas also have huge deposits of salt below the surface. Because of this connection, knowing precise locations of large salt deposits is extremely important to companies involved in oil and gas exploration. To locate salt bodies, professional seismic imaging is needed. These images are analyzed by human experts which leads to very subjective and highly variable renderings. To motivate automation and increase the accuracy of this process, TGS-NOPEC Geophysical Company (TGS) has sponsored a Kaggle competition that was held in the second half of 2018. The competition was very popular, gathering 3221 individuals and teams. Data for the competition included a training set of 4000 seismic image patches and corresponding segmentation masks. The test set contained 18,000 seismic image patches used for evaluation (all images are 101 × 101 pixels). Depth information of the sample location was also provided for every seismic image patch. The method presented in this paper is based on the author’s participation and it relies on training a deep convolutional neural network (CNN) for semantic segmentation. The architecture of the proposed network is inspired by the U-Net model in combination with ResNet and DenseNet architectures. To better comprehend the properties of the proposed architecture, a series of experiments were conducted applying standardized approaches using the same training framework. The results showed that the proposed architecture is comparable and, in most cases, better than these segmentation models.
Collapse
|