1
Mustafa NE, Alizadeh F. Unmanned aerial vehicle (UAV) images of road vehicles dataset. Data Brief 2024; 54:110264. PMID: 38516279; PMCID: PMC10950728; DOI: 10.1016/j.dib.2024.110264.
Abstract
The Intelligent Transportation System (ITS) seeks to improve traffic flow and guarantee transportation safety. One of the ITS's fundamental tasks is detecting and classifying vehicles into various classes. Although the small size, variety of forms, and similar visual appearance of vehicles, as well as the influence of weather on video and image quality, make it challenging to categorize vehicles using unmanned aerial vehicles (UAVs), UAVs are becoming increasingly popular in computer-vision applications. Traffic accidents are a serious public health concern in the Kurdistan Region of Iraq, and an automatic vehicle detection and classification system is one possible remedy. This paper presents a dataset of 2,160 images of vehicles on roads in the Iraqi Kurdistan Region to address the absence of such a dataset. The images were taken with a Mavic Air 2 drone in the Iraqi cities of Sulaymaniyah and Erbil and are categorized into five classes: bus, truck, taxi, personal car, and motorcycle. Data gathering covered diverse circumstances, multiple vehicle sizes, varied weather and lighting conditions, and large camera movements. Pre-processing and data augmentation methods, including auto-orientation, brightness, hue, and noise adjustments, were applied to the images so the dataset can be used to build an efficient deep learning (DL) model. After applying these augmentation techniques, the numbers of images for the car, taxi, truck, motorcycle, and bus classes increased to 5,353, 1,500, 1,192, 282, and 176, respectively.
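An augmentation pipeline like the one described is straightforward to reproduce. A minimal sketch assuming standard PIL/NumPy tooling; the function name and parameter defaults are illustrative, not the dataset authors' exact settings:

```python
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image, brightness=1.2, hue_shift=10, noise_std=8.0) -> Image.Image:
    """Apply brightness, hue, and Gaussian-noise augmentations to one RGB image.
    Parameter values are illustrative placeholders."""
    # Brightness: scale pixel intensities by a constant factor.
    img = ImageEnhance.Brightness(img).enhance(brightness)
    # Hue: rotate the hue channel in PIL's HSV space (hue wraps at 255).
    hsv = np.array(img.convert("HSV"), dtype=np.int16)
    hsv[..., 0] = (hsv[..., 0] + hue_shift) % 256
    img = Image.fromarray(hsv.astype(np.uint8), mode="HSV").convert("RGB")
    # Noise: add zero-mean Gaussian noise and clip to the valid range.
    arr = np.array(img, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```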
Affiliation(s)
- Fattah Alizadeh
- Department of Computer Science, Toronto Metropolitan University, Toronto, Canada
2
Zhao B, Song R. Enhancing two-stage object detection models via data-driven anchor box optimization in UAV-based maritime SAR. Sci Rep 2024; 14:4765. PMID: 38413792; PMCID: PMC10899653; DOI: 10.1038/s41598-024-55570-z.
Abstract
The high-altitude imaging capabilities of unmanned aerial vehicles (UAVs) offer an effective solution for maritime Search and Rescue (SAR) operations. In such missions, the accurate identification of boats, personnel, and objects within images is crucial. While object detection models trained on general image datasets can be applied directly to these tasks, their effectiveness is limited by the unique challenges of maritime SAR scenarios. Addressing this challenge, our study leverages SeaDronesSee, a large-scale benchmark dataset specific to UAV-based maritime SAR, to analyze the unique attributes of image data in this scenario. We identify the need for optimization in detecting specific categories of difficult-to-detect objects in this context. Building on this, we propose an anchor box optimization strategy based on clustering analysis, aimed at enhancing the performance of well-known two-stage object detection models on this specialized task. Experiments were conducted to validate the proposed anchor box optimization method and to explore the underlying reasons for its effectiveness. The experimental results show that our optimization method achieved a 45.8% and a 10% increase in average precision over the default anchor box configurations of torchvision and the SeaDronesSee official sample code, respectively. This enhancement was particularly evident in the model's significantly improved ability to detect swimmers, floaters, and life jackets on boats within the SeaDronesSee dataset's SAR scenarios. The methods and findings of this study are anticipated to provide the UAV-based maritime SAR research community with valuable insights into data characteristics and model optimization, offering a meaningful reference for future research.
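The abstract does not spell out the clustering procedure, but the standard way to derive anchors from ground-truth box sizes is IoU-based k-means, as popularized by the YOLO family. A sketch under that assumption, where wh is an N x 2 array of ground-truth (width, height) pairs:

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster ground-truth (width, height) pairs into k anchor boxes using
    1 - IoU as the distance, a common YOLO-style anchor estimation recipe."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every center, assuming co-located corners.
        inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
                np.minimum(wh[:, None, 1], centers[None, :, 1])
        union = wh[:, None, 0] * wh[:, None, 1] + \
                centers[None, :, 0] * centers[None, :, 1] - inter
        assign = np.argmax(inter / union, axis=1)  # nearest center = highest IoU
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by anchor area
```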
Affiliation(s)
- Beigeng Zhao
- College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, China
- Rui Song
- College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, China
3
Xi Y, Liu Y, Li T, Ding J, Zhang Y, Tarkoma S, Li Y, Hui P. A Satellite Imagery Dataset for Long-Term Sustainable Development in United States Cities. Sci Data 2023; 10:866. PMID: 38049491; PMCID: PMC10696003; DOI: 10.1038/s41597-023-02576-3.
Abstract
Cities play an important role in achieving sustainable development goals (SDGs) to promote economic growth and meet social needs. Satellite imagery, in particular, is a promising data source for studying sustainable urban development. However, a comprehensive dataset for the United States (U.S.) covering multiple cities, years, scales, and indicators for SDG monitoring has been lacking. To support research on SDGs in U.S. cities, we develop a satellite imagery dataset using deep learning models for five SDGs comprising 25 sustainable development indicators. The dataset covers the 100 most populated U.S. cities and the corresponding Census Block Groups from 2014 to 2023. Specifically, we collect satellite imagery and identify objects with state-of-the-art object detection and semantic segmentation models to observe cities from a bird's-eye view. We further gather population, nighttime light, survey, and built environment data to depict SDGs regarding poverty, health, education, inequality, and living environment. We anticipate that the dataset will help urban policymakers and researchers advance SDG-related studies, especially those applying satellite imagery to monitor long-term, multi-scale SDGs in cities.
Affiliation(s)
- Yanxin Xi
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Yu Liu
- Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P. R. China
- Department of Electronic Engineering, Tsinghua University, Beijing, P. R. China
- Tong Li
- Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P. R. China
- Department of Electronic Engineering, Tsinghua University, Beijing, P. R. China
- Jingtao Ding
- Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P. R. China
- Department of Electronic Engineering, Tsinghua University, Beijing, P. R. China
- Yunke Zhang
- Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P. R. China
- Department of Electronic Engineering, Tsinghua University, Beijing, P. R. China
- Sasu Tarkoma
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Yong Li
- Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P. R. China
- Department of Electronic Engineering, Tsinghua University, Beijing, P. R. China
- Pan Hui
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Computational Media and Arts Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, P. R. China
- Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology, Hong Kong, P. R. China
4
Pan Y, Yang J, Zhu L, Yao L, Zhang B. Aerial images object detection method based on cross-scale multi-feature fusion. Math Biosci Eng 2023; 20:16148-16168. PMID: 37920007; DOI: 10.3934/mbe.2023721.
Abstract
Aerial image target detection technology has essential application value in navigation security, traffic control, and environmental monitoring. Compared with natural scene images, the background of aerial images is more complex and contains more small targets, which places higher demands on the detection accuracy and real-time performance of an algorithm. To further improve the detection accuracy of lightweight networks for small targets in aerial images, we propose a cross-scale multi-feature fusion target detection method (CMF-YOLOv5s) for aerial images. Building on the original YOLOv5s, a bidirectional cross-scale feature fusion sub-network (BsNet) is constructed, using a newly designed multi-scale fusion module (MFF) and a cross-scale feature fusion strategy to strengthen the algorithm's ability to fuse multi-scale feature information and reduce the loss of small-target features. To address the high missed-detection rate for small targets in aerial images, we constructed a multi-scale detection head with four outputs to improve the network's ability to perceive small targets. To enhance the network's recognition rate on small-target samples, we improve the K-means algorithm by introducing a genetic algorithm that optimizes the prediction box sizes, generating anchor boxes better suited to aerial images. The experimental results show that on the aerial image small-target dataset VisDrone-2019, the proposed method detects more small targets in aerial images with complex backgrounds. At a detection speed of 116 FPS, compared with the original algorithm, the detection accuracy metrics mAP0.5 and mAP0.5:0.95 for small targets improve by 5.5% and 3.6%, respectively. Meanwhile, compared with eight advanced lightweight networks such as YOLOv7-Tiny and PP-PicoDet-s, mAP0.5 improves by more than 3.3% and mAP0.5:0.95 by more than 1.9%.
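The genetic refinement of K-means anchors mentioned above is closely related to the anchor-evolution routine used by YOLOv5. A hedged sketch of that idea as a simple (1+1) evolution strategy; all parameter values are illustrative, not the paper's:

```python
import numpy as np

def ga_refine(anchors: np.ndarray, wh: np.ndarray, gens: int = 300,
              p: float = 0.9, sigma: float = 0.1, seed: int = 0) -> np.ndarray:
    """Randomly mutate anchor sizes and keep mutations that raise the mean
    best-anchor IoU over the dataset (a simple (1+1) evolution strategy).
    anchors: k x 2 initial (width, height) anchors; wh: N x 2 ground-truth sizes."""
    rng = np.random.default_rng(seed)

    def fitness(a):
        inter = np.minimum(wh[:, None, 0], a[None, :, 0]) * \
                np.minimum(wh[:, None, 1], a[None, :, 1])
        union = wh[:, None, 0] * wh[:, None, 1] + a.prod(1)[None, :] - inter
        return (inter / union).max(axis=1).mean()  # each box scored by its best anchor

    best, best_fit = anchors.copy(), fitness(anchors)
    for _ in range(gens):
        mask = rng.random(best.shape) < p
        mutant = np.where(mask, best * rng.normal(1.0, sigma, best.shape), best)
        mutant = np.clip(mutant, 2.0, None)  # keep sizes positive and sane
        if (f := fitness(mutant)) > best_fit:
            best, best_fit = mutant, f
    return best
```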
Affiliation(s)
- Yang Pan
- School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China
- Jinhua Yang
- School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China
- Lei Zhu
- School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China
- Lina Yao
- School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China
- Bo Zhang
- School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China
5
Mirzaei B, Nezamabadi-pour H, Raoof A, Derakhshani R. Small Object Detection and Tracking: A Comprehensive Review. Sensors (Basel) 2023; 23:6887. PMID: 37571664; PMCID: PMC10422231; DOI: 10.3390/s23156887.
Abstract
Object detection and tracking are vital in computer vision and visual surveillance, allowing for the detection, recognition, and subsequent tracking of objects within images or video sequences. These tasks underpin surveillance systems, facilitating automatic video annotation, identification of significant events, and detection of abnormal activities. However, detecting and tracking small objects introduce significant challenges within computer vision due to their subtle appearance and limited distinguishing features, which results in a scarcity of crucial information. This deficit complicates the tracking process, often leading to diminished efficiency and accuracy. To shed light on the intricacies of small object detection and tracking, we undertook a comprehensive review of the existing methods in this area, categorizing them from various perspectives. We also presented an overview of available datasets specifically curated for small object detection and tracking, aiming to inform and benefit future research in this domain. We further delineated the most widely used evaluation metrics for assessing the performance of small object detection and tracking techniques. Finally, we examined the present challenges within this field and discussed prospective future trends. By tackling these issues and leveraging upcoming trends, we aim to push forward the boundaries in small object detection and tracking, thereby augmenting the functionality of surveillance systems and broadening their real-world applicability.
Affiliation(s)
- Behzad Mirzaei
- Intelligent Data Processing Laboratory (IDPL), Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman 76169-13439, Iran
- Hossein Nezamabadi-pour
- Intelligent Data Processing Laboratory (IDPL), Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman 76169-13439, Iran
- Amir Raoof
- Department of Earth Sciences, Utrecht University, 3584CB Utrecht, The Netherlands
- Reza Derakhshani
- Department of Earth Sciences, Utrecht University, 3584CB Utrecht, The Netherlands
- Department of Geology, Shahid Bahonar University of Kerman, Kerman 76169-13439, Iran
6
Nie G, Huang H. Multi-Oriented Object Detection in Aerial Images With Double Horizontal Rectangles. IEEE Trans Pattern Anal Mach Intell 2023; 45:4932-4944. PMID: 35849674; DOI: 10.1109/tpami.2022.3191753.
Abstract
Most existing methods adopt the quadrilateral or rotated-rectangle representation to detect multi-oriented objects. Yet the same oriented object may correspond to several different representations, due to different vertex orderings, or to angular periodicity and edge exchangeability. To ensure the uniqueness of the representation, engineered rules are usually added, which makes these methods suffer from a discontinuity problem and degrades performance for objects around certain orientations. In this article, we propose to encode a multi-oriented object with double horizontal rectangles (DHRec) to solve the discontinuity problem. Specifically, for an oriented object, we sort the horizontal and vertical coordinates of its four vertices in left-right and top-down order, respectively. The first (resp. second) horizontal box is given by the two diagonal points with the smallest (resp. second smallest) and third smallest (resp. largest) coordinates in both the horizontal and vertical dimensions. We then regress three factors given by area ratios between different regions, which guide the decoding of the oriented object from the predicted DHRec. Inheriting the uniqueness of the horizontal-rectangle representation, the proposed method is free of the discontinuity issue and can accurately detect objects of arbitrary orientation. Extensive experimental results show that the proposed method significantly improves on the existing baseline representation and outperforms state-of-the-art methods. The code is available at: https://github.com/lightbillow/DHRec.
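The encoding follows directly from the coordinate-sorting rule just described. A minimal sketch; the area-ratio factors used for decoding are omitted:

```python
import numpy as np

def encode_dhrec(vertices: np.ndarray):
    """Encode a rotated box (4 x 2 vertex array) as two horizontal rectangles.
    Sorting the coordinates makes the representation unique, avoiding the
    vertex-ordering and angle-periodicity discontinuities of angle-based codes."""
    xs = np.sort(vertices[:, 0])  # left-to-right
    ys = np.sort(vertices[:, 1])  # top-to-bottom
    box1 = (xs[0], ys[0], xs[2], ys[2])  # smallest and third-smallest coordinates
    box2 = (xs[1], ys[1], xs[3], ys[3])  # second-smallest and largest coordinates
    return box1, box2
```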
7
Lu S, Lu H, Dong J, Wu S. Object Detection for UAV Aerial Scenarios Based on Vectorized IOU. Sensors (Basel) 2023; 23:3061. PMID: 36991772; PMCID: PMC10054878; DOI: 10.3390/s23063061.
Abstract
Object detection in unmanned aerial vehicle (UAV) images is an extremely challenging task involving multi-scale objects, a high proportion of small objects, and high overlap between objects. To address these issues, we first design a Vectorized Intersection Over Union (VIOU) loss based on YOLOv5s. This loss treats the width and height of the bounding box as a vector, constructing a cosine function that reflects both the size and the aspect ratio of the box, and directly compares the boxes' center-point values to improve the accuracy of bounding-box regression. Second, we propose a Progressive Feature Fusion Network (PFFN) that addresses PANet's insufficient extraction of semantics from shallow features. This allows each node of the network to fuse semantic information from deep layers with features from the current layer, significantly improving the detection of small objects in multi-scale scenes. Finally, we propose an Asymmetric Decoupled (AD) head, which separates the classification network from the regression network and improves both capabilities. Our proposed method yields significant improvements over YOLOv5s on two benchmark datasets. On the VisDrone 2019 dataset, performance increased by 9.7%, from 34.9% to 44.6%, and on the DOTA dataset, performance increased by 2.1%.
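The abstract gives only an outline of the VIOU loss, so the sketch below is one plausible reading rather than the authors' exact formulation: an IoU term, a center-distance term normalized by the enclosing-box diagonal, and a cosine term over the (width, height) vector that couples box size and aspect ratio.

```python
import torch

def viou_loss(pred, target, eps=1e-7):
    """Illustrative VIOU-style loss over (cx, cy, w, h) boxes; a sketch of
    the idea in the abstract, not the authors' exact formulation."""
    # Corner coordinates for the IoU term.
    p1, p2 = pred[:, :2] - pred[:, 2:] / 2, pred[:, :2] + pred[:, 2:] / 2
    t1, t2 = target[:, :2] - target[:, 2:] / 2, target[:, :2] + target[:, 2:] / 2
    wh = (torch.min(p2, t2) - torch.max(p1, t1)).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union
    # Center-point term, normalized by the enclosing-box diagonal.
    diag = ((torch.max(p2, t2) - torch.min(p1, t1)) ** 2).sum(1) + eps
    center = ((pred[:, :2] - target[:, :2]) ** 2).sum(1) / diag
    # Cosine similarity between predicted and target (w, h) vectors.
    cos = torch.nn.functional.cosine_similarity(pred[:, 2:], target[:, 2:], dim=1)
    return (1 - iou) + center + (1 - cos)
```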
Affiliation(s)
- Shun Lu
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
- Hanyu Lu
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
- Bijie 5G Innovation and Application Research Institute, Guizhou University of Engineering Science, Bijie 551700, China
- Jun Dong
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Anhui Zhongke Deji Intelligence Technology Co., Ltd., Hefei 230045, China
- Shuang Wu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
8
Wang J, Meng L, Li W, Yang W, Yu L, Xia GS. Learning to Extract Building Footprints From Off-Nadir Aerial Images. IEEE Trans Pattern Anal Mach Intell 2023; 45:1294-1301. PMID: 35344484; DOI: 10.1109/tpami.2022.3162583.
Abstract
Extracting building footprints from aerial images is essential for precise urban mapping with photogrammetric computer vision technologies. Existing approaches mainly assume that the roof and footprint of a building overlap well, which may not hold in off-nadir aerial images, where there is often a large offset between them. In this paper, we propose an offset-vector learning scheme, which turns building footprint extraction in off-nadir images into an instance-level joint prediction problem of the building roof and its corresponding "roof to footprint" offset vector. The footprint can thus be estimated by translating the predicted roof mask according to the predicted offset vector. We further propose a simple but effective feature-level offset augmentation module, which significantly refines the offset vector prediction at little extra cost. Moreover, a new dataset, Buildings in Off-Nadir Aerial Images (BONAI), is created and released in this paper. It contains 268,958 building instances across 3,300 aerial images, each with fully annotated instance-level roof, footprint, and corresponding offset vector. Experiments on the BONAI dataset demonstrate that our method achieves state-of-the-art results, outperforming other competitors by 3.37 to 7.39 points in F1-score. The codes, datasets, and trained models are available at https://github.com/jwwangchn/BONAI.git.
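The decoding step, translating a roof mask by the predicted offset to obtain the footprint, is simple to express. A sketch assuming binary NumPy masks; the function name is illustrative:

```python
import numpy as np

def roof_to_footprint(roof_mask: np.ndarray, offset: tuple[float, float]) -> np.ndarray:
    """Translate a predicted binary roof mask by the predicted
    roof-to-footprint offset vector (dx, dy) to estimate the footprint."""
    dy, dx = int(round(offset[1])), int(round(offset[0]))
    footprint = np.zeros_like(roof_mask)
    h, w = roof_mask.shape
    # Source and destination slices, clipped at the image border.
    src = roof_mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    footprint[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return footprint
```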
9
Object Localization in Weakly Labeled Remote Sensing Images Based on Deep Convolutional Features. Remote Sens 2022; 14:3230. DOI: 10.3390/rs14133230.
Abstract
Object recognition, as one of the most fundamental and challenging problems in high-resolution remote sensing image interpretation, has received increasing attention in recent years. However, most conventional object recognition pipelines recognize instances with bounding boxes in a supervised learning strategy, which requires intensive manual labor to create instance annotations. In this paper, we propose a weakly supervised learning method to alleviate this problem. The core idea is to recognize multiple objects in an image using only image-level semantic labels and to indicate the recognized objects with location points instead of box extents. Specifically, a deep convolutional neural network is first trained to perform semantic scene classification, whose result determines the categories of objects in an image. Then, by back-propagating the categorical feature from the fully connected layer to the deep convolutional layer, the categorical and spatial information of an image are combined to obtain an object-discriminative localization map, which effectively indicates the salient regions of objects. Next, a dynamic updating method of local response extrema is proposed to further determine the locations of objects in an image. Finally, extensive experiments are conducted to localize aircraft and oil tanks in remote sensing images based on different convolutional neural networks. Experimental results show that the proposed method outperforms state-of-the-art methods, achieving precision, recall, and F1-scores of 94.50%, 88.79%, and 91.56% for aircraft localization and 89.12%, 83.04%, and 85.97% for oil-tank localization, respectively. We hope that our work can serve as a basic reference for remote sensing object localization via a weakly supervised strategy and provide new opportunities for further research.
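The back-propagation step described above is essentially class activation mapping (CAM). A minimal sketch of that building block; the paper's own contribution, the dynamic updating of local response extrema, is not reproduced here:

```python
import numpy as np

def class_activation_map(conv_feats: np.ndarray, fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """Project one class's classifier weights back onto the last convolutional
    feature maps to obtain a discriminative localization map (the CAM idea).

    conv_feats: (C, H, W) activations of the final conv layer.
    fc_weights: (num_classes, C) weights of the fully connected layer.
    """
    cam = np.tensordot(fc_weights[class_idx], conv_feats, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)  # normalize to [0, 1]; peaks mark objects
```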
10
Deep Learning-Based Automatic Detection of Ships: An Experimental Study Using Satellite Images. J Imaging 2022; 8:182. PMID: 35877626; PMCID: PMC9325223; DOI: 10.3390/jimaging8070182.
Abstract
The remote sensing surveillance of maritime areas is essential for both security and environmental reasons. Recently, learning strategies from the field of machine learning (ML) have become a niche of interest for the remote sensing community. A major challenge is the automatic classification of ships from satellite imagery, which is needed for traffic surveillance systems, protection against illegal fishing, oil-discharge control systems, and the monitoring of sea pollution. Deep learning (DL) is a branch of ML that has emerged in the last few years as a result of advancements in digital technology and data availability, and it has shown capacity and efficacy in tackling difficult learning tasks that were previously intractable. Specifically, DL methods such as convolutional neural networks (CNNs) have been reported to be efficient in image detection and recognition applications. In this paper, we focus on developing an automatic ship detection (ASD) approach using DL methods on the Airbus ship dataset (composed of about 40K satellite images). The paper explores and analyzes distinct variants of the YOLO algorithm for detecting ships from satellite images. A comparison of YOLOv3, YOLOv4, and YOLOv5 for ship detection is presented after training them on a personal computer with a large dataset of satellite images from the Airbus Ship Challenge and Shipsnet; the differences between the algorithms were clearly observable in these experiments. We confirm that these algorithms can be used for effective ship detection from satellite images and conclude that YOLOv5 outperforms YOLOv4 and YOLOv3 in terms of accuracy, achieving 99% compared to 98% and 97%, respectively.
11
Adaptive Unsupervised-Shadow-Detection Approach for Remote-Sensing Image Based on Multichannel Features. Remote Sens 2022; 14:2756. DOI: 10.3390/rs14122756.
Abstract
Shadow detection is an essential research topic in the remote sensing domain, as the presence of shadow causes the loss of ground-object information in the affected areas. With existing unsupervised approaches, it is hard to define specific threshold values for identifying shadow areas because of the complexity of remote sensing scenes. In this study, an adaptive unsupervised shadow-detection method based on multichannel features is proposed, which can adaptively distinguish shadow in different scenes. First, new multichannel features were designed in the hue, saturation, and intensity color space, exploiting the shadow properties of high hue, high saturation, and low intensity to address the insufficient feature extraction of shadows. Then, a dynamic local adaptive particle swarm optimization was proposed to calculate the segmentation thresholds for shadows adaptively. Finally, experiments on the Aerial Imagery dataset for Shadow Detection (AISD) demonstrated the superior performance of the proposed approach compared with traditional unsupervised shadow-detection methods and state-of-the-art deep-learning methods. The experimental results show that the proposed approach detects shadow areas in remote sensing images more accurately and efficiently, with an F-index of 82.70% on the test images. The proposed approach thus has strong application potential in scenarios without a large number of labeled samples.
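The high-hue, high-saturation, low-intensity property lends itself to a simple per-pixel score that can then be thresholded (in the paper, adaptively via particle swarm optimization). A sketch using OpenCV's HSV conversion as a stand-in for the authors' exact HSI multichannel features:

```python
import numpy as np
import cv2

def shadow_score(img_bgr: np.ndarray) -> np.ndarray:
    """Score pixels by the classic shadow properties the paper exploits:
    high hue, high saturation, low intensity. This ratio map is a common
    proxy, not the authors' exact multichannel features."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = hsv[..., 0] / 179.0, hsv[..., 1] / 255.0, hsv[..., 2] / 255.0
    score = (h + s) / (v + 1e-6)         # large where hue/saturation dominate intensity
    return score / (score.max() + 1e-6)  # normalize; threshold to segment shadows
```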
12
Li H, Zech J, Hong D, Ghamisi P, Schultz M, Zipf A. Leveraging OpenStreetMap and Multimodal Remote Sensing Data with Joint Deep Learning for Wastewater Treatment Plants Detection. Int J Appl Earth Obs Geoinf 2022; 110:102804. PMID: 36338308; PMCID: PMC9626640; DOI: 10.1016/j.jag.2022.102804.
Abstract
Humans rely on clean water for their health, well-being, and various socio-economic activities. During the past few years, the COVID-19 pandemic has been a constant reminder of the importance of hygiene and sanitation for public health. The most common approach to securing clean water supplies for this purpose is wastewater treatment. To date, an effective method for detecting wastewater treatment plants (WWTPs) accurately and automatically via remote sensing has been unavailable. In this paper, we provide a solution to this task by proposing a novel joint deep learning (JDL) method that consists of a fine-tuned object detection network and a multi-task residual attention network (RAN). By leveraging OpenStreetMap (OSM) and multimodal remote sensing (RS) data, our JDL method simultaneously tackles two different tasks: land use land cover (LULC) classification and WWTP classification. Moreover, JDL exploits the complementary effects between these tasks for a performance gain. We train JDL using 4,187 WWTP features and 4,200 LULC samples and validate the proposed method over a selected area around Stuttgart with 723 WWTP features and 1,200 LULC samples, generating an LULC classification map and a WWTP detection map. Extensive experiments with different comparative methods demonstrate the effectiveness and efficiency of our JDL method for automatic WWTP detection in comparison with single-modality/single-task or traditional survey methods. Moreover, the lessons learned pave the way for future work that simultaneously and effectively addresses multiple large-scale mapping tasks (e.g., both mapping LULC and detecting WWTPs) from multimodal RS data via deep learning.
Affiliation(s)
- Hao Li
- GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
- Johannes Zech
- GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
- Danfeng Hong
- Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Pedram Ghamisi
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Machine Learning Group, D-09599 Freiberg, Saxony, Germany
- Institute of Advanced Research in Artificial Intelligence (IARAI), Landstraßer Hauptstraße 5, 1030 Vienna, Austria
- Michael Schultz
- GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
- Alexander Zipf
- GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
- HeiGIT at Heidelberg University, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany
13
Abstract
Rotated object detection in aerial images remains challenging due to arbitrary orientations, large scale and aspect-ratio variations, and the extreme density of objects. Existing state-of-the-art rotated object detection methods mainly rely on angle-based detectors. However, angle-based detectors easily suffer from a long-standing boundary problem. To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN. Point RCNN is a two-stage detector whose components, PointRPN and PointReg, are both angle-free. Given an input aerial image, the backbone-FPN first extracts hierarchical features; then the PointRPN module generates accurate rotated regions of interest (RRoIs) by converting the learned representative points of each rotated object using the MinAreaRect function of OpenCV. Motivated by RepPoints, we designed a coarse-to-fine process to regress and refine the representative points for more accurate RRoIs. Next, based on the learned RRoIs of PointRPN, the PointReg module regresses and refines the corner points of each RRoI for more accurate rotated object detection, and the final rotated bounding box of each object is obtained from the learned four corner points. In addition, aerial image datasets are often severely class-imbalanced, a problem that existing rotated object detection methods almost ignore. To tackle it, we propose a balanced dataset strategy: we experimentally verified that re-sampling images of the rare categories stabilizes training and further improves detection performance, raising accuracy from 80.37 mAP to 80.71 mAP on DOTA-v1.0. Our Point RCNN method achieved new state-of-the-art detection performance on multiple large-scale aerial image datasets, including DOTA-v1.0, DOTA-v1.5, HRSC2016, and UCAS-AOD. Specifically, on DOTA-v1.0, Point RCNN achieved 80.71 mAP; on DOTA-v1.5, it achieved 79.31 mAP, improving on ReDet by 2.86 mAP (from 76.45 to 79.31); and on HRSC2016 and UCAS-AOD, it achieved 90.53 mAP and 90.04 mAP, respectively.
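The points-to-RRoI conversion named in the abstract maps directly onto OpenCV's API. A minimal sketch:

```python
import cv2
import numpy as np

def points_to_rroi(points: np.ndarray):
    """Convert a set of learned representative points (N x 2) into a rotated
    RoI, mirroring PointRPN's use of OpenCV's MinAreaRect."""
    rect = cv2.minAreaRect(points.astype(np.float32))  # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)                      # 4 x 2 corner array
    return rect, corners
```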
14
Liu H, Jiao L, Wang R, Xie C, Du J, Chen H, Li R. WSRD-Net: A Convolutional Neural Network-Based Arbitrary-Oriented Wheat Stripe Rust Detection Method. Front Plant Sci 2022; 13:876069. PMID: 35685013; PMCID: PMC9171371; DOI: 10.3389/fpls.2022.876069.
Abstract
Wheat stripe rust is responsible for major production losses and economic damage in the wheat industry, so its accurate detection is critical to improving wheat quality and the agricultural economy. At present, the results of existing wheat stripe rust detection methods based on convolutional neural networks (CNNs) are unsatisfactory because of the arbitrary orientation and large aspect ratio of stripe rust lesions. To address these problems, this study develops a CNN-based method for detecting wheat stripe rust, WSRD-Net. The model is a refined single-stage rotation detector based on RetinaNet, adding a feature refinement module (FRM) to the rotated RetinaNet to solve the feature misalignment of wheat stripe rust regions with large aspect ratios. Furthermore, we have built an oriented-annotation dataset of in-field wheat stripe rust images, called the wheat stripe rust dataset 2021 (WSRD2021). Comparing WSRD-Net with state-of-the-art oriented object detection models shows that it obtains 60.8% AP and 73.8% recall on the dataset, higher than the four other oriented object detection models. Furthermore, comparison with horizontal object detection models shows that WSRD-Net outperforms them in localizing the corresponding disease areas.
Affiliation(s)
- Haiyun Liu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Science Island Branch, University of Science and Technology of China, Hefei, China
- Lin Jiao
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- School of Internet, Anhui University, Hefei, China
- Rujing Wang
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Science Island Branch, University of Science and Technology of China, Hefei, China
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Chengjun Xie
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Science Island Branch, University of Science and Technology of China, Hefei, China
- Jianming Du
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hongbo Chen
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Science Island Branch, University of Science and Technology of China, Hefei, China
- Rui Li
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
15
Deep Learning for Archaeological Object Detection on LiDAR: New Evaluation Measures and Insights. Remote Sens 2022; 14:1694. DOI: 10.3390/rs14071694.
Abstract
Machine learning-based workflows are progressively being used for the automatic detection of archaeological objects (understood as below-surface sites) in remote sensing data. Despite promising results in the detection phase, there is still no standard set of measures for evaluating the performance of object detection methods, since buried archaeological sites often have distinctive shapes that set them apart from the object types included in mainstream remote sensing datasets (e.g., the Dataset of Object deTection in Aerial images, DOTA). Additionally, archaeological research relies heavily on geospatial information when validating the output of an object detection procedure, a type of information not normally considered in regular machine learning validation pipelines. This paper tackles these shortcomings by introducing two novel automatic evaluation measures, namely 'centroid-based' and 'pixel-based', designed to encode the salient aspects of archaeologists' thinking process. To test their usability, an experiment with different object detection deep neural networks was conducted on a LiDAR dataset. The experimental results show that these two automatic measures closely resemble the semi-automatic one currently used by archaeologists and can therefore be adopted as fully automatic evaluation measures in archaeological remote sensing detection. Adoption will facilitate cross-study comparisons and close collaboration between machine learning and archaeological researchers, which in turn will encourage the development of novel human-centred archaeological object detection tools.
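The paper defines its measures precisely; the sketch below is only one plausible reading of a centroid-based criterion, included to make the idea concrete: a ground-truth site counts as detected when its centroid falls inside a predicted box.

```python
import numpy as np

def centroid_based_scores(pred_boxes: np.ndarray, gt_centroids: np.ndarray):
    """Illustrative centroid-based precision/recall. A ground-truth object is
    detected if its centroid lies inside some predicted (x1, y1, x2, y2) box;
    a prediction is correct if it contains some ground-truth centroid."""
    inside = (gt_centroids[:, None, 0] >= pred_boxes[None, :, 0]) & \
             (gt_centroids[:, None, 0] <= pred_boxes[None, :, 2]) & \
             (gt_centroids[:, None, 1] >= pred_boxes[None, :, 1]) & \
             (gt_centroids[:, None, 1] <= pred_boxes[None, :, 3])
    recall = inside.any(axis=1).mean() if len(gt_centroids) else 0.0
    precision = inside.any(axis=0).mean() if len(pred_boxes) else 0.0
    return precision, recall
```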
16
A Complete YOLO-Based Ship Detection Method for Thermal Infrared Remote Sensing Images under Complex Backgrounds. Remote Sens 2022; 14:1534. DOI: 10.3390/rs14071534.
Abstract
Automatic ship detection for thermal infrared remote sensing images (TIRSIs) is of great significance due to its broad applicability in maritime security, port management, and target searching, especially at night. Most ship detection algorithms use handcrafted features to detect ships in accurately cropped visible-light image blocks, and in practical applications they are limited by illumination, clouds, and strong atmospheric waves. In this paper, a complete YOLO-based ship detection method (CYSDM) for TIRSIs under complex backgrounds is proposed. In addition, thermal infrared ship datasets were created using the SDGSAT-1 thermal imaging system. First, to avoid the loss of texture characteristics during large-scale deep convolution, the TIRSIs with a resolution of 30 m were up-sampled to 10 m via bicubic interpolation. Then, complete ships with similar characteristics were selected and annotated in the middle of rivers, bays, and the sea. A gray-value stretching module was also added to enrich the datasets. Finally, an improved YOLOv5s model was used to detect candidate ship areas quickly. To reduce intra-class variation, ships with aspect ratios of 4.23-7.53 were manually selected during labeling, and 8-10.5 μm ship datasets were constructed. Test results show that the precision of CYSDM is 98.68%, which is 9.07% higher than that of the YOLOv5s algorithm. CYSDM provides an effective reference for large-scale, all-day ship detection.
17
RAOD: refined oriented detector with augmented feature in remote sensing images object detection. Appl Intell 2022. DOI: 10.1007/s10489-022-03393-8.
18
Imaging Parameters-Considered Slender Target Detection in Optical Satellite Images. Remote Sens 2022; 14:1385. DOI: 10.3390/rs14061385.
Abstract
Existing slender-target detection methods based on optical satellite images are greatly affected by the satellite perspective and the solar perspective, and because data sources are limited, a fully data-driven approach is difficult to implement. This work introduces the imaging parameters of optical satellite images, which greatly reduces the influence of the satellite and solar perspectives and reduces the amount of data required. We improve an oriented bounding box (OBB) detector based on Faster R-CNN (region-based convolutional neural network) and propose an imaging-parameters-considered detector (IPC-Det) that is better suited to our task. Specifically, in the first stage, the umbra and the shadow are each extracted with horizontal bounding boxes (HBBs), and the umbra-shadow matching is then established according to the imaging parameters. In the second stage, the paired umbra and shadow features are used to complete classification and regression, and the target is obtained as an OBB. In experiments, introducing the imaging parameters improves our detection accuracy by 3.9% (up to 87.5%), demonstrating a successful use of imaging parameters for slender-target detection.
19
Huang Z, Li W, Xia XG, Tao R. A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection. IEEE Trans Image Process 2022; 31:1895-1910. PMID: 35139019; DOI: 10.1109/tip.2022.3148874.
Abstract
Recently, many arbitrary-oriented object detection (AOOD) methods have been proposed and have attracted widespread attention in many fields. However, most of them are based on anchor boxes or standard Gaussian heatmaps. Such label assignment strategies may not only fail to reflect the shape and direction characteristics of arbitrary-oriented objects but also require considerable parameter tuning. In this paper, a novel AOOD method called General Gaussian Heatmap Label Assignment (GGHL) is proposed. Specifically, an anchor-free object-adaptation label assignment (OLA) strategy is presented to define positive candidates based on two-dimensional (2D) oriented Gaussian heatmaps, which reflect the shape and direction features of arbitrary-oriented objects. Based on OLA, an oriented-bounding-box (OBB) representation component (ORC) is developed to indicate OBBs and to adjust the Gaussian center prior weights, adapting to the characteristics of different objects through neural network learning. Moreover, a joint-optimization loss (JOL) with area normalization and dynamic confidence weighting is designed to refine the misaligned optimization results of the different subtasks. Extensive experiments on public datasets demonstrate that the proposed GGHL improves AOOD performance with low parameter-tuning and time costs. Furthermore, it is generally applicable to most AOOD methods, improving their performance, including that of lightweight models on embedded platforms.
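A 2D oriented Gaussian heatmap of the kind OLA assigns labels from can be rendered by rotating pixel offsets into the box frame and scaling by the box's width and height. A sketch; the sigma_scale divisor is an illustrative choice, not the paper's:

```python
import numpy as np

def oriented_gaussian_heatmap(shape, center, w, h, theta, sigma_scale=6.0):
    """Render a 2D oriented Gaussian whose covariance follows the box's
    width, height, and angle, so the heatmap encodes shape and direction."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    dx, dy = xs - center[0], ys - center[1]
    c, s = np.cos(theta), np.sin(theta)
    # Rotate pixel offsets into the box's local frame.
    u = c * dx + s * dy
    v = -s * dx + c * dy
    sx, sy = w / sigma_scale, h / sigma_scale  # axis-wise standard deviations
    return np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))
```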
20
Nepal U, Eslamiat H. Comparing YOLOv3, YOLOv4 and YOLOv5 for Autonomous Landing Spot Detection in Faulty UAVs. Sensors (Basel) 2022; 22:464. PMID: 35062425; PMCID: PMC8778480; DOI: 10.3390/s22020464.
Abstract
In-flight system failure is one of the major safety concerns for the operation of unmanned aerial vehicles (UAVs) in urban environments. To address this concern, a safety framework consisting of the following three main tasks can be utilized: (1) monitoring the health of the UAV and detecting failures, (2) finding potential safe landing spots in case a critical failure is detected in step 1, and (3) steering the UAV to a safe landing spot found in step 2. In this paper, we focus on the second task and investigate the feasibility of utilizing object detection methods to spot safe landing locations when a UAV suffers an in-flight failure. In particular, we compare the performance of YOLOv3, YOLOv4, and YOLOv5l for this application, training them on the large aerial image dataset DOTA using both a personal computer (PC) and a companion computer (CC). We plan to run the chosen algorithm on a CC that can be attached to a UAV, and the PC is used to verify the trends we observe between the algorithms on the CC. We confirm the feasibility of utilizing these algorithms for effective emergency landing spot detection and report their accuracy and speed for this specific application. Our investigation also shows that YOLOv5l outperforms YOLOv4 and YOLOv3 in detection accuracy while maintaining a slightly slower inference speed.
21
Rhodes RE, Cowley HP, Huang JG, Gray-Roncal W, Wester BA, Drenkow N. Benchmarking Human Performance for Visual Search of Aerial Images. Front Psychol 2021; 12:733021. PMID: 34970183; PMCID: PMC8713551; DOI: 10.3389/fpsyg.2021.733021.
Abstract
Aerial images are frequently used in geospatial analysis to inform responses to crises and disasters but can pose unique challenges for visual search due to their low resolution, degraded color information, and small object sizes. Aerial image analysis is often performed by humans, but machine learning approaches are being developed to complement manual analysis. To date, however, relatively little work has explored how humans perform visual search on these tasks, and understanding this could ultimately help enable human-machine teaming. We designed a set of studies to understand what features of an aerial image make visual search difficult for humans and what strategies humans use when performing these tasks. Across two experiments, we tested human performance on a counting task with a series of aerial images and examined the influence of features such as target size, location, color, clarity, and number of targets on accuracy and search strategies. Both experiments presented trials consisting of an aerial satellite image; participants were asked to find all instances of a search template in the image. Target size was consistently a significant predictor of performance, influencing not only the accuracy of selections but also the order in which participants selected target instances in the trial. Experiment 2 demonstrated that the clarity of the target instance and the match between the colors of the search template and the target instance also predicted accuracy, and color additionally predicted the order of selecting instances. These experiments establish a benchmark of typical human performance on visual search of aerial images and identify several features that can influence task difficulty. The results have implications for understanding human visual search in real-world tasks and for determining when humans may benefit from automated approaches.
Affiliation(s)
- Rebecca E. Rhodes
- Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States
- Nathan Drenkow
- Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States