1. Peng J, Ouyang C, Peng H, Hu W, Wang Y, Jiang P. MultiFuseYOLO: Redefining Wine Grape Variety Recognition through Multisource Information Fusion. Sensors (Basel) 2024; 24:2953. DOI: 10.3390/s24092953; PMID: 38733058; PMCID: PMC11086123.
Abstract
Research on wine grape variety recognition has shown that traditional deep learning models relying on a single feature (e.g., fruit or leaf) for classification can face great challenges, especially when varieties are highly similar. To distinguish such similar varieties effectively, this study proposes a multisource information fusion method centered on the SynthDiscrim algorithm, aiming at more comprehensive and accurate wine grape variety recognition. First, the study optimizes and improves the YOLOV7 model and proposes a novel target detection and recognition model called WineYOLO-RAFusion, which significantly improves fruit localization precision and recognition compared with the traditional deep learning models YOLOV5, YOLOX, and YOLOV7. Second, building upon the WineYOLO-RAFusion model, the study incorporates multisource information fusion into the model, ultimately forming the MultiFuseYOLO model. Experiments demonstrated that MultiFuseYOLO significantly outperformed other commonly used models in precision, recall, and F1 score, reaching 0.854, 0.815, and 0.833, respectively. Moreover, the method improved precision on the hard-to-distinguish Chardonnay and Sauvignon Blanc varieties, from 0.512 to 0.813 for Chardonnay and from 0.533 to 0.775 for Sauvignon Blanc. In conclusion, the MultiFuseYOLO model offers a reliable and comprehensive solution to wine grape variety identification, especially for distinguishing visually similar varieties with high precision.
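The abstract does not specify how SynthDiscrim fuses the sources, so the following is only a minimal sketch of the general multisource idea it describes: fruit and leaf crops encoded by separate branches whose features are concatenated before classification. All layer sizes and both encoder architectures are illustrative assumptions.

```python
# Hypothetical two-source late-fusion classifier; not the paper's
# SynthDiscrim algorithm, whose details the abstract does not give.
import torch
import torch.nn as nn

class TwoSourceFusionClassifier(nn.Module):
    def __init__(self, num_varieties: int, feat_dim: int = 256):
        super().__init__()
        # One lightweight encoder per information source (assumption).
        self.fruit_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.leaf_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.head = nn.Linear(2 * feat_dim, num_varieties)

    def forward(self, fruit_img, leaf_img):
        # Concatenate the two feature vectors and classify jointly.
        fused = torch.cat([self.fruit_encoder(fruit_img),
                           self.leaf_encoder(leaf_img)], dim=1)
        return self.head(fused)

logits = TwoSourceFusionClassifier(num_varieties=5)(
    torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```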
Affiliation(s)
- Jialiang Peng, College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Cheng Ouyang, College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Hao Peng, College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Wenwu Hu, College of Mechanical and Electrical Engineering, Hunan Agricultural University, Changsha 410128, China
- Yi Wang, College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Ping Jiang, College of Mechanical and Electrical Engineering, Hunan Agricultural University, Changsha 410128, China
2. Hammad M, Chelloug SA, Alayed W, El-Latif AAA. Optimizing Multimodal Scene Recognition through Mutual Information-Based Feature Selection in Deep Learning Models. Applied Sciences 2023; 13:11829. DOI: 10.3390/app132111829.
Abstract
The field of scene recognition, which lies at the crossroads of computer vision and artificial intelligence, has experienced notable progress. This article introduces a novel methodology for scene recognition that combines convolutional neural networks (CNNs) with feature selection techniques based on mutual information (MI). The main goal of our study is to address the limitations inherent in conventional unimodal methods and thereby improve the precision and dependability of scene classification. Our research focuses on formulating a comprehensive approach to scene detection that applies multimodal deep learning methodologies to a single input image. Our work distinguishes itself through its innovative combination of CNN- and MI-based feature selection, which provides distinct advantages and enhanced capabilities compared with prevailing methodologies. To assess the effectiveness of our methodology, we performed tests on two openly accessible datasets, the scene categorization dataset and the AID dataset, achieving accuracies of 100% and 98.83%, respectively, surpassing other established techniques. The primary objective of our end-to-end approach is to reduce complexity and resource requirements, creating a robust framework for scene categorization. This work advances the practical application of computer vision in real-world scenarios, leading to a large improvement in the accuracy of scene recognition and interpretation.
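A minimal sketch of MI-based feature selection on top of CNN features, as the abstract describes, using scikit-learn's mutual-information scorer. The feature dimension, class count, and k are illustrative assumptions, not the paper's values; random arrays stand in for real pooled CNN features.

```python
# Select the k CNN features carrying the most mutual information with
# the scene labels, then fit a simple classifier on the reduced set.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2048))   # stand-in for pooled CNN features
labels = rng.integers(0, 10, size=500)    # 10 scene classes (assumption)

selector = SelectKBest(score_func=mutual_info_classif, k=256)
selected = selector.fit_transform(features, labels)  # (500, 256)

clf = LogisticRegression(max_iter=1000).fit(selected, labels)
print("train accuracy:", clf.score(selected, labels))
```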
Affiliation(s)
- Mohamed Hammad, EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia; Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shebin El Kom 32511, Egypt
- Samia Allaoua Chelloug, Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Walaa Alayed, Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Ahmed A. Abd El-Latif, EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia; Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin (UniSZA), Besut Campus, Besut 22200, Malaysia; Department of Mathematics and Computer Science, Faculty of Science, Menoufia University, Shebin Elkom 32511, Egypt
3. Prob-POS: A Framework for Improving Visual Explanations from Convolutional Neural Networks for Remote Sensing Image Classification. Remote Sensing 2022. DOI: 10.3390/rs14133042.
Abstract
During the past decades, convolutional neural network (CNN)-based models have achieved notable success in remote sensing image classification due to their powerful feature representation ability. However, the lack of explainability in the decision-making process is a common criticism of these high-capacity networks. Local explanation methods that provide visual saliency maps have attracted increasing attention as a means to surmount the barrier of explainability. However, the vast majority of research is conducted on the last convolutional layer, where the salient regions are unintelligible for some remote sensing images, especially scenes that contain many small targets or resemble texture images. To address these issues, we propose a novel framework called Prob-POS, which consists of a class-activation map based on a probe network (Prob-CAM) and a weighted probability of occlusion (wPO) selection strategy. The probe network is a simple but effective architecture for generating elaborate explanation maps and can be applied to any layer of a CNN. The wPO is a quantitative metric that evaluates the explanation effectiveness of each layer for different categories, automatically picking out the optimal explanation layer. Variational weights are taken into account to highlight the high-scoring regions in the explanation map. Experimental results on two publicly available datasets and three prevalent networks demonstrate that Prob-POS improves the faithfulness and explainability of CNNs on remote sensing images.
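The probe-network specifics of Prob-CAM are not given in the abstract, so the sketch below shows only the generic class-activation mapping computation such methods build on: weighting a layer's feature maps by the classifier weights of the target class. The tensors are random stand-ins for a real network's activations and final-layer weights.

```python
# Standard class-activation map (CAM): project feature maps onto the
# target class's classifier weights, keep positive evidence, normalize.
import torch

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (C, H, W) from a chosen conv layer;
    fc_weights: (num_classes, C) from the final linear layer."""
    w = fc_weights[class_idx]                      # (C,)
    cam = torch.einsum("c,chw->hw", w, feature_maps)
    cam = torch.relu(cam)                          # keep positive evidence
    return cam / (cam.max() + 1e-8)                # normalize to [0, 1]

# Example with random tensors standing in for a real network:
cam = class_activation_map(torch.rand(512, 7, 7), torch.rand(10, 512), 3)
print(cam.shape)  # torch.Size([7, 7])
```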
4. Multi-Output Network Combining GNN and CNN for Remote Sensing Scene Classification. Remote Sensing 2022. DOI: 10.3390/rs14061478.
Abstract
Scene classification is an active research area in the remote sensing (RS) domain. Some categories of RS scenes, such as medium residential and dense residential scenes, contain the same types of geographical objects but differ in their spatial distributions. The adjacency and disjointness relationships among geographical objects are normally neglected by existing RS scene classification methods based on convolutional neural networks (CNNs). In this study, a multi-output network (MopNet) combining a graph neural network (GNN) and a CNN is proposed for RS scene classification with a joint loss. In a candidate RS image, superpixel regions are constructed through image segmentation and represented as graph nodes, while graph edges between nodes are created according to the spatial adjacency of the corresponding superpixel regions. MopNet trains the CNN and GNN jointly: through its message propagation mechanism, the spatial and topological relationships embedded in the graph edges are exploited, and the parameters of the CNN and GNN are updated simultaneously under the guidance of a joint loss via backpropagation. Experimental results on the OPTIMAL-31 and aerial image dataset (AID) datasets show that MopNet, combining a graph convolutional network (GCN) or graph attention network (GAT) with ResNet50, achieves state-of-the-art accuracy. The overall accuracy on OPTIMAL-31 is 96.06%, and those on AID are 95.53% and 97.11% under training ratios of 20% and 50%, respectively. Spatial and topological relationships embedded in RS images are thus helpful for improving scene classification performance.
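A minimal sketch of MopNet's joint-loss idea: the CNN branch and the GNN branch each produce class logits, and a single combined loss updates both through backpropagation. The branch architectures and the equal loss weighting are assumptions; the abstract only states that a joint loss guides both sets of parameters simultaneously.

```python
# Joint loss over two branch outputs; one backward pass reaches both.
import torch
import torch.nn.functional as F

def joint_loss(cnn_logits, gnn_logits, target, alpha=0.5):
    """Weighted sum of the two branch cross-entropy losses (alpha assumed)."""
    return (alpha * F.cross_entropy(cnn_logits, target)
            + (1 - alpha) * F.cross_entropy(gnn_logits, target))

cnn_logits = torch.randn(8, 31, requires_grad=True)  # e.g., OPTIMAL-31 classes
gnn_logits = torch.randn(8, 31, requires_grad=True)
target = torch.randint(0, 31, (8,))
joint_loss(cnn_logits, gnn_logits, target).backward()
```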
5. Remote Sensing Scene Image Classification Based on Self-Compensating Convolution Neural Network. Remote Sensing 2022. DOI: 10.3390/rs14030545.
Abstract
In recent years, convolutional neural networks (CNNs) have been widely used for remote sensing scene image classification. However, CNN models with good classification performance tend to be highly complex, while low-complexity CNN models struggle to reach high classification accuracy; existing models rarely achieve a good trade-off between the two. To solve this problem, we made three improvements and propose a lightweight modular network model. First, we propose a lightweight self-compensated convolution (SCC). Although traditional convolution can effectively extract features from an input feature map, the process is slow when there are many filters (such as 512 or 1024). To speed up the network without increasing the computational load, SCC performs traditional convolution with a reduced number of filters and then compensates the missing channels with the input features. It incorporates shallow features into the deep, complex features, which helps improve both the speed and the classification accuracy of the model. Second, we propose a self-compensating bottleneck module (SCBM) based on the self-compensated convolution. The wider channel shortcut in this module allows more shallow information to be transferred to deeper layers and improves the feature extraction ability of the model. Finally, we use the proposed module to construct a lightweight, modular self-compensating convolution neural network (SCCNN) for remote sensing scene image classification, built by reusing bottleneck modules with the same structure. Extensive experiments were carried out on six open and challenging remote sensing scene datasets. The results show that the classification performance of the proposed method is superior to several state-of-the-art methods while using fewer parameters.
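One plausible reading of the self-compensated convolution, sketched under stated assumptions: run a standard convolution with fewer learned filters, then fill the remaining output channels from the input features. The abstract gives only the idea, so the channel split and the 1x1 projection (needed when the input channel count does not match the remaining channels) are assumptions, not the paper's exact design.

```python
# Sketch of a self-compensated convolution: half the output channels are
# learned by a 3x3 conv, the rest are supplied from the input via a
# cheap 1x1 projection, then both are concatenated.
import torch
import torch.nn as nn

class SelfCompensatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, reduction=2):
        super().__init__()
        conv_ch = out_ch // reduction                 # fewer learned filters
        self.conv = nn.Conv2d(in_ch, conv_ch, 3, padding=1)
        self.compensate = nn.Conv2d(in_ch, out_ch - conv_ch, 1)

    def forward(self, x):
        return torch.cat([self.conv(x), self.compensate(x)], dim=1)

y = SelfCompensatedConv(64, 128)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```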
6. UFS-LSTM: Unsupervised Feature Selection with Long Short-Term Memory Network for Remote Sensing Scene Classification. Evolutionary Intelligence 2021. DOI: 10.1007/s12065-021-00660-4.
7. A Convolutional Neural Network Based on Grouping Structure for Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs13132457.
Abstract
Convolutional neural networks (CNNs) are capable of automatically extracting image features and have been widely used in remote sensing image classification. Feature extraction remains an important and difficult problem. In this paper, data augmentation for avoiding overfitting was used to enrich sample features and improve the performance of a newly proposed convolutional neural network on the UC-Merced and RSI-CB datasets for remote sensing scene classification. A multiple grouped convolutional neural network (MGCNN) for self-learning, capable of improving the efficiency of CNNs, was proposed, and the method of grouping multiple convolutional layers was developed as a plug-in model applicable elsewhere. A hyper-parameter C in MGCNN was introduced to probe the influence of different grouping strategies on feature extraction. Experiments on the two selected datasets, RSI-CB and UC-Merced, verified the effectiveness of the proposed network: the accuracy obtained by MGCNN was 2% higher than that of ResNet-50. An attention mechanism was then incorporated into the grouping process to construct a multiple grouped attention convolutional neural network (MGCNN-A) with stronger generalization capability. Additional experiments indicate that incorporating the attention mechanism into MGCNN slightly improved scene classification accuracy but considerably enhanced the robustness of the proposed network in remote sensing image classification.
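Grouped convolution is a standard primitive, so the abstract's hyper-parameter C maps directly onto PyTorch's `groups` argument, which splits the channels into C independent convolution groups. The channel counts below are illustrative, not the paper's.

```python
# Grouped convolution: with groups=C, each group convolves 64/C input
# channels into 128/C outputs, cutting parameters roughly by a factor of C.
import torch
import torch.nn as nn

C = 4  # grouping strategy under study (illustrative value)
grouped = nn.Conv2d(in_channels=64, out_channels=128,
                    kernel_size=3, padding=1, groups=C)
out = grouped(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 128, 56, 56])
```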
8. Unsupervised Adversarial Domain Adaptation with Error-Correcting Boundaries and Feature Adaption Metric for Remote-Sensing Scene Classification. Remote Sensing 2021. DOI: 10.3390/rs13071270.
Abstract
Unsupervised domain adaptation (UDA) based on adversarial learning has become a research hotspot in remote-sensing scene classification because of the need to alleviate the lack of annotated training data. Existing methods train classifiers according to their ability to distinguish features from the source or target domain. However, they suffer from two limitations: (1) the classifier is trained on source samples and forms a source-domain-specific boundary that ignores features from the target domain, and (2) semantically meaningful features are built merely from the adversary of a generator and a discriminator, without selecting domain-invariant features. These issues limit the distribution-matching performance between the source and target domains, since each domain has its own distinctive characteristics. To resolve them, we propose a framework with error-correcting boundaries and a feature adaptation metric. Specifically, we design an error-correcting boundaries mechanism that builds target-domain-specific classifier boundaries via multiple classifiers and an error-correcting discrepancy loss, which significantly improves the separation of target samples and reduces their classification uncertainty. We then employ a feature adaptation metric structure to enhance the adaptation of ambiguous features via shallow layers of the backbone convolutional neural network and an alignment loss, which automatically learns domain-invariant features. Experimental results on four public datasets show that the proposed method outperforms other UDA methods for remote-sensing scene classification.
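The paper's exact error-correcting discrepancy loss is not specified in the abstract, so the sketch below shows only the generic multi-classifier discrepancy term such methods build on: two classifiers on shared features disagree on unlabeled target samples, and the disagreement is measured as an L1 distance between their softmax outputs. The class count is an assumption.

```python
# Discrepancy between two classifiers' predicted class distributions on
# (unlabeled) target-domain samples; minimized/maximized adversarially
# in multi-classifier domain adaptation schemes.
import torch
import torch.nn.functional as F

def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between class-probability outputs."""
    return (F.softmax(logits_a, dim=1) - F.softmax(logits_b, dim=1)).abs().mean()

logits_a = torch.randn(16, 12)   # 12 scene classes (assumption)
logits_b = torch.randn(16, 12)
print(classifier_discrepancy(logits_a, logits_b).item())
```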
9. Individual Sick Fir Tree (Abies mariesii) Identification in Insect Infested Forests by Means of UAV Images and Deep Learning. Remote Sensing 2021. DOI: 10.3390/rs13020260.
Abstract
Insect outbreaks are a recurrent natural phenomenon in forest ecosystems and are expected to increase due to climate change. Recent advances in unmanned aerial vehicles (UAVs) and deep learning (DL) networks provide tools to monitor them. In this study, we used nine orthomosaics and normalized digital surface models (nDSM) to detect and classify healthy and sick Maries fir trees as well as deciduous trees. The study aims to automatically classify treetops by means of a novel computer vision treetop detection algorithm and the adaptation of existing DL architectures. For detection alone, the accuracy was 85.70%. For detection and classification combined, 78.59% of all tree classes were handled correctly (39.64% for the sick fir class). With data augmentation, the detection/classification percentage for the sick fir class rose to 73.01%, at the cost of the accuracy across all tree classes, which dropped to 63.57%. The combination of UAV, computer vision, and DL techniques contributes to a new approach for evaluating the impact of insect outbreaks in forests.
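The abstract's lever for the weak sick-fir class is data augmentation; a minimal sketch of oversampling a minority class with simple augmentations is below. The specific transforms and magnitudes are illustrative assumptions, not the paper's pipeline.

```python
# Oversample a minority class by generating augmented copies of its crops.
# Aerial views have no canonical orientation, so free rotation is natural.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=90),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

treetop_crop = torch.rand(3, 128, 128)            # stand-in for a UAV crop
extra_samples = [augment(treetop_crop) for _ in range(5)]
print(len(extra_samples), extra_samples[0].shape)
```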
10. Ship Classification Based on Attention Mechanism and Multi-Scale Convolutional Neural Network for Visible and Infrared Images. Electronics 2020. DOI: 10.3390/electronics9122022.
Abstract
Visible image quality is very susceptible to changes in illumination, and there are limits to ship classification using images acquired by a single sensor. This study proposes a ship classification method based on an attention mechanism and a multi-scale convolutional neural network (MSCNN) for visible and infrared images. First, the features of visible and infrared images are extracted by a two-stream symmetric multi-scale convolutional neural network module and then concatenated to make full use of the complementary features present in multimodal images. Next, an attention mechanism is applied to the concatenated fusion features to emphasize areas of local detail in the feature map, further improving the feature representation capability of the model. Finally, the attention weights and the original concatenated fusion features are added element-wise and fed into fully connected layers and a softmax output layer for the final classification. The effectiveness of the proposed method is verified on the visible and infrared spectra (VAIS) dataset, on which it achieves 93.81% classification accuracy. Compared with other state-of-the-art methods, the proposed method extracts features more effectively and has better overall classification performance.
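A minimal sketch of the fusion step the abstract describes: visible and infrared feature maps are concatenated, an attention map re-weights the result, and the attention output is added element-wise back to the fused features. The attention block (a 1x1 convolution with a sigmoid) and all sizes are assumptions; the abstract does not give the module's internals.

```python
# Concatenate two-stream features, attend, and add element-wise.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, fused_channels):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(fused_channels, fused_channels, 1), nn.Sigmoid())

    def forward(self, vis_feat, ir_feat):
        fused = torch.cat([vis_feat, ir_feat], dim=1)   # channel concat
        weighted = self.attention(fused) * fused        # emphasize details
        return weighted + fused                         # element-wise add

out = AttentionFusion(128)(torch.randn(2, 64, 14, 14),
                           torch.randn(2, 64, 14, 14))
print(out.shape)  # torch.Size([2, 128, 14, 14])
```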
11. Detection of Invasive Species in Wetlands: Practical DL with Heavily Imbalanced Data. Remote Sensing 2020. DOI: 10.3390/rs12203431.
Abstract
Deep learning (DL) has become popular due to its ease of use and accuracy, with transfer learning (TL) effectively reducing the number of images needed to solve environmental problems. However, this approach has limitations, which we set out to explore: our goal is to detect the presence of an invasive blueberry species in aerial images of wetlands. This is a key problem in ecosystem protection that is also challenging for DL because of the severe imbalance in the data. Results for the ResNet50 network show high classification accuracy while largely ignoring the blueberry class, making them of limited practical interest for detecting that specific class. However, by using loss-function weighting and data augmentation, results better suited to our practical application can be obtained. Our TL experiments show that ImageNet weights do not produce satisfactory results when only the final layer of the network is trained, and only minor gains are obtained over random weights when the whole network is retrained. Finally, in a study of state-of-the-art DL architectures, the best results for the blueberry class were obtained by the ResNeXt architecture, with a 93.75% true positive rate and 98.11% accuracy, with ResNet50, DenseNet, and WideResNet obtaining close results.
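Loss-function weighting for heavy class imbalance, as the abstract uses, is a one-line change in most frameworks: give the rare class a proportionally larger weight in the cross-entropy loss. The class counts and inverse-frequency weighting below are illustrative assumptions, not the paper's numbers.

```python
# Weighted cross-entropy: the rare invasive-blueberry class receives a
# larger weight so the network cannot ignore it.
import torch
import torch.nn as nn

# Suppose class 0 = background (common), class 1 = blueberry (rare).
class_counts = torch.tensor([9500.0, 500.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)
loss = criterion(torch.randn(32, 2), torch.randint(0, 2, (32,)))
print(weights, loss.item())
```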
12. Dual Path Attention Net for Remote Sensing Semantic Image Segmentation. ISPRS International Journal of Geo-Information 2020. DOI: 10.3390/ijgi9100571.
Abstract
Semantic segmentation plays an important role in understanding the content of remote sensing images. In recent years, deep learning methods based on fully convolutional networks (FCNs) have proved effective for the semantic segmentation of remote sensing images. However, the rich information and complex content make training segmentation networks challenging, and the available datasets are necessarily constrained. In this paper, we propose a convolutional neural network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other on channels. The outputs of these two attention modules are then fused to further improve the network's ability to extract features, contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small number of samples and their uneven distribution. The proposed network was tested on the Gaofen Image Dataset (GID). The results show that it outperformed U-Net, PSP-Net, and DeepLab V3+ in mean IoU by 0.84%, 2.54%, and 1.32%, respectively.
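A compact sketch of the dual-path idea: one branch attends over spatial positions, the other over channels, and their outputs are fused. This is a generic spatial/channel attention pair under stated assumptions; DPA-Net's exact blocks and fusion rule are not given in the abstract.

```python
# One spatial-attention path and one channel-attention path over the
# same feature map, fused here by element-wise addition (assumption).
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Sequential(                 # (N, 1, H, W) mask
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.channel = nn.Sequential(                 # (N, C, 1, 1) mask
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.spatial(x) + x * self.channel(x)

y = DualPathAttention(256)(torch.randn(1, 256, 32, 32))
print(y.shape)  # torch.Size([1, 256, 32, 32])
```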