1. Chang J, He X, Li P, Tian T, Cheng X, Qiao M, Zhou T, Zhang B, Chang Z, Fan T. Multi-Scale Attention Network for Building Extraction from High-Resolution Remote Sensing Images. Sensors (Basel) 2024; 24:1010. PMID: 38339726; PMCID: PMC10857135; DOI: 10.3390/s24031010. Received 01/02/2024; accepted 01/25/2024.
Abstract
Precise building extraction from high-resolution remote sensing images has significant applications in urban planning, resource management, and environmental conservation. In recent years, deep neural networks (DNNs) have garnered substantial attention for their adeptness in learning and extracting features, becoming integral to building extraction methodologies and yielding noteworthy performance. Nonetheless, prevailing DNN-based models for building extraction often overlook spatial information during the feature extraction phase. Additionally, many existing models employ a simplistic and direct approach in the feature fusion stage, potentially leading to spurious target detection and the amplification of internal noise. To address these concerns, we present a multi-scale attention network (MSANet) tailored for building extraction from high-resolution remote sensing images. In our approach, we first extract multi-scale building feature information, leveraging a multi-scale channel attention mechanism and a multi-scale spatial attention mechanism. We then apply adaptive hierarchical weighting to the extracted building features and introduce a gating mechanism to fuse the multi-scale features effectively. The efficacy of the proposed MSANet was evaluated on the WHU aerial image dataset and the WHU satellite image dataset. The experimental results demonstrate compelling performance, with F1 scores of 93.76% and 77.64% on the WHU aerial imagery dataset and WHU satellite dataset II, respectively, and intersection-over-union (IoU) values of 88.25% and 63.46%, surpassing benchmarks set by DeepLabV3 and GSMC.
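As a generic illustration (not code from the cited paper), the F1 and IoU metrics reported above can be computed from pixel-level confusion counts; for a binary mask the two are related by IoU = F1 / (2 - F1):

```python
def f1_and_iou(tp, fp, fn):
    """F1 score and intersection over union from pixel-level counts.

    tp, fp, fn: true-positive, false-positive, and false-negative pixel
    counts of the predicted building mask against the ground truth.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)  # equivalently f1 / (2 - f1)
    return f1, iou
```

The identity IoU = F1 / (2 - F1) explains why IoU is always the lower of the two numbers in the results above.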
Affiliation(s)
- Jing Chang: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Xiaohui He: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China; Ecometeorology Joint Laboratory of Zhengzhou University and Chinese Academy of Meteorological Science, Zhengzhou 450001, China
- Panle Li: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
- Ting Tian: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Xijie Cheng: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
- Mengjia Qiao: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
- Tao Zhou: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Beibei Zhang: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
- Ziqian Chang: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
- Tingwei Fan: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
2. Chen L, Li J, Zou Y, Wang T. ETU-Net: edge enhancement-guided U-Net with transformer for skin lesion segmentation. Phys Med Biol 2023; 69:015001. PMID: 38131313; DOI: 10.1088/1361-6560/ad13d2. Received 09/09/2023; accepted 12/08/2023.
Abstract
Objective: Convolutional neural network (CNN)-based deep learning algorithms have been widely used in recent years for automatic skin lesion segmentation. However, the limited receptive fields of convolutional architectures hinder their ability to effectively model dependencies between different image ranges. The transformer is often employed in conjunction with CNNs to extract both global and local information from images, as it excels at capturing long-range dependencies. However, this method cannot accurately segment skin lesions with blurred boundaries. To overcome this difficulty, we proposed ETU-Net. Approach: ETU-Net, a novel multi-scale architecture, combines edge enhancement, CNN, and transformer. We introduce the concept of edge detection operators into difference convolution, resulting in the design of the edge enhanced convolution block (EC block) and the local transformer block (LT block), which emphasize edge features. To capture the semantic information contained in local features, we propose the multi-scale local attention block (MLA block), which utilizes convolutions with different kernel sizes. Furthermore, to address the boundary uncertainty caused by patch division in the transformer, we introduce a novel global transformer block (GT block), which allows each patch to gather full-size feature information. Main results: Extensive experimental results on three publicly available skin datasets (PH2, ISIC-2017, and ISIC-2018) demonstrate that ETU-Net outperforms state-of-the-art hybrid methods based on CNN and transformer in terms of segmentation performance. Moreover, ETU-Net exhibits excellent generalization ability in practical segmentation applications on dermatoscopy images contributed by the Wuxi No.2 People's Hospital. Significance: We propose ETU-Net, a novel multi-scale U-Net model guided by edge enhancement, which can address the challenges posed by complex lesion shapes and ambiguous boundaries in skin lesion segmentation tasks.
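The "edge detection operators into difference convolution" idea can be illustrated with a central-difference convolution, which aggregates kernel-weighted differences from the patch center and therefore responds to intensity changes (edges) rather than raw intensity. This is a generic sketch, not the paper's EC/LT blocks; the blending factor theta is an illustrative assumption (theta=0 recovers a vanilla convolution):

```python
import numpy as np

def central_difference_conv2d(image, kernel, theta=0.7):
    """Blend a vanilla convolution with its central-difference variant.

    The difference term sums w(p) * (x(p) - x(center)) over each patch,
    so it is zero on flat regions and large across edges.
    """
    H, W = image.shape
    k = kernel.shape[0]  # assume a square, odd-sized kernel
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]
            vanilla = np.sum(kernel * patch)
            diff = np.sum(kernel * (patch - image[i, j]))
            out[i, j] = (1 - theta) * vanilla + theta * diff
    return out
```

On a constant image the difference term vanishes, which is the property that makes the operator edge-sensitive.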
Affiliation(s)
- Lifang Chen: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China
- Jiawei Li: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China
- Yunmin Zou: Department of Dermatology, Wuxi No.2 People's Hospital, Wuxi, People's Republic of China
- Tao Wang: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, People's Republic of China
3. Hao R, Wei Z, He X, Zhu K, He J, Wang J, Li M, Zhang L, Lv Z, Zhang X, Zhang Q. Robust Point Cloud Registration Network for Complex Conditions. Sensors (Basel) 2023; 23:9837. PMID: 38139683; PMCID: PMC10747109; DOI: 10.3390/s23249837. Received 09/24/2023; accepted 12/06/2023.
Abstract
Point cloud registration is widely used in autonomous driving, SLAM, and 3D reconstruction; it aims to align point clouds from different viewpoints or poses under the same coordinate system. However, registration is challenging in complex situations, such as a large initial pose difference, high noise, or incomplete overlap, which can cause registration to fail or mismatch. To address the shortcomings of existing registration algorithms, this paper presents CCRNet, a new two-stage, coarse-to-fine point cloud registration network that performs the registration task end to end. The multi-scale feature extraction module, coarse registration prediction module, and fine registration prediction module designed in this paper can robustly and accurately register two point clouds without iteration. CCRNet links the feature information between two point clouds and handles high noise and incomplete overlap by using a soft correspondence matrix. On the standard ModelNet40 dataset, in cases of large initial pose difference, high noise, and incomplete overlap, the accuracy of our method improved on the second-best popular registration algorithm by 7.0%, 7.8%, and 22.7% in MAE, respectively. Experiments showed that CCRNet has advantages in registration results under a variety of complex conditions.
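A soft correspondence matrix of the kind mentioned above is commonly built as a row-wise softmax over pairwise feature similarities; noisy or non-overlapping points then receive diffuse, low-confidence rows instead of a hard (and possibly wrong) one-to-one match. This is a generic sketch under that assumption, not CCRNet's implementation; the temperature parameter is illustrative:

```python
import numpy as np

def soft_correspondence(feat_src, feat_tgt, temperature=0.1):
    """Row-stochastic soft correspondence between two point feature sets.

    feat_src: (N, D) source point features; feat_tgt: (M, D) target features.
    Returns an (N, M) matrix whose rows are probability distributions over
    target points.
    """
    # Squared Euclidean distance between every source/target feature pair.
    d2 = ((feat_src[:, None, :] - feat_tgt[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

Downstream, such a matrix lets weighted (rather than hard) correspondences drive the pose estimation.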
Affiliation(s)
- Ruidong Hao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhongwei Wei: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Xu He: College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Kaifeng Zhu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiawei He: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Jun Wang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Muyu Li: Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
- Lei Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Zhuang Lv: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Xin Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Qiwen Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
4. Cheng H, Li H. Identification of apple leaf disease via novel attention mechanism based convolutional neural network. Front Plant Sci 2023; 14:1274231. PMID: 37920720; PMCID: PMC10619150; DOI: 10.3389/fpls.2023.1274231. Received 08/08/2023; accepted 09/19/2023.
Abstract
Introduction: The identification of apple leaf diseases is crucial for apple production. Methods: To assist farmers in promptly recognizing leaf diseases in apple trees, we propose a novel attention mechanism. Building upon this mechanism and MobileNet v3, we introduce a new deep learning network. Results and discussion: Applying this network to our carefully curated dataset, we achieved an impressive accuracy of 98.7% in identifying apple leaf diseases, surpassing similar models such as EfficientNet-B0, ResNet-34, and DenseNet-121. Furthermore, the precision, recall, and F1-score of our model also outperform these models, while maintaining the MobileNet network's advantages of fewer parameters and lower computational consumption. Therefore, our model has potential in other similar application scenarios and broad prospects.
Affiliation(s)
- Heming Li: School of Intelligence Engineering, Shandong Management University, Jinan, China
5. Wang H, Ding J, He S, Feng C, Zhang C, Fan G, Wu Y, Zhang Y. MFBP-UNet: A Network for Pear Leaf Disease Segmentation in Natural Agricultural Environments. Plants (Basel) 2023; 12:3209. PMID: 37765373; PMCID: PMC10537337; DOI: 10.3390/plants12183209. Received 07/17/2023; accepted 09/04/2023.
Abstract
The accurate prevention and control of pear tree diseases, especially the precise segmentation of leaf diseases, poses a serious challenge to fruit farmers globally. Because disease areas can be minute, with ambiguous boundaries, accurate segmentation is difficult. In this study, we propose a pear leaf disease segmentation model named MFBP-UNet. It is based on the UNet network architecture and integrates a Multi-scale Feature Extraction (MFE) module and a Tokenized Multilayer Perceptron (BATok-MLP) module with dynamic sparse attention. The MFE enhances the extraction of detail and semantic features, while the BATok-MLP successfully fuses regional and global attention, striking an effective balance between global and local information extraction. Additionally, we pioneered the use of a diffusion model for data augmentation; by integrating and analyzing different augmentation methods, we further improved the model's training accuracy and robustness. Experimental results reveal that, compared to other segmentation networks, MFBP-UNet shows a significant improvement across all performance metrics. Specifically, MFBP-UNet achieves scores of 86.15%, 93.53%, 90.89%, and 0.922 on the MIoU, MP, MPA, and Dice metrics, marking respective improvements of 5.75%, 5.79%, 1.08%, and 0.074 over the UNet model. These results demonstrate the MFBP-UNet model's superior performance and generalization capabilities in pear leaf disease segmentation and its potential to address analogous segmentation tasks in natural environments.
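Multi-scale feature extraction modules like the MFE typically run parallel convolutions with different kernel sizes and concatenate the results, so small kernels keep fine detail while large kernels capture broader context. The sketch below illustrates the pattern on a 1-D signal with moving-average filters as stand-ins for learned kernels; it is a conceptual illustration, not the paper's module:

```python
import numpy as np

def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """Filter a signal at several receptive-field widths and stack the results.

    Returns an array of shape (num_scales, len(signal)): one smoothed view of
    the input per kernel size, analogous to parallel conv branches.
    """
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k  # stand-in for a learned kernel of size k
        branches.append(np.convolve(signal, kernel, mode="same"))
    return np.stack(branches)
```

In a real network the stacked branches would be concatenated along the channel axis and fused by a subsequent layer.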
Affiliation(s)
- Haoyu Wang: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Jie Ding: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Sifan He: Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China; School of Natural Science, Anhui Agricultural University, Hefei 230036, China
- Cheng Feng: Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China; School of Natural Science, Anhui Agricultural University, Hefei 230036, China
- Cheng Zhang: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Guohua Fan: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Yunzhi Wu: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Youhua Zhang: School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China; Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
6. Qin C, Li Y, Liu C, Ma X. Cuff-Less Blood Pressure Prediction Based on Photoplethysmography and Modified ResNet. Bioengineering (Basel) 2023; 10:400. PMID: 37106587; PMCID: PMC10135940; DOI: 10.3390/bioengineering10040400. Received 02/19/2023; accepted 03/20/2023. Open access.
Abstract
Cardiovascular disease (CVD) has become a common health problem, and its prevalence and mortality are rising year by year. Blood pressure (BP) is an important physiological parameter of the human body and an important indicator for the prevention and treatment of CVD. Existing intermittent measurement methods do not fully reflect the real BP status of the human body and cannot avoid the constriction of a cuff. Accordingly, this study proposed a deep learning network based on the ResNet34 framework for continuous prediction of BP using only the photoplethysmography (PPG) signal. After a series of pre-processing steps, the high-quality PPG signals were first passed through a multi-scale feature extraction module to expand the receptive field and enhance feature perception. Useful feature information was then extracted by stacking multiple residual modules with channel attention to increase the accuracy of the model. Lastly, in the training stage, the Huber loss function was adopted to stabilize the iterative process and obtain the optimal solution of the model. On a subset of the MIMIC dataset, the errors of both systolic BP (SBP) and diastolic BP (DBP) predicted by the model met the AAMI standards; the accuracy of DBP reached Grade A of the BHS standard, and the accuracy of SBP almost reached Grade A. The proposed method verifies the potential and feasibility of PPG signals combined with deep neural networks in the field of continuous BP monitoring. Furthermore, the method is easy to deploy in portable devices, consistent with the future trend of wearable blood-pressure-monitoring devices (e.g., smartphones and smartwatches).
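The Huber loss mentioned above is quadratic for small errors and linear for large ones, so outlier beats or motion artifacts contribute bounded gradients and training stays stable. A minimal per-sample sketch (the threshold delta is a hyperparameter; the paper's choice is not stated here):

```python
def huber_loss(pred, target, delta=1.0):
    """Huber loss for a single prediction.

    Quadratic (like MSE) when |error| <= delta, linear (like MAE) beyond it,
    with the two pieces matched in value and slope at |error| = delta.
    """
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)
```

In training, the per-sample values would be averaged over a batch of BP predictions.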
Affiliation(s)
- Caijie Qin: Institute of Information Engineering, Sanming University, Sanming 365004, China; CBSR&NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100049, China
- Yong Li: Institute of Information Engineering, Sanming University, Sanming 365004, China
- Chibiao Liu: Institute of Information Engineering, Sanming University, Sanming 365004, China
- Xibo Ma: CBSR&NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100049, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
7. Zhang C, Xu K, Ma Y, Wan J. GFI-Net: Global Feature Interaction Network for Monocular Depth Estimation. Entropy (Basel) 2023; 25:421. PMID: 36981310; PMCID: PMC10047826; DOI: 10.3390/e25030421. Received 12/19/2022; accepted 02/22/2023.
Abstract
Monocular depth estimation techniques are used to recover the distance from the target to the camera plane in an image scene. However, several problems remain, such as insufficient estimation accuracy, inaccurate localization of details, and depth discontinuity in planes parallel to the camera plane. To solve these problems, we propose the Global Feature Interaction Network (GFI-Net), which aims to utilize geometric features, such as object locations and vanishing points, on a global scale. To capture the interactive information of the width, height, and channel of the feature map and expand the global information in the network, we designed a global interactive attention mechanism, which reduces the loss of pixel information and improves the performance of depth estimation. Furthermore, the encoder uses a transformer to reduce coding losses and improve the accuracy of depth estimation. Finally, a local-global feature fusion module is designed to improve the depth map's representation of detailed areas. Experimental results on the NYU-Depth-v2 and KITTI datasets showed that our model achieved state-of-the-art performance, recovering fine details and producing continuous depth within the same plane.
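For context on attention mechanisms that summarize a whole feature map, the classic squeeze-and-excitation pattern pools each channel globally and learns per-channel gates. GFI-Net's global interactive attention additionally interacts across width and height, but the channel-only version below shows the basic mechanism; the weight shapes are illustrative assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel attention via global average pooling (generic sketch).

    feat: (C, H, W) feature map. Global average pooling summarizes each
    channel as one scalar; two small linear layers plus a sigmoid produce
    per-channel gates that rescale the input.
    """
    squeeze = feat.mean(axis=(1, 2))             # (C,) global descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gates, (C,)
    return feat * gate[:, None, None]
```

With zero weights the gates sit at sigmoid(0) = 0.5, i.e. uniform scaling; training moves them toward emphasizing informative channels.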
8. Sun F, Zhang X, Liu Y, Jiang H. Multi-Object Detection in Security Screening Scene Based on Convolutional Neural Network. Sensors (Basel) 2022; 22:7836. PMID: 36298187; PMCID: PMC9611169; DOI: 10.3390/s22207836. Received 08/30/2022; accepted 10/13/2022.
Abstract
Target detection based on convolutional neural networks has been widely implemented in industry. However, the detection accuracy for X-ray images in security screening scenarios still requires improvement. This paper proposes a coupled multi-scale feature extraction and multi-scale attention architecture, which we integrate into the Single Shot MultiBox Detector (SSD) algorithm and find can significantly improve the effectiveness of target detection. Firstly, ResNet is used as the backbone network to replace the original VGG network, improving the network's feature extraction capability. Secondly, a multi-scale feature extraction (MSE) structure is designed to enrich the information contained in the multi-stage prediction feature layers. Finally, the multi-scale attention architecture (MSA) is fused onto the prediction feature layers to eliminate interference from redundant features and extract effective contextual information. In addition, a combination of Adaptive-NMS and Soft-NMS is used to output the final prediction anchor boxes during non-maximum suppression. The experiments show that the improved method raises the mean average precision (mAP) by 7.4% over the original approach; the new modules make detection considerably more accurate while keeping detection speed unchanged.
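Soft-NMS, used in the post-processing above, decays the scores of overlapping boxes instead of discarding them outright, which helps when objects in X-ray baggage images overlap heavily. A minimal Gaussian-decay sketch (sigma and the score threshold are illustrative; the paper combines this with Adaptive-NMS, which is not shown):

```python
import math

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS over axis-aligned boxes.

    boxes: list of (x1, y1, x2, y2); scores: list of floats.
    Returns surviving (box, score) pairs in the order they were selected.
    """
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    pool = list(zip(boxes, scores))
    kept = []
    while pool:
        pool.sort(key=lambda bs: bs[1], reverse=True)
        best = pool.pop(0)
        kept.append(best)
        # Decay remaining scores by their overlap with the kept box,
        # then drop boxes whose score has fallen below the threshold.
        pool = [(b, s * math.exp(-iou(best[0], b) ** 2 / sigma)) for b, s in pool]
        pool = [(b, s) for b, s in pool if s > score_thresh]
    return kept
```

Unlike hard NMS, a fully overlapping box survives with a reduced score rather than being deleted.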
9. Zhang B, Shi Y, Hou L, Yin Z, Chai C. TSMG: A Deep Learning Framework for Recognizing Human Learning Style Using EEG Signals. Brain Sci 2021; 11:1397. PMID: 34827396; PMCID: PMC8615788; DOI: 10.3390/brainsci11111397. Received 09/11/2021; accepted 10/21/2021. Open access.
Abstract
Educational theory claims that integrating learning style into learning-related activities can improve academic performance. Traditional methods to recognize learning styles are mostly based on questionnaires and online behavior analyses; these methods are highly subjective and inaccurate. Electroencephalography (EEG) signals have significant potential for measuring learning style. This study uses EEG signals to design a deep-learning-based recognition model, named the TSMG model (Temporal-Spatial-Multiscale-Global model), that recognizes people's learning styles from EEG features using a non-overlapping sliding window, one-dimensional spatio-temporal convolutions, multi-scale feature extraction, global average pooling, and a group voting mechanism. It solves the problem of processing EEG data of variable length, improves the accuracy of learning-style recognition by nearly 5% compared with prevalent methods, and reduces computational cost by 41.93%. The proposed TSMG model can also recognize variable-length data in other fields. The authors also compiled a dataset of EEG signals (the LSEEG dataset) containing features of the learning style processing dimension that can be used to test and compare recognition models. This dataset is also conducive to the application and further development of EEG technology for recognizing learning styles.
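The non-overlapping sliding window mentioned above is the standard trick for turning variable-length recordings into fixed-size model inputs. A minimal sketch (the choice to drop the trailing remainder is an assumption; padding it is the common alternative):

```python
def non_overlapping_windows(signal, window_len):
    """Split a variable-length 1-D sequence into non-overlapping windows.

    A trailing remainder shorter than window_len is dropped, so recordings
    of any length map to a batch of equally sized segments.
    """
    n = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(n)]
```

Each window would then be fed through the convolutional stack, with the group voting mechanism aggregating per-window predictions into one label per recording.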
Affiliation(s)
- Bingxue Zhang: Department of Optical-Electrical & Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Yang Shi: Department of Optical-Electrical & Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Longfeng Hou: Department of Energy & Power Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zhong Yin: Department of Optical-Electrical & Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Chengliang Chai (corresponding author): Department of Optical-Electrical & Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
10. Liao Y, Liu Q. Multi-Level and Multi-Scale Feature Aggregation Network for Semantic Segmentation in Vehicle-Mounted Scenes. Sensors (Basel) 2021; 21:3270. PMID: 34065155; PMCID: PMC8126014; DOI: 10.3390/s21093270. Received 03/26/2021; accepted 04/30/2021.
Abstract
The main challenges of semantic segmentation in vehicle-mounted scenes are object scale variation and trading off model accuracy against efficiency. Lightweight backbone networks for semantic segmentation usually extract single-scale features layer by layer using only a fixed receptive field, and most modern real-time semantic segmentation networks heavily compromise spatial details when encoding semantics, sacrificing accuracy for speed. Many improvement strategies adopt dilated convolution or add a sub-network, which introduces either intensive computation or redundant parameters. We propose a multi-level and multi-scale feature aggregation network (MMFANet). A spatial pyramid module is designed by cascading dilated convolutions with different receptive fields to extract multi-scale features layer by layer. Subsequently, a lightweight backbone network is built by reducing the feature channel capacity of the module. To improve accuracy, we design two additional modules to separately capture spatial details and high-level semantics from the backbone network without significantly increasing the computation cost. Comprehensive experimental results show that our model achieves 79.3% MIoU on the Cityscapes test dataset at a speed of 58.5 FPS, making it more accurate than SwiftNet (75.5% MIoU). Furthermore, our model has at least 53.38% fewer parameters than other state-of-the-art models.
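The appeal of cascading dilated convolutions, as in the spatial pyramid module above, is that the receptive field grows rapidly without extra parameters: each layer adds d*(k-1) times the cumulative stride. A small calculator illustrating the standard formula (generic, not tied to MMFANet's exact configuration):

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field of a stack of (possibly dilated, strided) convolutions.

    For layer i, the effective kernel extent is d_i*(k_i-1)+1, which widens
    the receptive field by d_i*(k_i-1) times the cumulative stride ("jump").
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += d * (k - 1) * jump
        jump *= s
    return rf
```

Three 3x3 convolutions with dilations 1, 2, 4 already cover a 15-pixel extent, versus 7 pixels for the undilated stack, at identical parameter count.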