Zheng F, Wang X, Wang L, Zhang X, Zhu H, Wang L, Zhang H. A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval.
SENSORS (BASEL, SWITZERLAND) 2023; 23:8437. [PMID: 37896530 PMCID: PMC10610807 DOI: 10.3390/s23208437]
[Received: 08/17/2023] [Revised: 09/21/2023] [Accepted: 10/11/2023] [Indexed: 10/29/2023]
Abstract
With the rapid growth in the scale of remote sensing imagery, researchers have increasingly focused on efficient and flexible cross-modal retrieval for remote sensing images, and have begun to address the distinctive challenge posed by the multi-scale nature of these images. However, existing studies concentrate primarily on characterizing multi-scale features and neglect the complex relationship between multi-scale targets and the semantic alignment of those targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The approach has three stages. First, a computing-friendly cross-layer feature connection method constructs a multi-scale feature representation of an image. Second, an efficient feature consistency enhancement module rectifies the inconsistent semantic discrimination observed in cross-layer features. Finally, a shallow cross-attention network captures the fine-grained semantic relationship between multi-scale image regions and the corresponding words in the text. Extensive experiments were conducted on two datasets, RSICD and RSITMD. The results show that FAAMI outperforms recently proposed advanced models in the same domain, with significant improvements in R@K and other evaluation metrics. Specifically, FAAMI achieves mR values of 23.18% and 35.99% on RSICD and RSITMD, respectively.
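The abstract reports R@K and mR without defining them. The sketch below shows one common way such retrieval metrics are computed, and it is not taken from the paper itself. It assumes that R@K is the percentage of queries with at least one correct item among the top K ranked candidates, and that mR is the mean of R@1, R@5 and R@10 over both retrieval directions (image-to-text and text-to-image). The function names `recall_at_k` and `mean_recall` and the data layout are hypothetical.

```python
def recall_at_k(sim, gt, k):
    """Percentage of queries with a correct candidate in the top k.

    sim: list of rows, sim[i][j] = similarity of query i to candidate j
         (higher means more similar).
    gt:  list of sets, gt[i] = indices of candidates correct for query i.
    """
    hits = 0
    for scores, relevant in zip(sim, gt):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if relevant & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


def mean_recall(sim_i2t, gt_i2t, sim_t2i, gt_t2i, ks=(1, 5, 10)):
    """Assumed mR: average of R@K over both directions and all K in ks."""
    scores = [recall_at_k(sim_i2t, gt_i2t, k) for k in ks]
    scores += [recall_at_k(sim_t2i, gt_t2i, k) for k in ks]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: 3 images, 3 captions, caption i matches image i.
    sim_i2t = [[0.9, 0.1, 0.2],
               [0.3, 0.2, 0.8],
               [0.1, 0.7, 0.6]]
    # Text-to-image similarities are the transpose of the matrix above.
    sim_t2i = [list(col) for col in zip(*sim_i2t)]
    gt = [{0}, {1}, {2}]
    for k in (1, 2, 3):
        print(k, recall_at_k(sim_i2t, gt, k), recall_at_k(sim_t2i, gt, k))
    print(mean_recall(sim_i2t, gt, sim_t2i, gt, ks=(1, 2, 3)))
```

In the toy example the cutoffs are reduced to 1, 2 and 3 because there are only three candidates. With real data the similarity matrices would come from the model's image and text embeddings, and each image typically has several correct captions, which is why `gt` holds sets rather than single indices.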