Zhang L, Xu M, Yin J, Zhang C, Shao L. Weakly Supervised Complets Ranking for Deep Image Quality Modeling. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5041-5054. [PMID: 32167910 DOI: 10.1109/tnnls.2019.2962548]
Abstract
Despite their competitive prediction performance, recent deep image quality models suffer from the following limitations. First, they cannot effectively interpret or quantify region-level quality, which contributes to the global features during deep architecture training. Second, human visual perception is sensitive to compositional features (i.e., the sophisticated spatial configurations among regions), but explicitly incorporating them into a deep model is challenging. Third, state-of-the-art deep quality models typically use rectangular image patches as inputs, but there is no evidence that these rectangles can reflect arbitrarily shaped objects, such as beaches and jungles. By defining the complet, which is a set of image segments collaboratively characterizing the spatial/geometric distribution of multiple visual elements, we propose a novel quality-modeling framework that involves two key modules: a complet ranking algorithm and a spatially-aware dual aggregation network (SDA-Net). Specifically, to describe the region-level quality features, we build complets to characterize the high-order spatial interactions among the arbitrarily shaped segments in each image. To obtain complets that are highly descriptive of image compositions, a weakly supervised complet ranking algorithm is designed that quantifies the quality of each complet. The algorithm seamlessly encodes three factors: the image-level quality discrimination, a weakly supervised constraint, and the complet geometry of each image. Based on the top-ranking complets, a novel multi-column convolutional neural network (CNN) called SDA-Net is designed, which supports input segments with arbitrary shapes. The key is a dual-aggregation mechanism that fuses the intracomplet deep features and the intercomplet deep features under a unified framework. Thorough experimental validation on a series of benchmark data sets demonstrated the superiority of our method.