1
Zhang Y, Ding K, Li N, Wang H, Huang X, Kuo CCJ. Perceptually Weighted Rate Distortion Optimization for Video-Based Point Cloud Compression. IEEE Transactions on Image Processing 2023; 32:5933-5947. PMID: 37903048. DOI: 10.1109/TIP.2023.3327003.
Abstract
A dynamic point cloud is volumetric visual data that represents realistic 3D scenes for virtual reality and augmented reality applications. However, its large data volume is a bottleneck for processing, transmission, and storage, which calls for effective compression. In this paper, we propose a Perceptually Weighted Rate-Distortion Optimization (PWRDO) scheme for Video-based Point Cloud Compression (V-PCC), which aims to minimize the perceptual distortion of the reconstructed point cloud at a given bit rate. Firstly, we propose a general framework for perceptually optimized V-PCC that exploits visual redundancies in point clouds. Secondly, a multi-scale Projection-based Point Cloud quality Metric (PPCM) is proposed to measure the perceptual quality of 3D point clouds. The PPCM model comprises 3D-to-2D patch projection, multi-scale structural distortion measurement, and a fusion model. Approximations and simplifications of the PPCM are also presented for V-PCC integration and low complexity. Thirdly, based on the simplified PPCM model, we propose a PWRDO scheme with Lagrange multiplier adaptation, which is incorporated into V-PCC to enhance coding efficiency. Experimental results show that the proposed PPCM models can serve as standalone quality metrics and achieve higher consistency with human subjective scores than state-of-the-art objective visual quality metrics. Compared with the latest V-PCC reference model, the proposed PWRDO-based V-PCC scheme achieves average bit rate reductions of 13.52%, 8.16%, 10.56%, and 9.54% in terms of four objective visual quality metrics for point clouds, significantly outperforming state-of-the-art coding algorithms. The proposed PWRDO increases the computational complexity of the V-PCC encoder and decoder by only 1.71% and 0.05% on average, respectively, which is negligible. The source code of the PPCM and PWRDO schemes is available at https://github.com/VVCodec/PPCM-PWRDO.
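As a rough illustration of the weighting idea, the sketch below scores candidate coding modes with a perceptually weighted rate-distortion cost; the weight heuristic and the mode list are hypothetical stand-ins, not the paper's PPCM-derived weights.

```python
# Minimal sketch of perceptually weighted RD optimization (assumed heuristic,
# not the actual PWRDO scheme: real weights come from the PPCM metric).

def perceptual_weight(block_variance, mean_variance):
    """Toy weight: smooth blocks are treated as perceptually more sensitive."""
    return mean_variance / (block_variance + 1e-6)

def rd_cost(distortion, rate, lam, w):
    # Weighted cost J = w * D + lambda * R; dividing lambda by w is an
    # equivalent view, which is how Lagrange multiplier adaptation is realized.
    return w * distortion + lam * rate

def choose_mode(candidates, lam, w):
    """candidates: list of (mode, distortion, rate) triples."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam, w))

if __name__ == "__main__":
    lam = 0.85
    w = perceptual_weight(block_variance=4.0, mean_variance=16.0)
    modes = [("intra", 120.0, 300.0), ("inter", 90.0, 520.0), ("skip", 200.0, 10.0)]
    print(choose_mode(modes, lam, w))  # picks the lowest weighted cost
```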
2
Wu C, He G, Lai X, Li Y. MPCNet: Compressed multi-view video restoration via motion-parallax complementation network. Neural Networks 2023; 167:601-614. PMID: 37713766. DOI: 10.1016/j.neunet.2023.08.037.
Abstract
The performance of existing learning-based methods in restoring compressed multi-view video (MVV) is limited because they utilize only information from temporally adjacent frames or parallax-neighboring views. However, the compression artifacts caused by multi-view coding (MVC) may stem from intra-frame, inter-frame, and inter-view reference errors. In this paper, by carefully exploiting stereo information from both the temporal and parallax domains, a motion-parallax complementation network (MPCNet) is proposed to restore the quality of compressed MVV more efficiently. First, we introduce a motion-parallax complementation strategy consisting of a coarse stage and a fine stage. By mutually compensating features extracted from multiple domains, useful multi-frame information is efficiently preserved and aggregated step by step. Second, an attention-based feature filtering and modulation module (AFFM) is proposed, which fuses two features efficiently by suppressing misleading information. Deploying it in most submodules of the proposed approach improves the representational ability of MPCNet and yields stronger restoration performance. Experimental results demonstrate the effectiveness of MPCNet, with average gains of 1.978 dB in PSNR and 0.0282 in MS-SSIM; the BD-rate reduction reaches 47.342% on average. Subjective quality is greatly improved, and many compression distortions are eliminated. This work also improves accuracy on high-level vision tasks: semantic segmentation reaches an mIoU of 0.352 and object detection an mAP of 51.71. Quantitative and qualitative analyses demonstrate that MPCNet outperforms state-of-the-art approaches.
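To give a flavor of attention-based feature filtering and fusion, this minimal sketch gates an auxiliary feature map with a learned sigmoid mask before merging it into the main feature; the layer shapes and structure are assumptions, not the published AFFM design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy attention-based fusion of two feature maps (not the exact AFFM).

    A mask is predicted from the concatenated features and used to suppress
    misleading information in the auxiliary feature before fusion.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel gate in [0, 1]
        )

    def forward(self, main_feat, aux_feat):
        gate = self.mask(torch.cat([main_feat, aux_feat], dim=1))
        return main_feat + gate * aux_feat  # keep main, add filtered auxiliary

# Example: fuse temporal and parallax features of shape (N, C, H, W)
fusion = AttentionFusion(channels=64)
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```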
Affiliation(s)
- Chang Wu — School of Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071, China
- Gang He — School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710071, China
- Xinquan Lai — School of Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071, China
- Yunsong Li — School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710071, China
3
Liu W, Ma L, Qiu B, Cui M. Stereoscopic view synthesis with progressive structure reconstruction and scene constraints. PLoS One 2022; 17:e0279249. PMID: 36534690. PMCID: PMC9762595. DOI: 10.1371/journal.pone.0279249.
Abstract
Depth image-based rendering (DIBR) is an important technology in 2D-to-3D conversion. It uses texture images and associated depth maps to render virtual views. However, current DIBR systems still face challenging problems, such as disocclusions. Inpainting methods based on deep learning have recently shown significant improvements and can generate plausible images. However, most of these methods do not handle the disocclusion holes in synthesized views well: on the one hand, they treat the issue as generative inpainting after 3D warping rather than following the full DIBR processing procedure; on the other hand, the holes in virtual views cluster around the transition regions between foreground and background, which makes them difficult to distinguish without special constraints. Motivated by these observations, this paper proposes a novel learning-based method for stereoscopic view synthesis in which disocclusion regions are restored by a progressive structure reconstruction strategy instead of direct texture inpainting. Additionally, special cues in the synthesized scenes are exploited as constraints for the network to alleviate hallucinated structure mixtures among different layers. Extensive empirical evaluations and comparisons validate the strengths of the proposed approach and demonstrate that the model is well suited to stereoscopic synthesis in 2D-to-3D conversion applications.
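For context, the sketch below shows where disocclusion holes originate: a naive forward DIBR warp for a horizontal camera shift, under an assumed inverse depth-to-disparity mapping, leaves unfilled target pixels that a synthesis method must restore.

```python
import numpy as np

def dibr_warp(texture, depth, max_disp=8):
    """Naive forward warping for a horizontal camera shift (no hole filling).

    Disparity is taken as inversely proportional to depth, so near pixels
    shift more; a z-buffer keeps the nearest point when pixels collide.
    Target positions that receive no source pixel are disocclusion holes.
    """
    h, w = depth.shape
    disparity = np.round(max_disp * depth.min() / depth).astype(int)
    virtual = np.zeros_like(texture)
    zbuf = np.full((h, w), np.inf)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xv = x - disparity[y, x]
            if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                virtual[y, xv] = texture[y, x]
                zbuf[y, xv] = depth[y, x]
                filled[y, xv] = True
    return virtual, ~filled  # ~filled marks the regions to be restored

tex = np.random.rand(64, 64)
dep = np.tile(np.linspace(1.0, 10.0, 64), (64, 1))  # depth grows left to right
view, holes = dibr_warp(tex, dep)
print(holes.sum(), "disoccluded pixels")
```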
Affiliation(s)
- Wei Liu (corresponding author) — Electromechanic Engineering College, Nanyang Normal University, Nanyang, Henan, China
- Liyan Ma — Computer Engineering and Science College, Shanghai University, Shanghai, China
- Bo Qiu (corresponding author) — Electronic and Information Engineering College, Hebei University of Technology, Tianjin, China
- Mingyue Cui — Electromechanic Engineering College, Nanyang Normal University, Nanyang, Henan, China
4
Zhang H, Cao J, Zheng D, Yao X, Ling BWK. Deep Learning-Based Synthesized View Quality Enhancement with DIBR Distortion Mask Prediction Using Synthetic Images. Sensors (Basel) 2022; 22:8127. PMID: 36365828. PMCID: PMC9656180. DOI: 10.3390/s22218127.
Abstract
Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image-Based Rendering (DIBR) process in multi-view video systems. However, due to the scarcity of Multi-view Video plus Depth (MVD) data, the training data for quality enhancement models is small, which limits the performance and progress of these models. Augmenting the training data is a feasible way to strengthen synthesized view quality enhancement (SVQE) models. In this paper, a deep learning-based SVQE model trained on additional synthetic synthesized view images (SVIs) is proposed. To simulate the irregular geometric displacement of DIBR distortion, a random irregular polygon-based SVI synthesis method is proposed that builds on existing large-scale RGB/RGBD data, and a synthetic synthesized view database is constructed comprising synthetic SVIs and the corresponding DIBR distortion masks. Moreover, to guide the SVQE models to focus more precisely on DIBR distortion, a DIBR distortion mask prediction network, which predicts the position and variance of DIBR distortion, is embedded into the SVQE models. Experimental results on public MVD sequences demonstrate that the PSNR of existing SVQE models, e.g., DnCNN, NAFNet, and TSAN, pre-trained on NYU-based synthetic SVIs improves by 0.51, 0.36, and 0.26 dB on average, respectively, while MPPSNRr improves by 0.86, 0.25, and 0.24 on average, respectively. In addition, with the DIBR distortion mask prediction network, the SVI quality obtained by DnCNN and NAFNet pre-trained on NYU-based synthetic SVIs is further enhanced by 0.02 and 0.03 dB on average in PSNR and by 0.004 and 0.121 on average in MPPSNRr.
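The polygon-based simulation idea can be sketched as follows; the vertex counts, radii, and the crude pixel-displacement corruption are illustrative assumptions, not the parameters of the paper's database.

```python
import numpy as np
import cv2  # OpenCV, used here only for polygon rasterization

def random_polygon_mask(h, w, n_polygons=3, n_vertices=8, radius=20, rng=None):
    """Rasterize random irregular polygons as a binary distortion mask."""
    rng = np.random.default_rng(rng)
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(n_polygons):
        cx, cy = rng.integers(0, w), rng.integers(0, h)
        angles = np.sort(rng.uniform(0, 2 * np.pi, n_vertices))
        radii = rng.uniform(0.3, 1.0, n_vertices) * radius  # irregular outline
        pts = np.stack([cx + radii * np.cos(angles),
                        cy + radii * np.sin(angles)], axis=1).astype(np.int32)
        cv2.fillPoly(mask, [pts], 1)
    return mask

# Corrupt an image inside the mask to mimic geometric displacement artifacts
img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
m = random_polygon_mask(128, 128, rng=0)
corrupted = img.copy()
corrupted[m == 1] = np.roll(img, 5, axis=1)[m == 1]  # crude displacement stand-in
```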
5
Lei J, Zhang Z, Pan Z, Liu D, Liu X, Chen Y, Ling N. Disparity-Aware Reference Frame Generation Network for Multiview Video Coding. IEEE Transactions on Image Processing 2022; 31:4515-4526. PMID: 35727785. DOI: 10.1109/TIP.2022.3183436.
Abstract
Multiview video coding (MVC) compresses multiview video by eliminating video redundancies, and the quality of the reference frame directly affects compression efficiency. In this paper, we propose a deep virtual reference frame generation method based on a disparity-aware reference frame generation network (DAG-Net), which models the disparity relationship between viewpoints to generate a more reliable reference frame. The proposed DAG-Net consists of a multi-level receptive field module, a disparity-aware alignment module, and a fusion reconstruction module. First, the multi-level receptive field module is designed to enlarge the receptive field and extract multi-scale deep features from the temporal and inter-view reference frames. Then, the disparity-aware alignment module learns the disparity relationship and applies a disparity shift to the inter-view reference frame to align it with the temporal reference frame. Finally, the fusion reconstruction module fuses the complementary information to generate a more reliable virtual reference frame. Experiments demonstrate that the proposed reference frame generation method achieves superior performance for multiview video coding.
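To make the disparity-shift idea concrete, this minimal sketch predicts a per-pixel horizontal disparity from both features and warps the inter-view feature toward the temporal one via grid sampling; the single-convolution estimator is a placeholder, not the actual alignment module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityShift(nn.Module):
    """Toy horizontal alignment: predict per-pixel disparity from both views
    and warp the inter-view feature toward the temporal one (a placeholder
    for a disparity-aware alignment module)."""
    def __init__(self, channels: int):
        super().__init__()
        self.disp = nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1)

    def forward(self, inter_view, temporal):
        n, _, h, w = inter_view.shape
        d = self.disp(torch.cat([inter_view, temporal], dim=1))  # (N,1,H,W)
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2).clone()
        grid[..., 0] = grid[..., 0] + d.squeeze(1) * (2.0 / w)  # shift in x
        return F.grid_sample(inter_view, grid, align_corners=True)

shift = DisparityShift(channels=32)
aligned = shift(torch.randn(1, 32, 24, 24), torch.randn(1, 32, 24, 24))
print(aligned.shape)  # torch.Size([1, 32, 24, 24])
```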
6
Sheng X, Li L, Liu D, Xiong Z. Attribute Artifacts Removal for Geometry-Based Point Cloud Compression. IEEE Transactions on Image Processing 2022; 31:3399-3413. PMID: 35503831. DOI: 10.1109/TIP.2022.3170722.
Abstract
Geometry-based point cloud compression (G-PCC) achieves remarkable compression efficiency for point clouds, but it still introduces serious attribute compression artifacts, especially at low bitrates. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on point cloud geometry coordinates and then use Chebyshev graph convolutions to extract features of point cloud attributes. Considering that a point may be correlated with points both near to and far from it, we propose a multi-scale scheme to capture short- and long-range correlations between the current point and its neighboring and distant points. To address the problem that points may exhibit different degrees of artifacts due to adaptive quantization, we feed the per-point quantization step as an extra input to the proposed network. We also incorporate a weighted graph attentional layer that pays special attention to points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifacts removal method for G-PCC. We validate the effectiveness of our method on various point clouds. Objective comparisons show that the proposed method achieves average BD-rate reductions of 9.74% over Predlift and 10.13% over RAHT. Subjective comparisons show that visual artifacts such as color shifting, blurring, and quantization noise are reduced.
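A bare-bones version of a Chebyshev graph convolution over a geometry-defined kNN graph is sketched below; the graph construction, order K = 3, and random weights are illustrative choices, not the MS-GAT configuration.

```python
import numpy as np

def knn_graph_laplacian(coords, k=8):
    """Scaled Laplacian L_hat = L - I of a kNN graph on point coordinates."""
    n = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)  # symmetrize the kNN adjacency
    deg = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    return L - np.eye(n)  # crude rescaling, assumes lambda_max ~ 2

def cheb_conv(L_hat, x, weights):
    """y = sum_k T_k(L_hat) x W_k via the Chebyshev recurrence."""
    t_prev, t_cur = x, L_hat @ x          # T_0 x and T_1 x
    out = t_prev @ weights[0] + t_cur @ weights[1]
    for k in range(2, len(weights)):
        t_prev, t_cur = t_cur, 2 * L_hat @ t_cur - t_prev  # T_k recurrence
        out = out + t_cur @ weights[k]
    return out

pts = np.random.rand(100, 3)                           # geometry coordinates
attr = np.random.rand(100, 3)                          # RGB attributes
W = [np.random.randn(3, 16) * 0.1 for _ in range(3)]   # K = 3 Chebyshev taps
print(cheb_conv(knn_graph_laplacian(pts), attr, W).shape)  # (100, 16)
```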
7
Zhang M, Chen Y, Pan Y, Zeng Z. A Fast Image Deformity Correction Algorithm for Underwater Turbulent Image Distortion. Sensors (Basel) 2019; 19:3818. PMID: 31487831. PMCID: PMC6766914. DOI: 10.3390/s19183818.
Abstract
An algorithm that corrects distortion by estimating pixel shift is proposed for the degradation caused by underwater turbulence. The distorted image is restored and reconstructed through reference frame selection and two-dimensional pixel registration. A support vector machine-based kernel correlation filtering algorithm is proposed and applied to improve the speed and efficiency of the correction. To validate the algorithm, laboratory experiments on a controlled turbulent-water simulation system and field experiments in rivers and oceans were carried out, and the results were compared with traditional, theoretical model-based, and particle image velocimetry-based restoration and reconstruction algorithms. Subjective visual evaluation shows that image distortion is effectively suppressed, and an objective statistical performance analysis shows that the measured values surpass those of the traditional and previously studied restoration and reconstruction algorithms. The proposed method is also much faster than the other algorithms. It can be concluded that the proposed algorithm effectively corrects underwater turbulence-degraded images and offers a potential technique for accurate real-time underwater target detection.
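The select-register-average pipeline can be sketched with off-the-shelf dense optical flow standing in for the paper's SVM-based kernel correlation filter; choosing the reference frame as the one closest to the temporal mean is likewise a common heuristic, not necessarily the authors' criterion.

```python
import numpy as np
import cv2

def correct_turbulence(frames):
    """Warp all frames onto a reference and average, suppressing pixel shifts.

    frames: list of uint8 grayscale images with identical shape.
    """
    stack = np.stack(frames).astype(np.float32)
    mean = stack.mean(axis=0)
    # Heuristic reference selection: the frame closest to the temporal mean.
    ref_idx = int(np.argmin(((stack - mean) ** 2).sum(axis=(1, 2))))
    ref = frames[ref_idx]
    h, w = ref.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    acc = np.zeros((h, w), np.float32)
    for f in frames:
        # Dense pixel-shift estimate (stand-in for the SVM-KCF registration).
        flow = cv2.calcOpticalFlowFarneback(ref, f, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        warped = cv2.remap(f.astype(np.float32), xs + flow[..., 0],
                           ys + flow[..., 1], cv2.INTER_LINEAR)
        acc += warped
    return (acc / len(frames)).astype(np.uint8)

frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(5)]
restored = correct_turbulence(frames)
```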
Affiliation(s)
- Min Zhang — School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
- Yuzhang Chen — School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
- Yongcai Pan — School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
- Zhangfan Zeng — School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
8
Liu H, Zhang Y, Zhang H, Fan C, Kwong S, Kuo CCJ, Fan X. Deep Learning based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression. IEEE Transactions on Image Processing 2019; 29:641-656. PMID: 31425033. DOI: 10.1109/TIP.2019.2933743.
Abstract
Picture-Wise Just Noticeable Difference (PW-JND), the minimum difference in a picture that the human visual system can perceive, can be widely used in perception-oriented image and video processing. However, conventional Just Noticeable Difference (JND) models compute the JND threshold for each pixel or sub-band separately, which may not accurately reflect the total masking effect of a picture. In this paper, we propose a deep learning-based PW-JND prediction model for image compression. Firstly, we formulate PW-JND prediction as a multi-class classification problem and propose a framework that reduces it to a binary classification problem solved by a single binary classifier. Secondly, we construct a deep learning-based binary classifier, a perceptually lossy/lossless predictor, which predicts whether one image is perceptually lossy with respect to another. Finally, we propose a sliding window-based search strategy that predicts PW-JND from the outputs of the perceptually lossy/lossless predictor. Experimental results show that the mean accuracy of the perceptually lossy/lossless predictor reaches 92%, and the absolute prediction error of the proposed PW-JND model is 0.79 dB on average, demonstrating its superiority over conventional JND models.
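The search strategy can be illustrated abstractly: given a binary predictor that decides whether a compressed image is perceptually lossy with respect to the original, scan distortion levels window by window to locate the PW-JND point. The oracle below is a toy stand-in for the paper's learned classifier.

```python
def find_pw_jnd(levels, is_perceptually_lossy, window=3):
    """Return the first distortion level judged perceptually lossy.

    levels: distortion levels ordered from least to most compressed.
    is_perceptually_lossy: predicate level -> bool (in the paper, a learned
    binary classifier; any callable works here).
    window: number of consecutive levels examined per coarse step; the
    PW-JND point is refined inside the first window with a lossy decision.
    """
    for start in range(0, len(levels), window):
        block = levels[start:start + window]
        if any(is_perceptually_lossy(q) for q in block):  # coarse scan
            for q in block:                               # fine scan in window
                if is_perceptually_lossy(q):
                    return q
    return None  # perceptually lossless across the whole range

# Toy oracle: pretend distortion above level 42 crosses the JND threshold.
print(find_pw_jnd(list(range(30, 52, 2)), lambda q: q > 42))  # 44
```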