
Deep Reference Frame for Versatile Video Coding with Structural Re-parameterization


Abstract:

In video coding, inter-prediction leverages neighboring frames to reduce temporal redundancy, and the quality of these reference frames is essential for effective inter-prediction. Although many neural network-based methods have been proposed to improve the quality of reference frames, there is still room to improve the performance-efficiency trade-off. In this paper, we propose an interpolation diverse branch block (InterDBB) suited to lightweight frame interpolation networks, which optimizes deep reference frame interpolation networks to improve performance without sacrificing speed or increasing inference complexity. Specifically, we propose a multi-branch structural re-parameterization block without batch normalization; this straightforward yet effective modification ensures training stability while improving performance. Moreover, we propose a parameterized motion estimation strategy based on different input resolutions to achieve a better trade-off between performance and computational complexity. Experimental results demonstrate that our method achieves -2.01%/-2.87%/-2.44% coding-efficiency improvements for the Y/U/V components under the random access (RA) configuration compared to VTM-11.0_NNVC-5.0.
Date of Conference: 08-11 December 2024
Date Added to IEEE Xplore: 27 January 2025
Conference Location: Tokyo, Japan


I. INTRODUCTION

With the development of multimedia technology, there is a growing need for video storage and transmission. The Joint Video Experts Team (JVET) has introduced several video coding standards, with versatile video coding (VVC) being the latest [1]. A core part of VVC, inter-prediction, minimizes temporal redundancy by finding the best match for the current Coding Unit (CU) in reference frames, thereby reducing the bitrate; the reliability of these reference frames is therefore essential [2]. The rapid progress of deep learning has led an increasing number of researchers to integrate neural network-based tools into existing coding frameworks [3]–[11], with studies exploring its application to inter-prediction through bi-prediction [12], [13], fractional interpolation [14], [15], and reference frame interpolation [3], [16], [17]. Although these NN-based methods improve inter-prediction performance, their high computational complexity leads to longer coding times and higher memory usage, limiting their practical application. To reduce complexity, lightweight designs have been proposed, such as reducing input channels [18]–[20] and decreasing the number of layers [18]; however, these techniques inevitably incur performance loss. JVET emphasizes the need for low-complexity neural network-based video coding (NNVC) methods [21], and inter-prediction techniques are an important direction in this regard. In recent years, researchers have continued to propose new video frame synthesis methods that pursue higher performance at lower computational complexity. Jia et al. [3] developed a technique that produces interpolated frames more closely resembling the current frame to be encoded, yielding substantial bitrate savings. Meng et al. [22] proposed a deep reference frame interpolation network that significantly reduces computational complexity, expanding its practical applicability.
Considering the already low complexity of existing solutions, our aim is to enhance performance without adding any complexity to the current approach.
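The exact InterDBB design is detailed later in the paper, but the core idea of structural re-parameterization (as in RepVGG [24] and DBB [25], here without batch normalization) is that a training-time multi-branch block collapses into a single convolution at inference, so the extra branches cost nothing at decode time. A minimal numpy sketch, assuming a 3x3 branch plus a 1x1 branch with additive biases (the branch names and shapes are illustrative, not the paper's actual block):

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 2D convolution. x: (C_in, H, W); w: (C_out, C_in, k, k);
    zero padding of k//2 keeps the output at H x W."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    y = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                y[o, i, j] = np.sum(w[o] * xp[:, i:i+k, j:j+k]) + b[o]
    return y

rng = np.random.default_rng(0)
c_in, c_out = 4, 4
# Training-time branches: a 3x3 conv and a parallel 1x1 conv.
w3, b3 = rng.standard_normal((c_out, c_in, 3, 3)), rng.standard_normal(c_out)
w1, b1 = rng.standard_normal((c_out, c_in, 1, 1)), rng.standard_normal(c_out)

# Fusion: embed the 1x1 kernel at the centre of a 3x3 kernel and sum it
# with the 3x3 branch; biases add. (An identity branch would likewise
# become a 3x3 kernel with 1 at the centre of each matching channel pair.)
w_fused = w3.copy()
w_fused[:, :, 1:2, 1:2] += w1
b_fused = b3 + b1

x = rng.standard_normal((c_in, 8, 8))
y_branches = conv2d(x, w3, b3) + conv2d(x, w1, b1)
y_fused = conv2d(x, w_fused, b_fused)
assert np.allclose(y_branches, y_fused)
```

Because the fusion is exact (convolution is linear in the kernel), the inference network is a plain single-branch model with the training-time capacity folded in; omitting batch normalization, as the paper proposes, avoids the BN-folding step and its training instabilities in small interpolation networks.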

REFERENCES
1.
B. Bross, J. Chen, J. Ohm, G. J. Sullivan and Y. Wang, "Developments in international video coding standardization after AVC with an overview of versatile video coding (VVC)", Proc. IEEE, vol. 109, no. 9, pp. 1463-1493, 2021.
2.
W. Chien, L. Zhang, M. Winken, X. Li, R. Liao, H. Gao, et al., "Motion vector coding and block merging in the versatile video coding standard", IEEE Trans. Circuits Syst. Video Technol, vol. 31, no. 10, pp. 3848-3861, 2021.
3.
J. Jia, D. Ding, W. Meng, Z. Chen, Z. Liu, X. Xu, et al., "EE1-2.1-related: DRF model without QP input", JVET meeting, 2023.
4.
X. Cheng and Z. Chen, "Video frame interpolation via deformable separable convolution", AAAI Conference on Artificial Intelligence, pp. 10607-10614, 2020.
5.
H. Choi and I. V. Bajic, "Deep frame prediction for video coding", IEEE Trans. Circuits Syst. Video Technol, vol. 30, no. 7, pp. 1843-1855, 2020.
6.
J. Dong, K. Ota and M. Dong, "Video frame interpolation: A comprehensive survey", ACM Trans. Multim. Comput. Commun. Appl, vol. 19, no. 2s, pp. 78:1-78:31, 2023.
7.
C. Wu, N. Singhal and P. Krähenbühl, "Video compression through image interpolation", European Conference on Computer Vision, pp. 425-440, 2018.
8.
R. Chang, L. Wang, X. Xu and S. Liu, "EE1-1.5: Optimization for complexity-performance trade-off of HOP network", JVET meeting, 2023.
9.
D. Rusanovskyy, Y. Li and M. Karczewicz, "EE1-1.2 complexity-performance tradeoff of decomposition", JVET meeting, 2023.
10.
Y. Li, D. Rusanovskyy and M. Karczewicz, "EE1-4.4: Low complexity NN filter with design elements of unified filter architecture and EE1-1.2 and EE1-1.3", JVET meeting, 2023.
11.
F. Galpin, S. Eadie, D. Rusanovskyy, Y. Li, J. Li, L. Wang, et al., "AHG11: EE1-0 high operation point model", JVET meeting, 2023.
12.
T. Zhao, W. Feng, H. Zeng, Y. Xu, Y. Niu and J. Liu, "Learning-based video coding with joint deep compression and enhancement", ACM International Conference on Multimedia, pp. 3045-3054, 2022.
13.
Z. Zhao, S. Wang, S. Wang, X. Zhang, S. Ma and J. Yang, "Enhanced bi-prediction with convolutional neural network for high-efficiency video coding", IEEE Trans. Circuits Syst. Video Technol, vol. 29, no. 11, pp. 3291-3301, 2019.
14.
H. Azgin, E. Kalali and I. Hamzaoglu, "An approximate versatile video coding fractional interpolation hardware", IEEE International Conference on Consumer Electronics, pp. 1-4, 2020.
15.
C. D. Pham and J. Zhou, "Deep learning-based luma and chroma fractional interpolation in video coding", IEEE Access, vol. 7, pp. 112535-112543, 2019.
16.
L. Zhao, S. Wang, X. Zhang, S. Wang, S. Ma and W. Gao, "Enhanced motion-compensated video coding with deep virtual reference frame generation", IEEE Trans. Image Process, vol. 28, no. 10, pp. 4832-4844, 2019.
17.
W. Bao, W. Meng, J. Jia, Y. Zhang, H. Wang, Z. Chen, et al., "EE1-5.1: Deep reference frame generation for inter prediction enhancement", JVET meeting, 2023.
18.
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications", 2017.
19.
M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov and L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks", IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.
20.
A. Howard, R. Pang, H. Adam, Q. V. Le, M. Sandler, B. Chen, et al., "Searching for MobileNetV3", IEEE International Conference on Computer Vision, pp. 1314-1324, 2019.
21.
Z. Wang and F. Li, "Convolutional neural network based low complexity HEVC intra encoder", Multim. Tools Appl, vol. 80, no. 2, pp. 2441-2460, 2021.
22.
W. Meng, Y. Zhang, J. Jia, S. Chao and Z. Chen, "Towards lightweight deep reference frame for versatile video coding", IEEE International Conference on Visual Communications and Image Processing, pp. 1-5, 2023.
23.
L. Kong, B. Jiang, D. Luo, W. Chu, X. Huang, Y. Tai, et al., "IFRNet: Intermediate feature refine network for efficient frame interpolation", IEEE Conference on Computer Vision and Pattern Recognition, pp. 1959-1968, 2022.
24.
X. Ding, X. Zhang, N. Ma, J. Han, G. Ding and J. Sun, "RepVGG: Making VGG-style ConvNets great again", IEEE Conference on Computer Vision and Pattern Recognition, pp. 13733-13742, 2021.
25.
X. Ding, X. Zhang, J. Han and G. Ding, "Diverse branch block: Building a convolution as an inception-like unit", IEEE Conference on Computer Vision and Pattern Recognition, pp. 10886-10895, 2021.
26.
J. Jia, Y. Zhang, H. Zhu, Z. Chen, Z. Liu, X. Xu, et al., "Deep reference frame generation method for VVC inter prediction enhancement", IEEE Trans. Circuits Syst. Video Technol, vol. 34, no. 5, pp. 3111-3124, 2024.
27.
T. Xue, B. Chen, J. Wu, D. Wei and W. T. Freeman, "Video enhancement with task-oriented flow", Int. J. Comput. Vis, vol. 127, no. 8, pp. 1106-1125, 2019.
28.
D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization", International Conference on Learning Representations, 2015.
29.
E. Alshina, R.-L. Liao, S. Liu and A. Segall, "JVET common test conditions and evaluation procedures for neural network-based video coding technology", JVET meeting, 2023.
30.
G. Zhang, C. Liu, Y. Cui, X. Zhao, K. Ma and L. Wang, "VFIMamba: Video frame interpolation with state space models", 2024.