Motion-Vector-Driven Lightweight ROI Tracking for Real-Time Saliency-Guided Video Encoding | IEEE Conference Publication | IEEE Xplore


Abstract:

The heavy computational burden of state-of-the-art video coding technologies can be mitigated with Region-of-Interest (ROI) techniques that limit the highest coding effort to salient regions. However, the complexity overhead of saliency detection can easily cancel out the speed gain of ROI coding. This work introduces a lightweight ROI tracking technique that can be used in place of compute-intensive ROI detection to guide a video encoder in inter coding. Low computational overhead is achieved by feeding the motion vectors (MVs) of a video encoder back to our neural network, which is trained to accurately estimate ROI movement and size changes. The network is trained with our new dataset, which is also released in this work to foster the development of head tracking techniques in applications like video conferencing. Our experimental results demonstrate substantial speedups with minimal accuracy tradeoffs over traditional salient object detection (SOD) methods. In scenarios where a single ROI is tracked with a 64-frame detection interval, our solution obtains up to a 50-fold speedup with an accuracy of 87% and an average ROI center error of 16 pixels. These results confirm that our ROI tracking approach is a promising technique for low-cost and low-power streaming media applications.
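
The idea of reusing encoder MVs between sparse detections can be illustrated with a minimal sketch. It is not the neural network described in the abstract: the learned estimator is replaced here by a simple median of the MVs under the ROI, and the ROI size is left unchanged, whereas the proposed network also estimates size changes. All names (`Roi`, `shift_roi_by_mvs`, `center_error`, `track`), the 16-pixel block size, and the MV layout (one `(dx, dy)` content displacement per block, measured from the previous frame to the current one) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import hypot
from statistics import median


@dataclass
class Roi:
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width in pixels
    h: float  # height in pixels

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def shift_roi_by_mvs(roi, mvs, block=16):
    """Move the ROI by the median displacement of the blocks it covers.

    mvs[row][col] is an assumed layout: the (dx, dy) displacement in pixels
    of the content of one block-by-block area since the previous frame.
    """
    rows = range(max(0, int(roi.y // block)),
                 min(len(mvs), int((roi.y + roi.h - 1) // block) + 1))
    cols = range(max(0, int(roi.x // block)),
                 min(len(mvs[0]), int((roi.x + roi.w - 1) // block) + 1))
    covered = [mvs[r][c] for r in rows for c in cols]
    if not covered:
        return roi  # ROI lies outside the MV field; leave it unchanged
    dx = median(v[0] for v in covered)
    dy = median(v[1] for v in covered)
    return Roi(roi.x + dx, roi.y + dy, roi.w, roi.h)


def center_error(a, b):
    """Euclidean distance in pixels between two ROI centers."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def track(detect, mv_fields, interval=64):
    """Yield one ROI per frame, calling the costly detector every `interval` frames.

    detect(i) is a stand-in for any ROI detector; mv_fields[i] is the MV
    field of frame i relative to frame i - 1.
    """
    roi = None
    for i, mvs in enumerate(mv_fields):
        roi = detect(i) if i % interval == 0 else shift_roi_by_mvs(roi, mvs)
        yield roi
```

With `interval=64`, the detector runs on one frame in 64 and the remaining frames only pay for the MV lookup, which is where a speedup of the kind reported above would come from. `center_error` corresponds to the center-distance metric under the assumption that it is measured as a Euclidean distance between the tracked and reference ROI centers.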
Date of Conference: 26-30 August 2024
Date Added to IEEE Xplore: 23 October 2024
Conference Location: Lyon, France
I. Introduction

The skyrocketing growth of visual data consumption by humans and machines has led to an unprecedented surge in global video traffic. This trend, coupled with the advent of high-quality immersive media applications, calls for more sophisticated video compression technologies that can overcome the constraints imposed by existing network and storage capacities. The latest video coding standards, such as High Efficiency Video Coding (HEVC/H.265) [1] and Versatile Video Coding (VVC/H.266) [2], are in place to mitigate video bandwidth demands, but their computational requirements are difficult to meet without optimizations, particularly in the real-time streaming media domain.

References
1. G. J. Sullivan, J. R. Ohm, W. J. Han and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard", IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
2. B. Bross et al., "Overview of the versatile video coding (VVC) standard and its applications", IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 10, pp. 3736-3764, Oct. 2021.
3. Y. Zhang, L. Zhu, G. Jiang, S. Kwong and C. C. Jay Kuo, "A survey on perceptually optimized video coding", ACM Comput. Surv., vol. 55, no. 12, pp. 1-37, Dec. 2023.
4. J. Mannos and D. Sakrison, "The effects of a visual fidelity criterion of the encoding of images", IEEE Trans. Inf. Theory, vol. 20, no. 4, pp. 525-536, Jul. 1974.
5. L. Itti, "Automatic foveation for video compression using a neurobiological model of visual attention", IEEE Trans. Image Process., vol. 13, no. 10, pp. 1304-1318, Oct. 2004.
6. H. Zhou, Y. Lin, L. Yang, J. Lai and X. Xie, "Benchmarking deep models on salient object detection", Pattern Recognit., vol. 145, pp. 109951, Jan. 2024.
7. L. Itti, C. Koch and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis", IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
8. R. Achanta, S. Hemami, F. Estrada and S. Susstrunk, "Frequency-tuned salient region detection", Proc. IEEE Conf. Comput. Vision Pattern Recognit., Miami, Florida, USA, pp. 1597-1604, Jun. 2009.
9. Q. Hou, M. M. Cheng, X. Hu, A. Borji, Z. Tu and P. H. S. Torr, "Deeply supervised salient object detection with short connections", IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 4, pp. 815-828, Apr. 2019.
10. X. Qin, Z. Zhang, C. Huang, C. Gao, M. Dehghan and M. Jagersand, "BASNet: boundary-aware salient object detection", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Long Beach, California, USA, pp. 7471-7481, Jun. 2019.
11. H. Wang, C. Chenglizhao, L. Linfeng and P. Chong, "Video saliency object detection with motion quality compensation", Electron., vol. 12, no. 7, Mar. 2023.
12. K. Ugur et al., "Motion compensated prediction and interpolation filter design in H.265/HEVC", IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 946-956, Jul. 2013.
13. W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu and T.-K. Kim, "Multiple object tracking: a literature review", Artif. Intell., vol. 293, Apr. 2021.
14. L. Favalli, A. Mecocci and F. Moschetti, "Object tracking for retrieval applications in MPEG-2", IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 3, pp. 427-432, Apr. 2000.
15. R. D. Sutter, K. D. Wolf, S. Lerouge and R. V. de Walle, "Lightweight object tracking in compressed video streams demonstrated in region-of-interest coding", EURASIP J. Adv. Signal Process., vol. 2007, no. 97845, Jan. 2007.
16. L. Bommes, X. Lin and J. Zhou, "MVmed: fast multi-object tracking in the compressed domain", Proc. IEEE Conf. Ind. Electron. Appl., Kristiansand, Norway, pp. 1419-1424, Nov. 2020.
17. Q. Liu, B. Liu, Y. Wu, W. Li and N. Yu, "Real-time online multi-object tracking in compressed domain", IEEE Access, vol. 7, pp. 76489-76499, Jun. 2019.
18. R. C. Moura and E. M. Hemerly, "A spatiotemporal motion-vector filter for object tracking on compressed video", Proc. IEEE Int. Conf. Adv. Video Signal Based Surveillance, Boston, Massachusetts, USA, pp. 427-434, Oct. 2010.
19. W. Li and D. Powers, "Multiple object tracking using motion vectors from compressed video", Proc. Int. Conf. Digit. Image Comput.: Techn. Appl., Sydney, New South Wales, Australia, pp. 1-5, Nov. 2017.
20. H. Wang, J. Shen, Z. Chen and J. Shen, "A fast object tracking approach based on the motion vector in a compressed domain", Int. J. Adv. Robot. Syst., vol. 10, no. 1, Jan. 2013.
21. S. Jain and J. E. Gonzalez, "Fast semantic segmentation on video using block motion-based feature interpolation", Proc. Eur. Conf. Comput. Vis. Workshops, pp. 3-6, Sep. 2018.
22. T. Ujiie, M. Hiromoto and T. Sato, "Interpolation-based object detection using motion vectors for embedded real-time tracking systems", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, Utah, USA, pp. 616-624, Jun. 2018.
23. S. Zhu and Z. Xu, "Spatiotemporal visual saliency guided perceptual high efficiency video coding with neural network", Neurocomputing, vol. 275, pp. 511-522, Jan. 2018.
24. H. Hadizadeh and I. V. Bajić, "Saliency-aware video compression", IEEE Trans. Image Process., vol. 23, no. 1, pp. 19-33, Jan. 2014.
25. S. Zhu, C. Liu and Z. Xu, "High-definition video compression system based on perception guidance of salient information of a convolutional neural network and HEVC compression domain", IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 1946-1959, Jul. 2020.
26. M. Xu, X. Deng, S. Li and Z. Wang, "Region-of-interest based conversational HEVC coding with hierarchical perception model of face", IEEE J. Sel. Topics Signal Process., vol. 8, no. 3, pp. 475-489, Jun. 2014.
27. L. Duan, J. Liu, W. Yang, T. Huang and W. Gao, "Video coding for machines: a paradigm of collaborative compression and intelligent analytics", IEEE Trans. Image Process., vol. 29, pp. 8680-8695, Aug. 2020.
28. M. Cerf, P. Frady and C. Koch, "Faces and text attract gaze independent of the task: experimental data and computer model", J. Vis., vol. 9, no. 12, pp. 1-15, Nov. 2009.
29. X. Deng, M. Xu and Z. Wang, "A ROI-based bit allocation scheme for HEVC towards perceptual conversational video coding", Proc. Int. Conf. Adv. Comput. Intell., Hangzhou, China, pp. 206-211, Oct. 2013.
30. M. Xu, X. Deng, S. Li and Z. Wang, "Region-of-interest based conversational HEVC coding with hierarchical perception model of face", IEEE J. Sel. Topics Signal Process., vol. 8, no. 3, pp. 475-489, Jun. 2014.