Deep Learning in Latent Space for Video Prediction and Compression | IEEE Conference Publication | IEEE Xplore

Deep Learning in Latent Space for Video Prediction and Compression


Abstract:

Learning-based video compression has made substantial progress in recent years. The most influential approaches use deep neural networks (DNNs) to remove spatial and temporal redundancies by finding appropriate lower-dimensional representations of the frames in a video. We propose a novel DNN-based framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns an efficient lower-dimensional latent space representation of each video frame and then performs inter-frame prediction in that latent domain. The latent domain compression of individual frames is obtained by a deep autoencoder trained with a generative adversarial network (GAN). To exploit the temporal correlation within the video frame sequence, we employ a convolutional long short-term memory (ConvLSTM) network to predict the latent vector representation of the future frame. We demonstrate our method with two applications, video compression and abnormal event detection, which share the same latent frame prediction network. The proposed method exhibits superior or competitive performance compared to state-of-the-art algorithms specifically designed for either video compression or anomaly detection.
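
The abstract does not spell out how the predicted latent vector is used by the two applications. The following is a minimal sketch under stated assumptions, not the authors' implementation: plain lists of floats stand in for the autoencoder latents, a linear extrapolation (`predict_next`, a hypothetical helper) stands in for the ConvLSTM predictor, the compression path is assumed to code the quantized difference between the actual and predicted latent, and the anomaly path is assumed to score a frame by its squared latent prediction error.

```python
from typing import List, Sequence

Latent = List[float]


def predict_next(history: Sequence[Latent]) -> Latent:
    """Toy stand-in for a learned predictor (assumption, not a ConvLSTM):
    linearly extrapolate from the last two latent vectors."""
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


def quantize(values: Latent, step: float) -> List[int]:
    """Uniform scalar quantization to integer symbols."""
    return [round(v / step) for v in values]


def dequantize(symbols: List[int], step: float) -> Latent:
    return [s * step for s in symbols]


def encode_sequence(latents: Sequence[Latent], step: float = 0.1) -> List[List[int]]:
    """Code each latent as a quantized residual against the prediction.
    The encoder predicts from its own reconstructions so that the decoder,
    which only sees the symbols, can form identical predictions."""
    decoded: List[Latent] = []
    stream: List[List[int]] = []
    for z in latents:
        pred = predict_next(decoded) if decoded else [0.0] * len(z)
        symbols = quantize([a - p for a, p in zip(z, pred)], step)
        stream.append(symbols)
        residual = dequantize(symbols, step)
        decoded.append([p + r for p, r in zip(pred, residual)])
    return stream


def decode_sequence(stream: Sequence[List[int]], step: float = 0.1) -> List[Latent]:
    """Rebuild the latent sequence from the residual symbols alone."""
    decoded: List[Latent] = []
    for symbols in stream:
        pred = predict_next(decoded) if decoded else [0.0] * len(symbols)
        residual = dequantize(symbols, step)
        decoded.append([p + r for p, r in zip(pred, residual)])
    return decoded


def anomaly_score(history: Sequence[Latent], actual: Latent) -> float:
    """Mean squared error between the predicted and the observed latent.
    A large value means the frame was hard to predict from its past."""
    pred = predict_next(history)
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)


if __name__ == "__main__":
    latents = [[0.0, 1.0], [0.1, 1.2], [0.2, 1.4], [0.3, 1.6]]
    stream = encode_sequence(latents, step=0.05)
    print(stream)
    print(decode_sequence(stream, step=0.05))
    print(anomaly_score(latents[:2], [5.0, -3.0]))
```

In this toy setting the same predictor serves both uses: well-predicted frames yield residual symbols near zero, which an entropy coder could represent cheaply, while poorly predicted frames yield a high anomaly score.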
Date of Conference: 20-25 June 2021
Date Added to IEEE Xplore: 02 November 2021
Conference Location: Nashville, TN, USA

1. Introduction

Video data transmission accounts for the majority of internet traffic today. With the widespread use of mobile devices, video streaming is used extensively in productivity tools and entertainment platforms that support people's work and daily life. In addition to this ubiquitous use of video, higher-quality formats such as 4K UHD and 360-degree VR have become more widely available, which makes high-performance video compression even more critical. Traditional video coding standards such as MPEG, AVC/H.264 [49], HEVC/H.265 [43], and VP9 [38] have achieved impressive performance on video compression tasks. However, because their primary applications are driven by human perception, these hand-crafted codecs are likely suboptimal for machine-oriented tasks such as deep learning-based video analytics.

References
1.
Video Trace Library, [online] Available: http://trace.eas.asu.edu/index.html.
2.
A. Adam, E. Rivlin, I. Shimshoni and D. Reinitz, "Robust real-time unusual event detection using multiple fixed-location monitors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555-560, 2008.
3.
Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Balle, Sung Jin Hwang and George Toderici, "Scale-space flow for end-to-end optimized video compression", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
4.
Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte and Luc Van Gool, "Generative adversarial networks for extreme learned image compression", Proceedings of the IEEE International Conference on Computer Vision, pp. 221-231, 2019.
5.
Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell and Sergey Levine, "Stochastic variational video prediction", International Conference on Learning Representations, 2018.
6.
Mohammad Haris Baig, Vladlen Koltun and Lorenzo Torresani, "Learning to inpaint for image compression" in Advances in Neural Information Processing Systems, Curran Associates, Inc, vol. 30, pp. 1246-1255, 2017.
7.
Johannes Balle, Valero Laparra and Eero P. Simoncelli, "End-to-end optimized image compression", 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017, [online] Available: OpenReview.net.
8.
Johannes Balle, David Minnen, Saurabh Singh, Sung Jin Hwang and Nick Johnston, "Variational image compression with a scale hyperprior", International Conference on Learning Representations, 2018.
9.
Fabrice Bellard, BPG Image Format, [online] Available: https://bellard.org/bpg/.
10.
João Carreira and Andrew Zisserman, "Quo vadis action recognition? A new model and the kinetics dataset", CoRR, vol. abs/1705.07750, 2017.
11.
T. Chen, H. Liu, Q. Shen, T. Yue, X. Cao and Z. Ma, "Deepcoder: A deep neural network based video compression", 2017 IEEE Visual Communications and Image Processing (VCIP), pp. 1-4, 2017.
12.
Zhengxue Cheng, Heming Sun, Masaru Takeuchi and Jiro Katto, "Learning image and video compression through spatial-temporal energy compaction", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
13.
Yong Shean Chong and Yong Haur Tay, "Abnormal event detection in videos using spatiotemporal autoencoder", CoRR, vol. abs/1701.01546, 2017.
14.
Emily Denton and Rob Fergus, "Stochastic video generation with a learned prior", Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 1174-1183, 10-15 Jul 2018.
15.
Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer and Christopher Schroers, "Neural inter-frame compression for video coding", The IEEE International Conference on Computer Vision (ICCV), October 2019.
16.
Chelsea Finn, Ian Goodfellow and Sergey Levine, "Unsupervised learning for physical interaction through video prediction" in Advances in Neural Information Processing Systems, Curran Associates, Inc, vol. 29, pp. 64-72, 2016.
17.
Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere and Taco S. Cohen, "Feedback recurrent autoencoder for video compression", Proceedings of the Asian Conference on Computer Vision (ACCV), November 2020.
18.
Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, et al., "Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
19.
Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak and Taco S. Cohen, "Video compression with rate-distortion autoencoders", The IEEE International Conference on Computer Vision (ICCV), October 2019.
20.
Mahmudul Hasan, Jonghyun Choi, Jan Neumann, Amit K. Roy-Chowdhury and Larry S. Davis, "Learning temporal regularity in video sequences", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
21.
Zhihao Hu, Zhenghao Chen, Dong Xu, Guo Lu, Wanli Ouyang and Shuhang Gu, "Improving deep video compression by resolution-adaptive flow coding", Computer Vision – ECCV 2020, pp. 193-209, 2020.
22.
Ian H. Witten, Radford M. Neal and John G. Cleary, "Arithmetic coding for data compression", Communications of the ACM, vol. 30, pp. 520-540, 1987.
23.
Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, et al., "Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
24.
Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle and Ole Winther, "Autoencoding beyond pixels using a learned similarity metric", Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp. 1558-1566, 20-22 Jun 2016.
25.
Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn and Sergey Levine, "Stochastic adversarial video prediction", CoRR, vol. abs/1804.01523, 2018.
26.
B. Liu, A. Cao and H. Kim, "Unified signal compression using generative adversarial networks", ICASSP 2020 - 2020 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 3177-3181, 2020.
27.
Wen Liu, Weixin Luo, Dongze Lian and Shenghua Gao, "Future frame prediction for anomaly detection - A new baseline", 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, June 18-22, 2018, pp. 6536-6545, 2018.
28.
Salvator Lombardo, Jun Han, Christopher Schroers and Stephan Mandt, "Deep generative video compression" in Advances in Neural Information Processing Systems, Curran Associates, Inc, vol. 32, pp. 9287-9298, 2019.
29.
C. Lu, J. Shi and J. Jia, "Abnormal event detection at 150 fps in matlab", 2013 IEEE International Conference on Computer Vision, pp. 2720-2727, 2013.
30.
Guo Lu, Chunlei Cai, Xiaoyun Zhang, Li Chen, Wanli Ouyang, Dong Xu, et al., "Content adaptive and error propagation aware deep video compression", Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part II, volume 12347 of Lecture Notes in Computer Science, pp. 456-472, 2020.