Discovering Primary Objects in Videos by Saliency Fusion and Iterative Appearance Estimation


Abstract:

In this paper, we propose a new method for detecting primary objects in unconstrained videos in a completely automatic setting. Here, we define the primary object in a video as the object that appears saliently in most of the frames. Unlike previous works that consider only local saliency detection or common pattern discovery, the proposed method integrates the local visual/motion saliency extracted from each frame, global appearance consistency throughout the video, and a spatiotemporal smoothness constraint on object trajectories. We first identify a temporally coherent salient region throughout the whole video, and then explicitly learn a global appearance model to distinguish the primary object from the background. To obtain high-quality saliency estimates from both appearance and motion cues, we propose a novel self-adaptive saliency map fusion method that learns the reliability of saliency maps from labeled data. As a whole, our method can robustly localize and track primary objects in diverse video content and handle challenges such as fast object and camera motion, large scale and appearance variation, background clutter, and pose deformation. Moreover, compared with existing approaches that assume the object is present in all frames, our approach can naturally handle the case where the object is present in only part of the frames, e.g., the object enters the scene in the middle of the video or leaves the scene before the video ends. We also introduce a new video data set containing 51 videos for primary object detection with per-frame ground-truth labeling. Quantitative experiments on several challenging video data sets demonstrate the superiority of our method over recent state-of-the-art approaches.
Page(s): 1070 - 1083
Date of Publication: 14 May 2015
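
The saliency fusion idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible form of fusion: a pixel-wise weighted average of a normalized appearance saliency map and a normalized motion saliency map, where each weight stands for the reliability of its map. The function names (`normalize`, `fuse`), the weights `w_app` and `w_mot`, and the toy maps are all hypothetical. The paper learns reliability from labeled data; here the weights are simply given by hand, so this is not the authors' method.

```python
from typing import List

Map = List[List[float]]  # a saliency map as rows of per-pixel scores


def normalize(m: Map) -> Map:
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse(appearance: Map, motion: Map, w_app: float, w_mot: float) -> Map:
    """Pixel-wise weighted average of two normalized saliency maps.

    w_app and w_mot are hypothetical reliability weights (assumed given,
    not learned as in the paper).
    """
    total = w_app + w_mot
    if total <= 0:
        raise ValueError("reliability weights must sum to a positive value")
    a, m = normalize(appearance), normalize(motion)
    return [
        [(w_app * av + w_mot * mv) / total for av, mv in zip(a_row, m_row)]
        for a_row, m_row in zip(a, m)
    ]


# Toy 2x2 maps: the appearance cue is trusted three times as much as motion.
appearance = [[0.0, 2.0], [4.0, 8.0]]
motion = [[1.0, 1.0], [1.0, 3.0]]
fused = fuse(appearance, motion, w_app=3.0, w_mot=1.0)
```

With these toy inputs, the bottom-right pixel is the maximum of both maps, so it keeps the highest fused score regardless of the weights. The other pixels are driven mostly by the appearance map because of its larger weight.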


I. Introduction

With the prevalence of online social video sharing, a considerable number of videos are being created and processed every day. In many of those videos, there exists a primary object that we want to focus our attention on, e.g., a child or a pet in a homemade personal video. We define the primary object in a video sequence as the object that appears saliently in most of the frames; some examples are shown in Fig. 1. In this paper, we address the problem of automatically discovering the primary objects in videos, which is an essential step for many applications such as advertisement design [36] and video summarization [20], [28], [44]. Traditional video object detection and localization methods, however, are either too category specific (e.g., face [47] and pedestrian detection [13]) or rely heavily on manual initialization (e.g., object tracking [19] and interactive object segmentation [18]). They are suitable for targeted object detection that is tailored to users’ interests, but are too limited for many multimedia applications that require automatic processing of large volumes of video data with diverse content. Throughout this paper, we use the terms foreground object, or simply foreground, interchangeably with the term primary object.

Fig. 1. Examples of primary object discovery. Each row corresponds to one video, and the red rectangle highlights the primary object.
