1 Introduction
Motion plays a key role in many state-of-the-art methods for Video Object Segmentation (VOS) [1], [2], [3], [4], as motion estimates such as optical flow [5] or pixel trajectories [6] reveal the pixel-wise correspondence between frames and enable the propagation of instance labels. Moreover, the rich spatio-temporal structure in motion provides information that is beneficial for segmenting moving objects. However, motion estimation itself remains a difficult task, as it suffers from challenges such as noise, blurring, deformation, and occlusion.

Different from previous methods that rely mainly on motion, recent attempts based on deep CNNs [7], [8], [9] tackle VOS through appearance learning. Building on their powerful learning capacity and large amounts of training data, deep CNNs have achieved strong performance in still-image segmentation [10]. For VOS, however, annotated training data is scarce, and treating frames as still images discards the information hidden in motion. It has been shown in [7], [8] that, after fine-tuning on the first frame, deep CNNs can “recognize” the target object by its similar appearance in subsequent frames. However, relying solely on “memorizing” the appearance of the target object in the first frame suffers from several limitations: the object's appearance may change over time, and objects in the background may appear similar to the target. Although online adaptation [7] is robust to temporal variations across video frames, repeatedly fine-tuning the model at every time step is time-consuming and harms efficiency.
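To make the role of motion concrete, the sketch below illustrates (in a minimal, illustrative form that is not any cited method) how a dense motion estimate can propagate an instance mask from one frame to the next. It assumes a backward optical flow field from frame t+1 to frame t is available from some off-the-shelf estimator; the function and array names are assumptions introduced here for illustration.

```python
import numpy as np


def propagate_mask(mask_t: np.ndarray, flow_t1_to_t: np.ndarray) -> np.ndarray:
    """Warp a binary instance mask from frame t to frame t+1 via backward flow.

    mask_t       : (H, W) binary mask for frame t.
    flow_t1_to_t : (H, W, 2) backward flow; flow_t1_to_t[y, x] = (dx, dy) points
                   from pixel (x, y) in frame t+1 to its correspondence in frame t.
    """
    h, w = mask_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the backward flow to find where each pixel of frame t+1 came from.
    src_x = xs + flow_t1_to_t[..., 0]
    src_y = ys + flow_t1_to_t[..., 1]
    # Nearest-neighbour sampling keeps labels discrete; clamp to the image bounds.
    src_x = np.clip(np.round(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.round(src_y), 0, h - 1).astype(int)
    return mask_t[src_y, src_x]
```

Even this simple propagation inherits the weaknesses of the underlying flow: noise, blur, large deformations, and occlusions all corrupt the correspondence and hence the propagated labels, which is precisely the motivation for complementing motion with appearance learning.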