SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation – A Synthetic Dataset and Baselines | IEEE Conference Publication | IEEE Xplore

Abstract:

We introduce SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation), a new dataset aiming to stimulate semantic amodal segmentation research. Humans can effortlessly recognize partially occluded objects and reliably estimate their spatial extent beyond the visible region. However, few modern computer vision techniques are capable of reasoning about the occluded parts of an object. This is partly because very few image datasets, and no video dataset, exist that permit the development of such methods. To address this issue, we present a synthetic dataset extracted from the photo-realistic game GTA-V. Each frame is accompanied by densely annotated, pixel-accurate visible and amodal segmentation masks with semantic labels. More than 1.8M objects are annotated, resulting in 100 times more annotations than existing datasets. We demonstrate the challenges of the dataset by quantifying the performance of several baselines. Data and additional material are available at http://sailvos.web.illinois.edu.
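To make the distinction between visible and amodal masks concrete, the sketch below treats each mask as a set of (row, column) pixel coordinates. This representation is an assumption for illustration, not the dataset's actual storage format, and the helpers `box`, `occlusion_rate`, and `mask_iou` are hypothetical. The amodal mask covers an object's full extent, the visible mask is its unoccluded subset, and the hidden fraction of amodal pixels gives an occlusion rate. Mask intersection-over-union (IoU) is a common overlap measure for scoring a predicted mask against ground truth.

```python
from typing import FrozenSet, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical mask representation
Mask = FrozenSet[Pixel]


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """All pixels in the half-open rectangle [r0, r1) x [c0, c1)."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def occlusion_rate(amodal: Mask, visible: Mask) -> float:
    """Fraction of the object's full (amodal) extent that is hidden."""
    if not visible <= amodal:
        raise ValueError("visible mask must be a subset of the amodal mask")
    if not amodal:
        return 0.0
    return 1.0 - len(visible) / len(amodal)


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


amodal = box(0, 0, 4, 4)   # full 4x4 object: 16 pixels
visible = box(0, 0, 4, 2)  # only the left half is unoccluded: 8 pixels
print(occlusion_rate(amodal, visible))  # 0.5
print(mask_iou(visible, amodal))        # 0.5
```

In this toy case, a predictor that outputs only the visible region reaches an IoU of just 0.5 against the amodal ground truth, which illustrates why estimating the occluded extent is the harder part of the task.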
Date of Conference: 15-20 June 2019
Date Added to IEEE Xplore: 09 January 2020
Conference Location: Long Beach, CA, USA

1. Introduction

Semantic amodal instance level video object segmentation (SAIL-VOS), i.e., semantically segmenting individual objects in videos even under occlusion, is an important problem for sophisticated occlusion reasoning, depth ordering, and object size prediction. In particular, the temporal sequence provided by a densely and semantically labeled video dataset is increasingly important, since it enables the assessment of temporal reasoning and the evaluation of methods that anticipate the behavior of objects and humans.
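As a minimal sketch of how paired visible and amodal annotations can support occlusion reasoning and depth ordering, the example below again assumes masks are sets of (row, column) pixel coordinates. The function `occluders` and the rule it applies are hypothetical illustrations, not part of the SAIL-VOS release: object B is taken to occlude object A whenever B's visible pixels fall inside A's hidden region (its amodal mask minus its visible mask), which places B in front of A.

```python
from typing import Dict, FrozenSet, List, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical mask representation
Mask = FrozenSet[Pixel]


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """All pixels in the half-open rectangle [r0, r1) x [c0, c1)."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def occluders(objects: Dict[str, Tuple[Mask, Mask]]) -> Dict[str, List[str]]:
    """Map each object to the objects whose visible pixels cover its hidden part.

    `objects` maps a name to an (amodal, visible) mask pair.
    """
    result: Dict[str, List[str]] = {}
    for name, (amodal, visible) in objects.items():
        hidden = amodal - visible  # occluded pixels of this object
        result[name] = sorted(
            other
            for other, (_, other_visible) in objects.items()
            if other != name and hidden & other_visible
        )
    return result


person = box(1, 2, 4, 4)           # fully visible, standing in front
wall_amodal = box(0, 0, 4, 6)      # full extent of the wall
wall_visible = wall_amodal - person
print(occluders({"wall": (wall_amodal, wall_visible), "person": (person, person)}))
# {'wall': ['person'], 'person': []}
```

Applying this rule to every pair of objects in a frame yields a set of pairwise "in front of" relations, which is one simple way the annotations could be turned into a partial depth order.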
