1. Introduction
Semantic amodal instance-level video object segmentation (SAIL-VOS), i.e., semantically segmenting individual objects in videos even where they are occluded, is an important problem for sophisticated occlusion reasoning, depth ordering, and object size prediction. In particular, the temporal sequence provided by a densely and semantically labeled video dataset is increasingly important, since it enables the assessment of temporal reasoning and the evaluation of methods that anticipate the behavior of objects and humans.