1. Introduction
Salient Object Detection (SOD) is an important computer vision problem that aims to identify and segment the most prominent object in a scene. It has found successful applications in a variety of tasks such as object recognition [59], image retrieval [38], [61], SLAM [37], and video analysis [25], [19], [14]. To tackle the inherent challenges posed by difficult scenes with low texture contrast or cluttered backgrounds, depth information has been incorporated as a complementary input source. The growing interest in developing RGB-D SOD methods [12], [42], [48] has been further boosted by the rapid progress and proliferation of 3D imaging sensors [29], ranging from traditional stereo imaging that produces disparity maps to the more recent structured-light [76], [30], time-of-flight, light-field [63], [71], [72], and LIDAR cameras that directly generate depth images. As showcased by recent cross-modality fusion schemes [7], [10], [44], adding a depth map on top of the RGB image as an extra input leads to superior performance in localizing salient objects in challenging scenes.