I. Introduction
Cameras with stereo vision and structured-light sensors have transformed mobile and indoor robotics. Using structured light, the authors of [1] generated high-accuracy stereo depth maps. Humans carry out everyday tasks with ease, but for a robot to interact with its environment and perform such tasks, it must be fully aware of its surroundings. Human 3D perception is based on retinal disparity [2]: because our eyes are a small distance apart, each eye sees a slightly different image of the scene, and this difference lets us perceive depth. Depth perception also plays an important role in identifying, recognizing and interacting with the objects around us. Depth information can therefore greatly aid robots in executing tasks that are trivial for humans.
Beyond depth, a machine must also understand the function of the scene around it. For instance, whether it is inside a house, a school or a corporate office, and whether it is in a kitchen or a bedroom, influences how it should interact with its surroundings. However, automatic classification of indoor environments is a challenging problem because of high intra-class variability and confusing similarity between classes. Fig. 1 shows the similarity between a classroom and a computer lab: although these are completely different locations, they are almost indistinguishable even to the human eye. Fig. 2 shows the variations within a single scene category, the staffroom: the two rooms belong to different academic institutions and, despite serving the same purpose, differ significantly. In such extreme cases, scene categorization remains an open challenge that may not be solvable with machine learning and computer vision algorithms alone. Techniques such as geotagging, mentioned earlier, may help here: combining geotags with the recognized scene category may yield more accurate results.
Fig. 1. Similarity between a classroom and a computer lab.
Fig. 2. Variations within the same scene category.
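The idea that disparity between two viewpoints encodes depth can be made concrete with the textbook relation for a rectified pinhole stereo pair, Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras and d the disparity in pixels. The sketch below is a minimal illustration under that assumption only; it is not the method of [1], and the function name and the numeric values (f = 700 px, B = 0.12 m) are hypothetical.

```python
import numpy as np


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to metric depth (metres) via Z = f * B / d.

    Assumes a rectified pinhole stereo pair (hypothetical setup for illustration).
    Pixels with zero or negative disparity (no stereo match) are returned as NaN.
    """
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(d, np.nan)  # start with "unknown depth" everywhere
    valid = d > 0  # only positive disparities correspond to finite depth
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth


# Hypothetical camera: f = 700 px, baseline = 0.12 m, so f * B = 84 px·m.
depth = depth_from_disparity([[84, 42], [0, 7]], focal_length_px=700.0, baseline_m=0.12)
# depth holds 1 m, 2 m, NaN (no match) and 12 m for the four pixels.
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why nearby objects, with large disparity, are resolved more precisely than distant ones.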