I. Introduction
Detecting walkable areas with a monocular camera is crucial for safe and efficient robot navigation. This problem has been studied extensively in computer vision, with approaches including obstacle detection [1], free space detection [2], and 3D reconstruction [3]–[5]. Many researchers have used deep learning models to predict pixel-level depth from images [1], while others have developed vision-based algorithms that incrementally expand the free space region outward from the robot's feet [2]. Other studies have focused on reconstructing 3D geometric models from the image sequence captured by the robot's onboard camera [3]–[5]. However, most existing studies assume that the floor is directly visible, which is not always the case in crowded scenes, as shown in Fig. 1. For instance, in a crowded office where humans and robots collaborate [6], obstacles (e.g., desks) often occlude the floor areas on which humans are standing. In such situations, free space detection becomes ineffective, and neither obstacle detection nor 3D reconstruction can recover information about the occluded floor.