I. Introduction
Autonomous robot navigation is a fundamental problem that has received considerable research attention [1]. The basic motion planning problem consists of generating robot trajectories that reach a known desired goal region starting from an initial configuration while avoiding known obstacles. More recently, a new class of planning approaches has been developed that can handle a richer class of tasks than the classical point-to-point navigation and can capture temporal and Boolean specifications. Such tasks can be, e.g., sequencing or coverage [2], data gathering [3], or persistent surveillance [4] and can be modeled using formal languages, such as linear temporal logic (LTL) [5], [6]. Several control synthesis methods have been proposed to design correct-by-construction controllers that satisfy temporal logic specifications assuming robots with known dynamics that operate in known environments [7]–[15]. As a result, these methods cannot be applied to scenarios where the environment is initially unknown, and therefore, online replanning may be required as environmental maps are constructed, resulting in limited applicability. To mitigate this issue, learning-based approaches have also been proposed that consider robots with unknown dynamics operating in fully or partially unknown environments. These approaches learn policies that directly map on-board sensor readings to control commands in a trial-and-error fashion; see, e.g., [16]–[19]. However, learning-based approaches, in addition to being data inefficient, are specialized to the environment and the robotic platform they were trained on.