
Mobile Service Robot Path Planning Using Deep Reinforcement Learning



Abstract:

A mobile service robot operates in a constantly changing environment with other robots and humans. The service environment is usually vast and unknown, and the robot is expected to operate continuously for a long period. The environment can be dynamic, leading to the generation of new routes or the permanent blocking of old routes. A traditional path planner that relies on static maps will not suffice for a dynamic environment. This work focuses on developing a reinforcement learning-based path planner for a dynamic environment. The proposed system uses the deep Q-learning algorithm to learn the initial paths using a topological map of the environment. When the environment changes, the proposed $\beta$-decay transfer learning algorithm trains the agent in the new environment. This algorithm chooses between experience-, exploration-, and exploitation-based training depending on the similarity of the old and new environments. The system is implemented on the Robot Operating System framework and tested using a Turtlebot3 mobile robot in the Gazebo simulator. The experimental results show that the reinforcement learning system learns all the routes based on the initial topological map of different service environments with an accuracy of over 98%. A comparative analysis of the $\beta$-decay transfer learning and non-transfer learning agents is performed based on various evaluation metrics. The transfer learning agent converges twice as fast as the non-transfer learning agent.
Proposed RL-based DQN Path Planner's Initial Learning, Beta-Decay Lifelong Learning and Execution Process Flow.
Published in: IEEE Access (Volume: 11)
Page(s): 100083 - 100096
Date of Publication: 04 September 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

The field of mobile robotics is one of the most popular domains among robotics researchers. An essential mobile robot subsystem is the navigation system, which perceives the environment and enables the robot to navigate autonomously to perform its tasks [1]. Mobile robots are used in a vast range of applications, such as surveillance, industrial automation, museum guidance, elderly care, hospital care, home assistance, serving in restaurants, and so on. Mobile robots are classified based on their form factor, such as wheeled, legged, flying, and so on [2]. Wheeled mobile robots, in particular, are effective and simple to use on flat indoor terrain.

A mobile service robot performs autonomous tasks in service environments like homes, restaurants, hospitals, etc. [3]. A service environment is unlike other environments in a few respects: vast square footage, moving obstacles (robots and humans) [4], susceptibility to environmental changes over time, and the impossibility of augmenting the environment (adding extra sensors) to support autonomous operation. A path planner is essential for an autonomous mobile robot to plan an efficient path given the source and destination.

A path planner is only as good as its understanding of the environment. Initially, the main focus of the path-planning research community was to achieve optimal paths given different levels of environmental awareness; these works are categorized based on the environmental information available to the path planner [5]. One of the main concerns for a service robot's path planner is environmental change, which hinders its performance. A path planner uses the initial environment configuration to identify efficient paths, and this configuration will no longer be valid if the environment changes. Hence, a path planner that can dynamically plan paths based on the latest configuration of the environment is necessary. This type of path planner is classified as a dynamic path planner, which considers environmental changes throughout its lifetime.

A dynamic path planner continuously monitors environmental changes and updates itself to provide efficient paths. Using machine learning (ML) techniques for dynamic path planning has recently gained traction among researchers. In particular, Reinforcement Learning (RL) [6] is a machine-learning technique that uses the Markov Decision Process (MDP) to learn to interact with an environment. The main reasons to use RL for path planning are that the problem can be modeled intuitively as an MDP and that RL does not need massive labeled datasets for training. In most work on RL-based path planning, the agent's state space consists of images or raw sensory information. While this is a reasonable way to define the state space of an RL agent, its shortcoming is a larger state space, and the larger the state space, the longer the learning phase takes to converge.

Lifelong learning [7] keeps an artificial intelligent agent learning and evolving continuously in response to changes over a long period. Transfer Learning (TL) is an ML technique that uses previous experience to learn and perform well in a similar situation. A TL algorithm can efficiently make the RL agent dynamic, with lifelong learning ability. Given the variations in the environment configuration, the transfer learning approach can be used to achieve lifelong learning. In this context, the source states are the old environment's topological information, and the target states are the new, altered environment's topological information. Only the state space differs; all the other RL-agent parameters remain the same. TL approaches that consider only the state-space difference between the source and target tasks are scarce, so a novel transfer learning technique that considers only this difference is proposed in this article.

This work aims to design and develop a Deep Reinforcement Learning (DRL) based path-planning framework. The proposed framework uses a deep Q-learning algorithm to learn the initial paths based on the topological map of the environment. A novel $\beta$-decay TL algorithm is proposed to achieve lifelong learning. This algorithm uses an incremental learning approach to update the RL agent dynamically, based on changes in the environment, throughout its lifetime. The key contributions of the work are:

  • Design and development of a scalable DQL path planning framework for a mobile service robot.

  • Incorporating the $\beta$-decay TL algorithm to continuously evolve the RL agent based on environmental changes and achieve lifelong learning.

  • The RL agent’s scalability and the TL algorithm’s learning efficiency are tested and demonstrated.

The remainder of the paper is organized as follows: a summary of related work on path planning is given in Section II; the problem formulation is defined in Section III; Section IV describes the proposed RL and TL framework in detail; results and analysis of various test cases are presented in Section V; and the article is concluded, with future directions identified, in the last section.

SECTION II.

Recent and Relevant Work

The path-planning problem can be categorized into two classes: global and local path planning [8]. Global path planning is planning a path from a source to a destination, given the environment map. Local planning is planning the path when an unforeseen scenario (an obstacle) occurs while traversing the global path. Rapid development in artificial intelligence and machine learning has led to their use in solving path-planning problems [8], [9], [10]. Artificial intelligence techniques like fuzzy logic [11], neural networks [12], and hybrid solutions like neuro-fuzzy inference systems [13] are used for local path planning, as they are very efficient in dynamic problem-solving. Evolutionary techniques like particle swarm optimization [14] and genetic algorithms [15] are used for global path planning to optimize the path length and planning time.

One of the recent advancements in path-planning research is the use of reinforcement learning-based techniques. Unlike other machine learning algorithms, an RL agent can be trained with a minimal dataset, which is one primary reason to use RL in the path-planning domain. Q-learning (QL) and Deep Q-Learning (DQL) are two of the most used RL algorithms for path planning. In QL, the Q-value is calculated based on the Bellman equation. In [16] and [17], a QL agent is used for planning the path. The drawback of Q-learning is scalability: as the environment size increases, the memory complexity also increases. In DQL, the Q-value is predicted using a neural network, which makes the system scalable. In [18], [19], and [20], planning a path in a grid-world environment using DQL is presented; the obstacle locations are available, and the RL agent learns to avoid all the static obstacles and plan the path efficiently. DQL-based algorithms for dynamic-environment path planning are proposed in [21] and [22], where a dynamic environment is considered to have moving obstacles whose locations are unknown. Another approach to exploring an unknown environment is to use sensor values to train the RL agent [23], [24], [25].

One critical factor in any machine-learning algorithm is convergence time, i.e., the time the agent takes to learn the task. There have been considerable efforts to reduce the convergence time of DRL-based path-planning algorithms. In [26], convergence time is reduced by pre-training the RL agent using a 2D simulator environment and later using this experience to train in a real-time 3D environment. A Q-learning system with general information about the goal and current states is proposed in [27]. This knowledge is used efficiently while training to reduce the training time significantly. In article [28], the authors propose the use of RL and particle swarm optimization to increase the agent’s convergence rate.

Researchers have made numerous attempts to address lifelong learning in various domains. In [29], a notion similar to lifelong learning called concept drift is described, and a solution is proposed that uses previous experience to learn new, similar concepts. Correspondence learning is yet another attempt to reuse old experience efficiently in a new task; in [30], correspondence learning uses previously learned skills to solve new tasks in an Atari ping-pong game environment. Transfer learning (TL) is a technique applied to evolve machine learning models toward lifelong learning: a TL algorithm transfers the source-task knowledge to a target model. In [31], RL- and TL-based tracking in stationary environments is proposed. Key points to consider while applying transfer knowledge are the similarity of the source and target tasks, the mapping between source states and target states, and the type of knowledge to be transferred [32], [33]. When applying TL between source and target tasks with different states and actions, the challenge lies in mapping the source states to the target states. In [34] and [35], the authors propose a simple approach in which subject experts provide the mappings manually. While this is a simple solution, obtaining expert mappings can be impractical in many applications. To address this shortcoming, inter-task mapping can either be made unnecessary or learned automatically. If no explicit mapping is required, the agent attempts to learn an abstraction of the MDP that is invariant even though actions and state variables change [36], [37]. Another approach is to learn the inter-task mapping automatically [38], [39] using novel mapping-learning methods. Applying heuristics in transfer learning is proven to provide reasonable acceleration of target-task learning. In [40], heuristics are obtained by exploring the environment. Obtaining the heuristics as different cases based on source-task learning and using them in target learning is proposed in [41] and [42].

SECTION III.

Problem Definition

Consider a wheeled mobile robot performing autonomous tasks in an indoor service environment. To perform tasks autonomously, the robot should be capable of navigating from one location to another. A typical service robot is required to work in the environment for a long period, during which the environment configuration can change. The environment configuration is the location of all the static obstacles, doorways, pathways, etc. To cope with the environmental changes, the mobile robot should be able to perform lifelong learning. This enables the robot to plan a path based on the current environment configuration rather than the old one, and it can be achieved using reinforcement learning.

The RL agent is the mobile robot, and the environment is the service environment. The proposed path planner is based on the deep Q-learning algorithm. The DQL agent is pre-trained based on the topological map of the service environment. A topological map is a high-level connectivity tree containing all the critical locations in the environment. The topological map can be generated manually or automatically, as proposed in [43]; the techniques to generate a topological map are out of the scope of this work. There are many techniques in the literature to obtain a topological map of an environment, and a few of them are explored in our previous works [43], [44], [45]. Fig. 1 depicts a sample environment with paths and its topological map. The Home node refers to the robot's charging point, the "Vx" nodes are waypoint nodes, and the "Dx" nodes are the destination locations. The map is represented as a ternary tree: each node in the map is a location, and all the destination locations are leaf nodes. Each node in the tree can have a maximum of three branches (left, right, forward), and the direction to reach a child node from its parent is encoded as the branch. For example, if a parent node has a left child, that child is reached by taking the left direction from the parent node.

FIGURE 1. Sample environment and its topological map.
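As an illustration of this representation, the following minimal Python sketch (the class and field names are ours, not the paper's) encodes a ternary-tree node with left, right, and forward children:

from dataclasses import dataclass
from typing import Optional

@dataclass
class MapNode:
    """A location in the topological map, represented as a ternary-tree node."""
    name: str                           # e.g. "Home", "V1", "D3"
    left: Optional["MapNode"] = None    # child reached by turning left at this node
    right: Optional["MapNode"] = None   # child reached by turning right at this node
    forward: Optional["MapNode"] = None # child reached by moving forward from this node

    def is_destination(self) -> bool:
        # Destination locations are the leaf nodes of the tree.
        return self.left is None and self.right is None and self.forward is None

# Example: Home --forward--> V1 --left--> D1
d1 = MapNode("D1")
v1 = MapNode("V1", left=d1)
home = MapNode("Home", forward=v1)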

Lifelong learning can be achieved using transfer learning in the proposed DQL agent. In a practical scenario, the changes in the environment will be gradual; for example, furniture can be added or removed, which might block a current path. Such changes lead to environment configuration changes, and the topological map can be updated based on the new configuration [46]. The process of obtaining the new topological map is not in the scope of this article. Consider a permanent obstacle placed between the "Home" and "V1" nodes in the sample environment shown in Fig. 1; this affects the path between these two nodes. The sample environment with the obstacle and its topological map is depicted in Fig. 2, with dotted lines indicating the change in path. Given this new topological map, the RL agent should re-learn the new environment. Training the RL agent to learn the new environment configuration from scratch requires more time than using the TL technique to jump-start the training process by efficiently transferring the old knowledge while training the new agent.

FIGURE 2. Sample environment and its topological map with obstacle.

SECTION IV.

Design of the Framework

The proposed system is a path planner for a mobile service robot with lifelong learning ability. The overall system consists of two modules: the DRL module and the TL module. The DRL module trains an RL agent with the initial environment configuration to plan paths efficiently. The TL module re-trains the RL agent if there is any configuration change in the environment.

A. Deep Reinforcement Learning Framework

RL is a class of ML algorithms that uses an agent to learn a task by interacting with the environment through its actions. An agent with its current state $s$ takes action $a$ and reaches another state $s'$ . The agent receives a reward of $r$ for every action in the environment, which can be positive or negative. The reward will be based on the action and the agent’s current state. The agent’s ultimate goal is to maximize the reward that will indirectly train the agent to perform the task. A Deep Q-learning-based RL algorithm implements the path planner, and the mobile robot is the RL agent. DQL uses a neural network to predict the best action given the current state. The parameters of the DQL framework are initialized based on the topological map used to train the RL agent.

1) States - S

States are all the possible configurations of an RL agent. In the mobile robot domain, a state can be the location of the mobile robot. The state space $S$ of the agent is defined as the set of all the nodes in the topological map. The current state is defined as $s \in S$, where $s$ is the robot's current location, one element of the set of all possible locations.

2) Actions - A

Actions are the agent's interactions with the environment; the result of such an interaction is the agent's transition from one state to another. An action is defined as $a \in A$. The action space $A$ is the set of all the possible actions a mobile robot can take, $A = \{left, right, forward, backward\}$.

3) Reward - R

The reward system is the key ingredient of RL training that can make or break an agent. The reward space $R(S, A)$ is the set of all possible rewards based on the current state and action. Equation (1) shows the reward function of the proposed RL agent. While the reward values are constants, they can be changed as long as the differences between the rewards are maintained. A positive reward motivates the agent to repeat a similar action in the future, and a negative reward discourages it. The highest reward $r_{1}$ is given for reaching the goal state $g \in G$, where $G \subset S$ is the set of all goal states. The highest negative reward $r_{4}$ is given for reaching an unknown state $U \notin S$. A reward $r_{3}$ is given to discourage looping. A reward $r_{2}$ is given for every ordinary step; this drives the agent to find the shortest possible path.\begin{align*} r(s, a) = \begin{cases} r_{1}, & s = g \\ r_{2}, & s \in S \setminus G \\ r_{3}, & s \text { is a previously visited location} \\ r_{4}, & s = U\end{cases} \tag{1}\end{align*}
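A minimal Python sketch of the reward rule in (1), using the concrete values later adopted in (8) ($r_{1}=100$, $r_{2}=-1$, $r_{3}=-10$, $r_{4}=-100$); the function signature and set representations are illustrative assumptions, not the authors' code:

def reward(s, goal, valid_states, visited, r1=100, r2=-1, r3=-10, r4=-100):
    """Reward for arriving in state s, following Eq. (1)/(8).

    s            -- state reached after the action (a node name, or "U" for unknown)
    goal         -- the goal state g for this episode
    valid_states -- set of all valid states (nodes of the topological map)
    visited      -- set of states already visited in this episode
    """
    if s == goal:
        return r1          # reached the goal
    if s not in valid_states:
        return r4          # unknown state U: largest penalty
    if s in visited:
        return r3          # looping back to an already visited location
    return r2              # ordinary step penalty, encourages short paths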

4) Q-Value - Q(s, a)

The Q-value determines the quality of an action given the current state. The DQN neural network is trained to predict the Q-values of all possible actions for a given state, $Q(s, A)$. The best action is chosen as $a = \arg\max_{a} Q(s, a)$.

5) DQN Architecture

The DQN architecture consists of an input layer, two hidden layers, and an output layer, trained with the Adam optimizer, as shown in Fig. 3. The neural network configuration depends on the states and actions of the RL agent, as shown in (2) and (3). The total number of input neurons $n$ depends on the number of states $S$ and the number of goals $G$; i.e., the input to the neural network encodes the current state $s$ and the desired goal $g$. The goal states $G \subset S$ denote all possible destination locations obtained from the topological map. Including the goal state in the input layer enables the agent to find the optimal path to every possible destination. The number of output neurons $m$ depends on the number of actions $A$; the outputs represent the Q-values of all actions for the current state, and the best action corresponds to the maximum Q-value in the output layer.\begin{align*} n &= \lvert S \rvert + \lvert G \rvert \tag{2}\\ m &= \lvert A \rvert \tag{3}\end{align*}

FIGURE 3. Proposed DQN architecture.
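A sketch of the network sizing in (2) and (3); PyTorch is used purely for illustration (the paper does not state its deep-learning library), and the hidden width of roughly 1.5-2x the input size follows the discussion in Section V-A2:

import torch
import torch.nn as nn

def build_dqn(num_states: int, num_goals: int, num_actions: int = 4,
              hidden_scale: float = 1.5) -> nn.Module:
    n = num_states + num_goals   # Eq. (2): one-hot state plus one-hot goal
    m = num_actions              # Eq. (3): one Q-value per action
    h = int(hidden_scale * n)    # hidden width, roughly 1.5-2x the input size (Sec. V-A2)
    return nn.Sequential(
        nn.Linear(n, h), nn.ReLU(),
        nn.Linear(h, h), nn.ReLU(),
        nn.Linear(h, m),         # Q(s, a) for a in {left, right, forward, backward}
    )

# House layout: 15 states and 8 goal states -> n = 23 inputs, m = 4 outputs.
model = build_dqn(num_states=15, num_goals=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam with alpha = 0.001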

6) Training

The neural network is trained using an experience replay buffer. An experience $e = (s, a, s', r)$ is stored in the experience memory after every iteration, and a random batch of experiences is used for training; the batch size $\eta$ is a hyper-parameter. Equation (4) shows the loss, which is the Bellman error derived from the Bellman equation for Q-learning. $Q^{*}(s, a)$ is the predicted Q-value, and $\max_{a'} Q'(s', a')$ is the maximum expected future reward based on the new state $s'$ over all possible actions. $\gamma$ is the discount rate, a hyper-parameter controlling the weight of the future reward. An RL agent at the early stages of training should explore more to understand the environment better; after exploring enough, it should rely on the trained network, i.e., exploitation. The hyper-parameter $\epsilon$ manages the exploration-exploitation trade-off. To start with a high $\epsilon$ (exploration) and reduce it gradually toward exploitation, the $\epsilon$-decay algorithm is used. The decay function in (5) is an exponential schedule determined by the current training episode number, $\epsilon_{min}$, and $\epsilon_{initial}$; it gradually reduces (decays) $\epsilon$ from $\epsilon_{initial}$ to $\epsilon_{min}$. A training episode terminates based on the current state $s$ of the agent; (6) lists all the terminating conditions.\begin{align*} Loss &= \left (Q^{*}(s, a) - \left (r + \gamma \max _{a'} Q' (s', a')\right )\right )^{2}, \quad e = (s, a, s', r) \tag{4}\\ \epsilon _{new} &= \epsilon _{old} \cdot d^{\,current\_episode}, \quad \text {where } d = \sqrt [total\_episodes]{\frac {\epsilon _{min}}{\epsilon _{initial}}} \tag{5}\\ s &= \begin{cases} \text {terminal\_state}, & s = g \\ \text {terminal\_state}, & s = U \\ \text {terminal\_state}, & \text {total visited states} = \lvert S \rvert \\ \text {non-terminal\_state}, & \text {otherwise} \end{cases} \tag{6}\end{align*}
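The following Python sketch ties (4) and (5) together: an exponential $\epsilon$ schedule and one replay-batch update of the squared Bellman error. It assumes the PyTorch model sketched above, uses the same network for the prediction and the bootstrap target (no separate target network is described in the paper), and omits the environment-stepping code; the buffer size and all names are illustrative assumptions.

import random
from collections import deque
import torch
import torch.nn.functional as F

def epsilon_decay_factor(eps_initial: float, eps_min: float, total_episodes: int) -> float:
    # Eq. (5): decay base d chosen so that eps_initial * d**total_episodes == eps_min.
    return (eps_min / eps_initial) ** (1.0 / total_episodes)

replay = deque(maxlen=10_000)   # experience memory e = (s, a, s_next, r, done); size is illustrative

def train_on_batch(model, optimizer, batch_size=32, gamma=0.95):
    """One gradient step on the Bellman error of Eq. (4) over a random replay batch."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s      = torch.stack([e[0] for e in batch])              # encoded (state, goal) vectors
    a      = torch.tensor([e[1] for e in batch])              # action indices
    s_next = torch.stack([e[2] for e in batch])
    r      = torch.tensor([e[3] for e in batch], dtype=torch.float32)
    done   = torch.tensor([e[4] for e in batch], dtype=torch.float32)

    q_pred = model(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q*(s, a)
    with torch.no_grad():
        q_next = model(s_next).max(dim=1).values              # max_a' Q'(s', a')
    target = r + gamma * q_next * (1.0 - done)                # future term dropped at terminal states
    loss = F.mse_loss(q_pred, target)                         # squared Bellman error, Eq. (4)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()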

B. $\beta$-Decay Lifelong Learning Algorithm

The proposed $\beta$-decay algorithm is a transfer learning algorithm applied when a configuration change in the environment affects the trained DQL path planner. The non-TL approach would be to retrain a new RL agent from scratch. The configuration changes are likely to be minor and gradual, affecting only the number of nodes in the topological tree, i.e., only the state space of the RL agent changes. Considering this, the non-TL approach would be redundant and time-consuming, as the old agent holds useful information that can be used to upgrade the new agent to the new environment configuration. The TL approach re-trains the new agent using the knowledge of the old agent, hence reducing training time; it treats the old agent's path planner as the source task and the new agent's path planner as the target task.

The proposed $\beta$-decay TL algorithm transfers the knowledge of the source task to the target task based on the $\beta$ factor. The target task agent learns either from the source task knowledge or through the $\epsilon$-decay learning process described in Section IV-A6.

The higher the $\beta$ value, the higher the probability of transferring the source task knowledge. $\beta$ is initialized based on (7), where $S_{target}$ is the state space of the target task and $S_{source}$ is the state space of the source task; it is the probability that a state chosen at random from the target state space also belongs to the source state space. The decay factor is calculated as in (5). Based on $\epsilon$ and $\beta$, the proposed TL algorithm chooses between Experience vs. Exploration vs. Exploitation (EEE), as shown in Algorithm 1.\begin{equation*} \beta = \frac {\lvert S_{target} \cap S_{source} \rvert }{\lvert S_{target} \rvert } \tag{7}\end{equation*}
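A small sketch of the $\beta$ initialization in (7), assuming states are identified by their node names in the old and new topological maps (consistent with the name-based similarity noted in the conclusion); the node names below are illustrative, not taken from the paper:

def initial_beta(source_states: set, target_states: set) -> float:
    # Eq. (7): probability that a random target state also exists in the source task.
    return len(target_states & source_states) / len(target_states)

# Illustrative only: the new map keeps the old nodes and adds a few around the obstacle.
old_nodes = {"Home", "V1", "V2", "V3", "D1", "D2"}
new_nodes = old_nodes | {"V1a", "V1b"}          # hypothetical detour nodes
beta0 = initial_beta(old_nodes, new_nodes)       # 6/8 = 0.75 for this toy example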

Algorithm 1 Experience vs. Exploration vs. Exploitation Pseudocode

1: Initialize the Source and Target DQN networks
2: Initialize parameters: $\epsilon$, $\beta$, episode_count
3: for $episode \le episode\_count$ do
4:   $s = random(S)$
5:   while $s \ne terminal\_state$ do
6:     if $random(0\text{-}1) < \beta$ then   ▷ Experience: reuse source knowledge
7:       $Q\_value(s_{source}) = Source.Predict(s_{source})$
8:       $a = \arg\max (Q\_value(s_{source}))$
9:     else if $random(0\text{-}1) < \epsilon$ then   ▷ Exploration: random action
10:      $a = randint(1\text{-}4)$
11:    else   ▷ Exploitation: use the target network
12:      $Q\_value(s_{target}) = Target.Predict(s_{target})$
13:      $a = \arg\max (Q\_value(s_{target}))$
14:    end if
15:    Execute $a$, observe $s'$ and $r$; store the experience and train the Target network; $s = s'$
16:  end while
17:  Decay $\beta$ and $\epsilon$ per (5)
18: end for
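A minimal Python sketch of the EEE choice in Algorithm 1, assuming the PyTorch networks sketched earlier and treating $\epsilon$ as the exploration probability as described in Section IV-A6; all function and variable names are illustrative, not the authors' implementation:

import random
import torch

def select_action(s_source, s_target, source_net, target_net,
                  beta, epsilon, num_actions=4):
    """Experience vs. Exploration vs. Exploitation choice (cf. Algorithm 1).

    s_source / s_target are the state encodings in the source- and target-task
    input spaces (the two networks have different input sizes).
    """
    if random.random() < beta:                      # Experience: reuse the source policy
        with torch.no_grad():
            return int(source_net(s_source).argmax())
    if random.random() < epsilon:                   # Exploration: random action
        return random.randrange(num_actions)
    with torch.no_grad():                           # Exploitation: target network
        return int(target_net(s_target).argmax())

After every episode, both $\beta$ and $\epsilon$ are multiplied by their respective decay factors from (5), so the agent gradually shifts from reused source experience toward exploiting its own target network.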

SECTION V.

Results and Discussions

The DRL path planner is implemented in two stages: pre-training and lifelong learning. The pre-training stage trains the RL agent to plan efficient paths based on the topological map. The lifelong learning stage uses the $\beta$-decay transfer learning algorithm to keep the RL agent in continuous learning mode and update its training based on environmental changes. The system is implemented in the ROS2 framework and tested using a Turtlebot3 mobile robot in the Gazebo simulation environment. Two service environments and one benchmark maze environment, which differ in size, are chosen to implement the proposed path planner. The house layout has 8 destinations and 5 waypoints, the hospital layout has 28 destinations and 23 waypoints, and the benchmark maze environment has 18 destinations and 88 waypoints. The environments and the corresponding topological maps are depicted in Figs. 4 and 5. A virtual node, "U", is added to the topological map; it represents the unknown state that the agent reaches as a result of an action leading to a location not represented by the topological map. The performance of the DRL framework is measured based on its training and testing success rates. The TL module's performance is measured based on three parameters: jumpstart, time-to-threshold, and time-to-converge.

FIGURE 4. House layout and its topological map.

FIGURE 5. Hospital layout and its topological map.

A. DQL Path Planner Pre-Training

The DQL planner is pre-trained using the topological map of the environment. The system is trained and tested in two service environments to prove the path planner’s scalability and generality.

1) Selection of Hyper-Parameters

One important step in training any RL agent is tuning the hyper-parameters, which enables the agent to learn the task efficiently. The hyper-parameters are the learning rate $\alpha$, the discount factor $\gamma$, the exploration factor range $\epsilon$, and the replay memory batch size. The pace of the agent's learning is set by the learning rate, which is usually chosen between 0 and 1. A higher learning rate results in less learning time but at the cost of sub-optimal learning quality, whereas a lower learning rate results in optimal learning quality at the cost of increased learning time. Since the proposed RL agent's training for a service environment is carried out offline, learning quality is preferred over learning time; hence $\alpha$ is chosen as 0.001 to achieve stable, high-quality learning. The discount factor weighs future rewards; $\gamma$ is chosen as 0.95 to give importance to future rewards while learning. The other factors, the $\epsilon$ range and the batch size, are determined by performing single-goal training in the house layout.

The single-goal DQL network differs only in input layer size compared to the proposed DQL network: its input layer size $n = |S|$ equals the total number of nodes in the topological map. In the house layout, the RL agent is trained to reach two destinations, D1 and D8, for 500 episodes. Batch sizes of 32 and 16 were chosen, as the network is trained on numerical values. The $\epsilon$ range is given as (initial-min), a range of values decayed exponentially with the factor calculated from (5); the ranges chosen are (1-0.1) and (0.1-0.001). The evaluation parameter is the training success rate, which indicates the agent's convergence and is calculated as the ratio of successful training episodes to total training episodes. Table 1 shows the training success rates of all the training cases, which vary with batch size and $\epsilon$ range. The test case with batch size 32 and an $\epsilon$ range of (1-0.1) yielded the highest success rate (marked in red). The final hyper-parameter values are tabulated in Table 2.
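For reference, the selected values can be collected into a single configuration; the dictionary form below is ours, while the values are those reported in the text and Table 2:

HYPER_PARAMS = {
    "alpha": 0.001,           # learning rate (Adam)
    "gamma": 0.95,            # discount factor
    "epsilon_initial": 1.0,   # start of the selected (1-0.1) epsilon range
    "epsilon_min": 0.1,       # end of the epsilon range
    "batch_size": 32,         # replay batch size
}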

TABLE 1. $\epsilon$-Range and Batch Size Success Rate

TABLE 2. Hyper-Parameters

2) House Layout Training

The RL agent is trained to learn paths to all the destinations in the house layout based on the topological map. The DQL architecture shown in Fig. 3 is used in the proposed system. The architecture has two hidden layers whose number of neurons is up to twice the input layer size; this is determined by experimenting with different configurations. Table 3 shows the three architectures considered to determine the ideal hidden layer size and their success rates after training for 2500 episodes. The difference in success rate is marginal, which may be because the agent is trained on numerical data with few features. To gain more insight, the success rate is plotted in Fig. 6; this plot can be considered the learning curve of the agent. From the plot, it can be concluded that case-2 performs better than the other cases. The case-2 configuration is 48 neurons with two layers. Based on (2), the input layer size for the house layout is $n = 23$ (15 states plus 8 goal states), so the minimum hidden layer size is 1.5 times the input layer size, i.e., 34. This is consistent with case-2, validating the architecture configuration proposed in IV-A5. The rewards used for the training are shown in (8).\begin{align*} r(s, a) = \begin{cases} 100, & s = g \\ -1, & s \in S \setminus G \\ -10, & s \text { is a previously visited location} \\ -100, & s = U\end{cases} \tag{8}\end{align*}

TABLE 3. Success Rate of Different DQL Architecture Configurations

FIGURE 6. Different DQN architectures training success rate.

The episodic reward collected during the case-2 training process is shown in Fig. 7, along with the zero-reward and threshold-reward lines. The threshold reward is calculated from the difference between the goal reward and the critical path, where the critical path is the longest (in node count) of the efficient paths to the destinations. Analyzing the reward plot reveals two factors for estimating the minimum number of training episodes required to train the agent effectively. First, the minimum number of episodes is indicated by the reduction in the frequency of negative rewards relative to the zero-reward line. Second, training efficiency is judged by comparing the episodic reward against the threshold-reward line. The reduction of negative-reward frequency and the increased frequency of episodic rewards around the threshold are achieved within 2200 episodes; hence, 2200 episodes are considered the minimum number of training episodes to effectively train the RL agent in the house layout.
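One plausible reading of this threshold (our interpretation, not stated explicitly in the paper), taking the goal reward $r_{1} = 100$ and step reward $r_{2} = -1$ from (8) and writing $L_{crit}$ for the node count of the critical path, is\begin{equation*} r_{threshold} = r_{1} - L_{crit}\,\lvert r_{2} \rvert = 100 - L_{crit}.\end{equation*}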

FIGURE 7. House layout case-2 episodic reward.

3) Hospital Layout Training

To verify the scalability of the proposed DQL path planner, it is used to train in the hospital layout, another and larger service environment. The DQL architecture is the same case-2 type used for the house layout. The input layer size for the hospital layout is $n = 81$ (53 states plus 28 goal states); two hidden layers of 128 neurons and an output layer of size $m = 4$ are chosen. The training hyper-parameters are the same as in Table 2. The episodic reward during training is plotted in Fig. 8. The total number of training episodes required is over 10,000, so the full plot on the left is too compressed to analyze; a scaled version showing the last 600 episodes of training data is depicted on the right. The reduction of negative-reward frequency and the frequency of episodic rewards reaching the threshold are achieved within 11850 episodes; hence, the nearest rounded-up number, 12000, is considered the minimum number of training episodes to effectively train the RL agent in the hospital layout.

FIGURE 8. Hospital layout episodic reward (left); hospital layout episodic reward, last 2000 episodes (right).

In both service environments, the $\epsilon$ decay exponent is calculated based on (5). For the hospital layout with 12000 episodes of training, the proposed exponent value is 0.9998. This choice is verified by training the agent in the hospital layout with various exponents close to the proposed value. Fig. 9 depicts the training success rate for three exponent values; from the plot, it can be concluded that learning with an exponent value of 0.9998 is more efficient than with the other values.
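As a numeric check of (5), with the $\epsilon$ range of Table 2 ($\epsilon_{initial} = 1$, $\epsilon_{min} = 0.1$) and 12000 training episodes, the per-episode decay base is\begin{equation*} d = \sqrt [12000]{\frac {0.1}{1}} = 0.1^{1/12000} \approx 0.9998,\end{equation*} which matches the proposed exponent value.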

FIGURE 9. Reasoning for exponent.

4) Generalized Parameters

The DQN architecture parameters are chosen based on the topological map properties, allowing the RL agent to be trained successfully in two different service environments. Table 4 tabulates all the model parameters, training success rates, and generalized parameters for both environment models. The generalized parameter for each DQN parameter is derived as a function of the topological map properties.

TABLE 4. Proposed DQL Model Parameters and Generalized Parameters

5) Testing

The proposed DQL path planner is successfully trained in two service environments. To show that the model is generic and scalable, it is essential to test its ability to plan paths to various destinations given a source. The test cases use every destination node in the topological map as both a source and a destination for path planning. The model is evaluated based on the testing success rate, calculated as the ratio of successful test cases to the total number of test cases. The testing success rate is more than 98% for both environments.

To further verify the correctness of the generalized parameters in Table 4, the proposed RL agent is tested using a 2D pathfinding maze benchmark map from [47]. These benchmark maps have been used by researchers worldwide for evaluating path-planning algorithms [48], [49], [50], [51]. Fig. 10 depicts the benchmark maze environment with the paths and random destinations. The topological map for the benchmark maze is shown in Fig. 11, with 18 destinations and 88 waypoints, including the home and unknown nodes. There are 108 nodes in total, which is roughly 7 and 2 times larger than the house and hospital layouts, respectively. The topological map is too big to fit on a single page and is split into three parts. The DQN architecture used to train the RL agent for the maze environment is based on the generalized parameters tabulated in Table 4. The input layer size for the maze layout is $n = 126$ (108 states plus 18 goal states); two hidden layers of 190 neurons and an output layer of size $m = 4$ are chosen. The training hyper-parameters are the same as in Table 2. The RL agent was trained for 56000 episodes with an $\epsilon$-decay factor of 0.999956, calculated based on (5). The episodic reward plot is shown in Fig. 12; the episodic reward improved steadily as training approached 56000 episodes. The testing success rate is 98.3% for the maze layout, on par with the hospital and house layouts. Based on the generalized parameters in Table 4, it can be concluded that the proposed DQL system is general and scalable.

FIGURE 10. 2D pathfinding benchmark maze environment with paths.

FIGURE 11. Benchmark maze environment topological map (in four parts).

FIGURE 12. Benchmark maze layout episodic reward (left); benchmark maze layout episodic reward, last 1200 episodes (right).

In [48], a modified Q-learning approach is proposed to overcome the increased convergence time of the traditional Q-learning approach. The authors used the benchmark maze environment to compare their approach with traditional Q-learning and other path-planning approaches. Their system was trained from a fixed starting point to a fixed ending point over 30 runs, and the averaged comparison metrics were evaluated. The starting point considered is the same as the Home location, and the ending point is D18. Even though the RL agent in this work is trained with a topological map containing 19 different destinations, the final path it generates from Home to D18, shown in Fig. 13, is the same as that of the modified Q-learning approach [48].

FIGURE 13. Path generated from Home to D18 in the benchmark maze environment.

The house layout DQL agent is also tested on a Turtlebot3 mobile robot using the Robot Operating System (ROS) framework and the Gazebo simulator. Fig. 14 depicts the two paths, Home to D5 and Home to D4, autonomously generated by the path planner agent.

FIGURE 14. Simulator paths (in red) for Home to D5 (left) and Home to D4 (right).

B. Transfer Learning and Testing

The proposed $\beta$-decay transfer learning algorithm is tested by adding an obstacle to the house layout in the Gazebo environment. The obstacle is placed between the V1 and V2 nodes of the house layout. Fig. 15 depicts the house layout with the cuboid obstacle in the Gazebo simulator environment and its topological map. The number of nodes in the topological map increases, and all the new nodes are highlighted in red with dotted edges for their connections. This changes the environment's configuration, making the pre-trained DQL agent inefficient in the new environment.

FIGURE 15. Obstacle added between V1 and V2 in the house layout and its topological map.

The $\beta$-decay transfer learning algorithm transfers the knowledge of the source path planner (the pre-trained agent) to the target path planner (the new agent). To validate the proposed initialization of $\beta = 0.78$, calculated from (7), and its decay exponent of 0.9984 for 1300 episodes, calculated from (5) with $\epsilon$ replaced by $\beta$, the DQL agent is trained using three different TL strategies: without $\beta$, constant $\beta$, and the proposed $\beta$-decay. Table 5 shows the training success rate of the three TL algorithms after 500 training episodes; the proposed $\beta$-decay TL algorithm outperforms the other two.

TABLE 5. Testing Success Rate Comparison of TL Algorithms


The DQL agent is trained with and without transfer learning to quantitatively evaluate the proposed transfer learning approach. The evaluation metrics are jumpstart, time-to-threshold, and time-to-converge. To evaluate the first two metrics, the training success rate for each training episode is analyzed; the success rate plot is shown in Fig. 16. The overall view of the plot shows that training with transfer outperforms non-transfer training. Jumpstart is the agent's success rate in the first few training episodes, calculated as the average success rate of the first 100 training episodes. The time-to-threshold metric evaluates the agent's learning performance by the time it takes to reach a threshold performance. The threshold value is chosen based on the agent's training in the old environment; as the new environment differs from the old one by only a small incremental change, this choice is reasonable. The threshold success rate value after 1800 training episodes is 42.5%. The time-to-converge metric indicates the minimum number of training episodes required to learn the environment for path planning, evaluated based on the testing success rate. Testing is performed by checking the agent's ability to plan paths using the destination locations as both sources and destinations. Table 6 shows the success rate comparison between transfer and non-transfer learning. The transfer learning approach attains a 100% success rate after 1300 episodes, whereas non-transfer learning attains a 78.5% success rate after 2500 episodes.
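For illustration, the first two metrics can be computed directly from the per-episode success record; this small helper (names are ours, not the paper's) assumes success_flags is a list of 0/1 episode outcomes and success_rate_curve a running success-rate series:

def jumpstart(success_flags, window=100):
    """Average success rate over the first `window` training episodes."""
    return sum(success_flags[:window]) / window

def time_to_threshold(success_rate_curve, threshold=0.425):
    """First episode at which the running success rate reaches the 42.5% threshold."""
    for episode, rate in enumerate(success_rate_curve, start=1):
        if rate >= threshold:
            return episode
    return None   # threshold never reached within the recorded episodes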

TABLE 6. Testing Success Rate of Transfer and Non-Transfer Training

FIGURE 16. Training success rate for transfer vs. non-transfer training.

Table 7 compares all the evaluation metrics between the transfer and non-transfer DQL agents, along with performance improvement factors. On all the metrics, the DQL agent with the proposed $\beta$-decay transfer learning outperforms the non-transfer agent; the transfer agent converges more than twice as fast. The DQL agent with the proposed system is tested in the Gazebo simulator on two paths, Home to D5 and Home to D4. Fig. 17 depicts the paths planned by the path planner, successfully avoiding the obstacle.

TABLE 7. Transfer Learning Evaluation Metrics Comparison

FIGURE 17. $\beta$-decay agent simulator paths (in red) for Home to D5 (left) and D4 (right) in the new environment (with obstacle).

SECTION VI.

Conclusion and Future Scope

A generic and scalable RL-based mobile service robot path planner with transfer learning is proposed in this work. Once a service robot is deployed, it is expected to work for its whole lifetime. A typical path planner considers only the initial configuration of the environment to plan the path, but a service environment is prone to small, gradual changes over time. A deep Q-learning-based path planner with novel $\beta$-decay transfer learning for lifelong learning is proposed and implemented. The DQL path planner is initially pre-trained in a service environment using its topological map. If there is any change in the environment's configuration, the proposed TL algorithm trains a new DQL agent by efficiently transferring the knowledge of the old agent. The DQL agent is trained in two service environments, the house and hospital layouts, with testing success rates of 100% and 98%, respectively. The hospital environment is almost three times larger than the house environment in terms of the number of nodes and destinations, which shows that the proposed path planner is scalable. The DQL agent's generality is established by deriving all its architecture parameters and training hyper-parameters as values that are either fixed or dependent only on the environment's topological map. The path planner is successfully tested in the ROS framework with the Gazebo simulator using the Turtlebot3 mobile robot.

The lifelong learning ability is tested by adding an obstacle in the house environment. The proposed TL algorithm uses the Experience vs. Exploration vs. Exploitation (EEE) logic based on the $\beta$-decay factor. The correctness of the proposed TL algorithm's initial $\beta$ value and decay factor is evaluated by comparing it with a TL variant without $\beta$ and a constant-$\beta$ TL variant; it is found to be better than both. The efficiency of the TL algorithm is evaluated based on the metrics jumpstart, time-to-threshold, and time-to-converge, in comparison with the non-TL agent. The proposed TL method gives a 20x higher jumpstart testing success rate, reaches the threshold more than three times faster, and converges more than twice as fast.

Implementing and testing the proposed system on a real service robot is one future extension of this work. Methods to improve the efficiency of the TL algorithm can also be explored. The $\beta$ factor is a stochastic factor initialized based on the total number of similar states in the two environments, where similarity is identified based on the state names obtained from the topological map. Identifying similarity based on state variables instead could increase the efficiency of the TL algorithm.
