PoSE: Suppressing Perceptual Noise in Embodied Agents for Enhanced Semantic Navigation

Abstract:

Embodied agents navigating unknown environments face the challenge of optimizing exploration based on semantic information. Conventional methods, which rely on collected data or pre-defined rules, are limited in scalability and applicability, while methods based on pretrained language models that focus on the textual modality encounter perceptual noise, which degrades decision-making. To mitigate these problems, this paper presents Prompt-based Vision Context Semantic Exploration (PoSE), a method that leverages prior knowledge from vision-language models (VLMs) to suppress perceptual noise. Through prompts designed around existential logic, PoSE reduces misidentifications of target objects within the observed environment. It also introduces an exploration map that translates target locations into exploration coordinates. The method is evaluated on the ALFRED benchmark, where it improves on previous rule-based and task-specific data-driven exploration policies. Furthermore, PoSE's semantic exploration policy improves on pretrained language-model-based exploration methods that focus on the text modality, demonstrating its effectiveness and generality.

Published in: IEEE Robotics and Automation Letters (Volume: 9, Issue: 2, February 2024)

Page(s): 963 - 970

Date of Publication: 25 October 2023

DOI: 10.1109/LRA.2023.3327672

I. Introduction

The ability to explore and navigate unknown environments based on semantic information is a fundamental skill for embodied agents, which are robotic or virtual entities that interact with their surroundings. Embodied agents have applications in fields such as robotics, video games, and virtual assistants, where they need to operate autonomously in complex and dynamic environments. As shown in Fig. 1, an important task is to explore the environment and locate a target object based on the reasoning context, such as the visual observation and the name of the target object. Prior knowledge about the environment plays a crucial role in making this exploration effective. For instance, such prior knowledge could include the typical locations where a butter knife is found. The agent can then concentrate its search on areas with a higher probability of containing a butter knife, which reduces the overall search space. Most existing methods for obtaining prior knowledge rely primarily on collected data [1], [2], [3], [4] or pre-defined rules [5], [6], [7]. However, these approaches are limited in scalability and in applicability to new, unseen environments. Collecting such data can be expensive and time-consuming, while defining rules for each new task or environment is impractical.
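The butter knife example can be made concrete with a toy sketch. This is not the PoSE method: it only illustrates how a location prior shortens a search. The `PRIOR` table, its probabilities, and the function names are hypothetical values chosen for illustration, and the sketch assumes the target sits at exactly one of the candidate locations.

```python
from typing import Dict, List

# Hypothetical prior: probability that the target is found at each location.
# These numbers are illustrative only and do not come from the paper.
PRIOR: Dict[str, Dict[str, float]] = {
    "butter knife": {
        "countertop": 0.45,
        "drawer": 0.30,
        "dining table": 0.20,
        "sofa": 0.05,
    },
}


def rank_search_locations(target: str, visible: List[str]) -> List[str]:
    """Order the visible locations by prior probability of holding the target."""
    prior = PRIOR.get(target, {})
    # Locations with no prior entry score 0.0; the sort is stable, so ties
    # keep the order in which the locations were observed.
    return sorted(visible, key=lambda loc: prior.get(loc, 0.0), reverse=True)


def expected_visits(target: str, order: List[str]) -> float:
    """Expected number of locations visited until the target is found.

    Assumes the target is at exactly one location in `order`, with
    probability proportional to the prior.
    """
    prior = PRIOR.get(target, {})
    weights = [prior.get(loc, 0.0) for loc in order]
    total = sum(weights)
    if total == 0.0:
        # No usable prior: every position is equally likely.
        return (len(order) + 1) / 2
    # The i-th location in the order (1-based) costs i visits if the target is there.
    return sum((i + 1) * w for i, w in enumerate(weights)) / total


if __name__ == "__main__":
    observed = ["sofa", "dining table", "drawer", "countertop"]
    ranked = rank_search_locations("butter knife", observed)
    print(ranked)
    print(expected_visits("butter knife", observed))
    print(expected_visits("butter knife", ranked))
```

Under these made-up priors, visiting locations in prior-ranked order lowers the expected number of visits compared with the order in which the locations happened to be observed. Where such a prior comes from (collected data, hand-written rules, or a pretrained model) is exactly the question the paper addresses.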
