Towards Robust Video Text Detection with Spatio-Temporal Attention Modeling and Text Cues Fusion

Abstract:

Information carried by video text is of great value to various video applications. However, detecting text in videos often faces great challenges due to the widely varied appearance of text and the complicated, dynamic video context. In this paper, we propose a robust video text detection network that adaptively combines relevant text cues in multiple frames with spatio-temporal attention and fusion mechanisms, which effectively enhance the accuracy and robustness of video text detection compared to single-frame detection. The network first localizes text region proposals and propagates them across frames with an R-CNN based framework. Then, a Transformer-based cross-frame feature fusion model is employed to attentively select and combine relevant text features, yielding an enhanced representation of text region integrating complementary text cues for robust text candidate prediction. The network achieves competitive text detection performance on standard video text benchmarks, demonstrating the effectiveness of the proposed method.
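
The cross-frame fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes a single attention head, no learned projections, and plain Python lists as feature vectors. The names `fuse_across_frames`, `current`, and `frames` are hypothetical. It only shows how scaled dot-product attention can weight the features of the same text region in several frames and combine them into one fused representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_across_frames(query, frame_features):
    """Attention-weighted fusion of one region's features across frames.

    query          -- feature vector of the region in the current frame
    frame_features -- feature vectors of the same region in several frames
                      (assumed to be already aligned by proposal propagation)
    Returns the fused vector and the attention weights.
    """
    d = len(query)
    # Scaled dot-product similarity between the query and each frame's feature.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
        for feat in frame_features
    ]
    weights = softmax(scores)
    # Weighted sum of the per-frame features.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(d)
    ]
    return fused, weights


# Toy example: the third frame is dissimilar (e.g. blurred) and gets less weight.
current = [1.0, 0.0]
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fused, weights = fuse_across_frames(current, frames)
```

In this toy setting, frames whose features resemble the current frame's receive larger weights, so a degraded frame contributes less to the fused representation. A real Transformer-based model would add learned query, key, and value projections and multiple heads.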

Published in: 2022 IEEE International Conference on Multimedia and Expo (ICME)

Date of Conference: 18-22 July 2022

Date Added to IEEE Xplore: 26 August 2022

DOI: 10.1109/ICME52920.2022.9859582

Conference Location: Taipei, Taiwan

1. Introduction

Text appearing in the many videos created in people's daily lives carries a wealth of valuable semantic information for a variety of practical applications, such as video surveillance, intelligent navigation, and autonomous driving. However, detecting text in a video is often challenging because of the varied shape, size, orientation, and style of the text, and because of degradations such as low contrast and blurring.
