1. Introduction
Text appearing in the large number of videos created in daily life carries valuable semantic information for practical applications such as video surveillance, intelligent navigation, and autonomous driving. However, detecting text in video is often challenging because the text varies in shape, size, orientation, and style, and because of degradations such as low contrast and blurring.