1 Introduction
Scene text detection (STD) and recognition in natural scene images are important research topics in computer vision [1]. Current text detection and recognition techniques are widely applied in industries such as finance, insurance, medical care, transportation, and education. Application scenarios involving images or videos include e-commerce text translation, user-generated content review, and content/advertising recommendation and distribution. These business scenarios already need to process tens of billions of data items every day, and the number of requests to these algorithms is still increasing significantly.