I. Introduction
Scene text recognition (STR) is the task of reading text from natural scenes. Text in images carries semantic information that is useful for many practical applications, such as intelligent document processing, street sign reading, and product recognition [1]. Because of these applications, the field has drawn considerable attention from researchers and practitioners, resulting in the emergence of various robust reading competitions [2]–[6]. STR is challenging because the images are acquired in generally unstructured and uncontrolled settings, which introduces more complexity and variability than in traditional optical character recognition (OCR) tasks [1], [7], [8].