1. Introduction
Despite the progress achieved in recent years, Arabic Optical Character Recognition (AOCR) continues to pose a major challenge to researchers [2]. This is essentially attributed to the cursiveness of Arabic script in both its printed and handwritten forms. More precisely, Arabic script is semi-cursive, in the sense that a word can be composed of one or several sub-words. Each sub-word consists of one or several characters, normally connected along a baseline. In some fonts, certain characters can be combined vertically to form a ligature. Moreover, the morphology of sub-words shows that Arabic script carries two kinds of information, associated respectively with diacritics and tracings (Figure 1). The tracing corresponds to the character bodies. It presents variations in horizontal bands and can naturally be divided into at least three zones (Figure 1-a): one, always present, associated with the middle band, and two zones related to the upper and lower extensions. As for diacritics, when present, they occupy two logical bands situated on either side of the tracing (Figure 1-b).
Different approaches have been developed for the recognition of Arabic script [2], [6]; nevertheless, the problem remains open [7]. Recent applications of Hidden Markov Models (HMMs) to Arabic writing recognition are encouraging and account for the progress in AOCR [6]. However, 1D HMMs offer only a linear description, whereas 2D HMMs would lead to a recognition algorithm of exponential complexity [11]. An intermediate solution is the planar, or pseudo-2D, HMM (PHMM). A PHMM is an HMM whose emission probabilities are themselves modelled by HMMs [10]. The principal model is composed of super-states, each associated with an emission model called a secondary model. Thus, PHMMs take into account, in a simplified manner, both horizontal and vertical variations.
Indeed, they have the advantage of being treated as nested 1D models rather than truly 2D ones, thus avoiding both the insufficiency of 1D modeling and the complexity of 2D processing. For these reasons, PHMMs seem well adapted to the complexities of Arabic script. Furthermore, PHMMs are becoming one of the promising models in optical character recognition. Introduced by Levin et al. for handwritten digit recognition [11], PHMMs were successfully applied to degraded texts by Agazzi et al. [1]. Bippus [8] applied PHMMs to the recognition of German literal amounts. Saon [14] and Gilloux [10] used planar architectures for the recognition of handwritten digits. More recently, the PHMM technique has been used for the automatic recognition of forms with handwritten fields [13]. In this paper, we show that PHMMs are well adapted to modeling Arabic script in both its printed and handwritten forms. Indeed, by definition, PHMMs make it possible to follow efficiently the natural variations of Arabic script across its bands.
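To make the nested structure concrete, the following minimal Python sketch evaluates the likelihood of a small binary image under a PHMM. It is only an illustration under stated assumptions, not the system described in this paper: the principal model is assumed to run vertically over image rows, each secondary model is assumed to be a discrete HMM over the pixels of one row, and all names (`HMM`, `PlanarHMM`, `forward_likelihood`, `planar_likelihood`) and parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HMM:
    """Discrete 1D HMM: start[i], trans[i][j], emit[i][symbol]."""
    start: List[float]
    trans: List[List[float]]
    emit: List[List[float]]


def forward_likelihood(hmm: HMM, obs: Sequence[int]) -> float:
    """P(obs | hmm), computed with the standard forward recursion."""
    n = len(hmm.start)
    alpha = [hmm.start[i] * hmm.emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * hmm.trans[i][j] for i in range(n)) * hmm.emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


@dataclass
class PlanarHMM:
    """Principal model over rows; super-state s owns the secondary HMM secondary[s]."""
    start: List[float]
    trans: List[List[float]]
    secondary: List[HMM]


def planar_likelihood(phmm: PlanarHMM, image: Sequence[Sequence[int]]) -> float:
    """P(image | phmm): a forward pass over rows whose emission terms
    are themselves forward likelihoods of the secondary HMMs."""
    n = len(phmm.start)
    # Score every row under every super-state's secondary model (inner 1D pass).
    row_scores = [[forward_likelihood(h, row) for h in phmm.secondary] for row in image]
    # Outer 1D pass over the rows, using the row scores as emission probabilities.
    alpha = [phmm.start[s] * row_scores[0][s] for s in range(n)]
    for scores in row_scores[1:]:
        alpha = [
            sum(alpha[r] * phmm.trans[r][s] for r in range(n)) * scores[s]
            for s in range(n)
        ]
    return sum(alpha)


# Toy model (assumed values): a "blank" band followed by an "ink" band.
blank = HMM(start=[1.0], trans=[[1.0]], emit=[[0.9, 0.1]])  # mostly 0-pixels
ink = HMM(start=[1.0], trans=[[1.0]], emit=[[0.2, 0.8]])    # mostly 1-pixels
phmm = PlanarHMM(start=[1.0, 0.0], trans=[[0.5, 0.5], [0.0, 1.0]], secondary=[blank, ink])

image_a = [[0, 0, 0], [1, 1, 1]]  # blank row, then ink row: matches the band order
image_b = [[1, 1, 1], [0, 0, 0]]  # reversed band order

score_a = planar_likelihood(phmm, image_a)
score_b = planar_likelihood(phmm, image_b)
```

In this toy setting `score_a` is larger than `score_b`, because only `image_a` follows the band order encoded by the super-states. Both passes are ordinary 1D forward recursions, which is why the nested evaluation stays polynomial in the image size instead of requiring a full 2D search.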
Figure 1. Different information zones of the word “”: (a) tracing, (b) diacritics.