
Directional Neighborhood Topologies Based Multi-Scale Quinary Pattern for Texture Classification



Abstract:

This paper introduces a new computationally simple and effective local image feature descriptor, referred to as Directional Neighborhood Topologies based Multi-scale Quinary Pattern (DNT-MQP), for texture description and classification. The essence of DNT-MQP is to encode the structure of the local neighborhood by analyzing the differential excitation and directional information using various directional neighborhood topologies and a new pattern encoding scheme. We first design four versions of the single-scale DNT-QP feature based on four directional neighborhood topology sampling sets, which are then combined to build the effective multi-scale DNT-MQP model. Unlike some existing parametric methods that rely on static thresholds, the construction process of DNT-MQP includes an automatic mechanism for dynamic threshold estimation. Thanks to the richer local description obtained by exploiting the complementary information of the combined single-scale DNT-QP operators, the DNT-MQP descriptor yields a more stable and discriminative feature representation than other local feature descriptors. In addition, DNT-MQP has the advantages of computational simplicity in feature extraction and low dimensionality in feature representation. The effectiveness of DNT-MQP is evaluated on sixteen challenging texture datasets, where it maintains a high level of performance stability and achieves results that are competitive with or better than several recent, highly promising state-of-the-art texture descriptors as well as deep learning-based feature extraction approaches. Impressively, DNT-MQP shows good tolerance to rotation as well as illumination, scale and viewpoint changes, even against descriptors originally conceived to deal with these challenges.
Furthermore, statistical hypothesis testing through the Wilcoxon signed rank test is applied to prove the statistical significance of the accuracy improvement o...
Published in: IEEE Access ( Volume: 8)
Page(s): 212233 - 212246
Date of Publication: 24 November 2020
Electronic ISSN: 2169-3536

CC BY - IEEE is not the copyright holder of this material. Licensed under a Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/).
SECTION I.

Introduction

The surfaces of objects and materials such as crops in a field, natural scenes, human faces, palmprints and human skin have their own distinctive texture. Texture analysis, which provides constructive information about the structural and spatial arrangement of surfaces, has been extensively studied in the past few years and has received considerable attention in various research areas such as pattern recognition, computer vision and image processing. Texture classification is suitable for many potential applications, for instance face classification and recognition, background subtraction, iris recognition, palmprint recognition [1], pedestrian detection, hyperspectral image classification [2], biomedical image analysis and image retrieval.

In the literature, many approaches are available for texture analysis, with excellent surveys given in [3]–[5]. One can cite human perception-based features [17], random features [15], filter-based techniques such as Gaussian Markov random fields [13], Gabor filters [6] and wavelets [14], co-occurrence matrix-based approaches [7], ranklet transform-based approaches [16], fractal analysis-based approaches [20], Texton dictionary-based approaches [18], etc. The method proposed in [9] uses Fourier descriptors to extract texture features in the spectral domain. Even though these approaches, which take advantage of the merits of spectral and statistical features, improve the ability of texture representation and modeling, their drawback is the increased computational cost of feature extraction. Therefore, local feature extraction methods have been proposed and have been employed impressively in the texture analysis field. The main advantages of local hand-crafted descriptors are their simplicity of design and their independence from large volumes of training data [39].

Among the local feature extraction approaches, the local binary pattern (LBP) developed by Ojala et al. [8] has emerged as one of the most eminent texture descriptors and has gained increasing attention over the past decades. LBP is highly appreciated by researchers as an effective texture descriptor owing to its distinctive advantages, including its simplicity, its invariance to monotonic gray-level changes and its suitability for real-time applications due to its low computational cost. Although originally designed for texture modeling and classification, the LBP method has shown considerable performance in a wide range of applications such as medical and biomedical image analysis, motion detection, image retrieval, face description and recognition, background subtraction, etc. Despite these merits, the basic LBP descriptor has some limitations [49]: (1) the output is very sensitive even to small changes in the input; (2) it is not invariant to image rotation; (3) owing to the employed thresholding scheme, it is highly susceptible to noise interference. In order to deal with these limitations and thus enhance the classification performance of LBP, a great number of improved LBP algorithms have been proposed in recent years [21]–[23], [56], [69]. The authors in [5] provided an exhaustive investigation and comprehensive experiments assessing the performance of a large number of old and recent state-of-the-art texture descriptors on the face recognition problem. The authors in [54] proposed one-dimensional local binary patterns (1DLBP) for stone porosity computation using a new neighborhood topology and structure. Shiv et al. [53] designed the local wavelet pattern (LWP) descriptor for medical image retrieval. LWP performs local wavelet decomposition over a local neighborhood of pixels to encode the relations among the neighboring pixels.
The LWP pattern for the central pixel is computed by comparing its local wavelet decomposed value with those of the neighboring pixels. Chakraborty et al. [47] introduced the local quadruple pattern (LQPAT) descriptor for use in facial image retrieval and recognition. LQPAT computes two micro-patterns from the local relationships by encoding relations among the neighbors in quadruple space. Another study [51] presented the center-symmetric quadruple pattern (CSQP) descriptor for facial image retrieval and recognition. Like LQPAT, CSQP encodes the facial asymmetry in quadruple space; it computes an eight-bit pattern from 16 pixels in the local neighborhood. Issam et al. [57] proposed the local directional ternary pattern (LDTP) for texture classification. LDTP, which exploits both LDP's and LTP's concepts, encodes both contrast information and directional pattern features in a compact way based on local derivative variations. Kas et al. [56] proposed mixed neighborhood topology cross decoded patterns (MNTCDP) for face recognition. MNTCDP adopts a 5\times 5 block, which allows combining radii (2) and angles (4), compared to the 3\times 3 block supporting only angle variation.

Hand-crafted descriptors have been dominated by LBP-like methods for more than a decade, and the need for an effective local texture descriptor with high discrimination capability no longer needs to be demonstrated. Indeed, methods based on local texture descriptors in pattern recognition continue to be designed today, e.g., median local ternary patterns (MLTP) [50], averaged local binary patterns (ALBP) [58], repulsive-and-attractive local binary gradient contours (RALBGC) [52], local concave-and-convex micro-structure (LCCMSP) [49], attractive-and-repulsive center-symmetric local binary patterns (ARCS-LBP) [63], multi-direction local binary pattern (MDLBP) [37], improved local ternary patterns (ILTP) [55], quaternionic local angular binary pattern (QLABP) [61], selectively dominant local binary patterns (SDLBP) [62], chess pattern (Chess-pat) [59], multi level directional cross binary patterns (MLD-CBP) [60], synchronized rotation local ternary pattern (SRLTP) [38], oriented star sampling structure based multi-scale ternary pattern (O3S-MTP) [40], pattern of local gravitational force (PLGF) [39] and so on.

Even though LBP and its modifications and extensions achieve satisfactory performance, an alternative technique to enhance the discriminative power for effective texture modeling and representation is still needed. In this paper, aiming at further enhancing texture classification performance while keeping the simplicity and effectiveness of the traditional LBP and addressing its weaknesses, we design a conceptually simple yet robust model of LBP, named directional neighborhood topologies based multi-scale quinary pattern (DNT-MQP). The designed texture operator has the following merits: as will be shown further, DNT-MQP has low computational complexity and, by offering highly desirable features, improves both the discriminative capability of LBP-like methods and their invariance to monotonic illumination changes, as well as their robustness to small variations due to image noise. The main contributions of this paper include the following:

  • An automatic mechanism for dynamic threshold estimation for the quinary pattern creation process is proposed.

  • A family of single-scale descriptors named directional neighborhood topologies based single-scale quinary pattern (DNT-QP) is developed based on several directional neighborhood topologies (DNT), which are more effective for image texture understanding and analysis than a large number of existing methods.

  • We further extend the obtained single-scale DNT-QP descriptors to a multi-scale setting by concatenating them into a single feature vector to build the effective directional neighborhood topologies based multi-scale quinary pattern (DNT-MQP) descriptor, which is more robust and stable.

  • For performance evaluation, we restrict ourselves to texture classification as the basic application of LBP. Extensive experiments on sixteen challenging publicly available texture databases are performed.

  • We provide a fair and systematic comparison and find that the designed texture operator shows performance superior to 34 recent powerful state-of-the-art texture descriptors.

The rest of the paper is organized as follows. Section II briefly introduces some typical existing texture descriptors. Section III presents the designed DNT-MQP descriptor. Comprehensive experimental results and comparative evaluation are given in Section IV. Section V concludes the study and presents some solid future research directions.

SECTION II.

Review of the Existing Representative Texture Methods

In this part of the study, we briefly introduce some representative texture methods reported in the literature. Given that the designed local texture descriptor is based on local kernel functions, it is appropriate to define the spatial arrangement of the local structure used in this paper. The choice fell on the 3\times 3 block given its ease of implementation and the fact that it is by far the most used neighborhood, especially in real-time applications. The set of gray-scale values of a 3\times 3 grayscale image patch \mathbf {I}_{m,n}^{3\times 3} around the central pixel \mathbf {a}_{c} of coordinates (m,n) is given by Eq. 1: \begin{align*} \mathbf {I}_{m,n}^{3\times 3}= \left [{ \begin{array}{ccc} \mathbf {a}_{3} &\quad \mathbf {a}_{2} &\quad \mathbf {a}_{1} \\ \mathbf {a}_{4} &\quad \mathbf {a}_{c} &\quad \mathbf {a}_{0} \\ \mathbf {a}_{5} &\quad \mathbf {a}_{6} &\quad \mathbf {a}_{7} \end{array} }\right]\tag{1}\end{align*} where \mathbf {a}_{p} denotes the gray level of the peripheral pixels (p \in \{0, 1,\ldots, P-1\}) and P=8 is the number of local neighbors.
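To fix the neighbor ordering used throughout, here is a minimal sketch (in Python with NumPy; the function name `patch_neighbors` is ours, not the paper's) that extracts the central pixel and the neighbors \mathbf{a}_{0}\ldots\mathbf{a}_{7} of Eq. 1 from an image:

```python
import numpy as np

def patch_neighbors(img, m, n):
    """Return (a_c, [a_0..a_7]) for the 3x3 patch of `img` centred at
    (m, n), with neighbors ordered as in Eq. 1: counter-clockwise
    starting from the middle-right pixel."""
    img = np.asarray(img)
    # (row, col) offsets of a_0..a_7 relative to the center, per Eq. 1
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    a_c = img[m, n]
    neigh = [img[m + dr, n + dc] for (dr, dc) in offsets]
    return a_c, neigh
```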

A. Traditional Local Binary Patterns

The LBP method [8] characterizes the local spatial structure and the local contrast of each 3\times 3 local region in the image by thresholding the intensities of the surrounding pixels \mathbf {a}_{p}; p\in \{0,1,\ldots,7\} against the intensity of the central pixel \mathbf {a}_{c} : if \mathbf {a}_{p} \geqslant \mathbf {a}_{c} , the neighbor is assigned the binary value 1; otherwise, it is assigned 0. The resulting 8-bit code is transformed into a decimal number to obtain the LBP code. Formally, for a 3\times 3 local region, the kernel function of the LBP operator is defined as: \begin{equation*} f_{\text {LBP}}(\mathbf {I}_{m,n}^{3\times 3}) =\sum _{p=0}^{P-1} \vartheta (\mathbf {a}_{p}- \mathbf {a}_{c}) \times 2^{p}\tag{2}\end{equation*} where \vartheta (\cdot ) is the Heaviside step function: \begin{align*} \vartheta (\mathbf {x})= \begin{cases} 1, & \text {if} \hspace {0.2cm} \mathbf {x}\geqslant 0 \\ 0, & \text {otherwise} \end{cases}\tag{3}\end{align*}
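Eqs. 2 and 3 can be sketched as follows; this is an illustrative implementation (the function name is ours), assuming the neighbor ordering of Eq. 1:

```python
import numpy as np

def lbp_code(patch):
    """Compute the LBP code of a 3x3 patch (Eq. 2): each neighbor
    contributes bit 2^p when a_p >= a_c (the Heaviside test of Eq. 3)."""
    patch = np.asarray(patch)
    a_c = patch[1, 1]
    # (row, col) positions of a_0..a_7 as laid out in Eq. 1
    order = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    bits = [1 if patch[r, c] >= a_c else 0 for (r, c) in order]
    return sum(b << p for p, b in enumerate(bits))
```

For example, a uniform patch yields the all-ones code 255, since every neighbor passes the `>=` test.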

Figure 1 shows the standard steps illustrating the process of LBP feature extraction as well as the quantification of the difference between two texture images using LBP like features.

FIGURE 1. The standard steps for quantification of the difference between two texture images using LBP like features.

B. Local Quinary Patterns (LQP)

In local quinary patterns (LQP) [17], the difference in gray level between the central pixel \mathbf {a}_{c} and its neighboring pixels \mathbf {a}_{p} is encoded into five levels (i.e., −2, −1, 0, +1 and +2) using two thresholds \tau _{1} and \tau _{2} . LQP is thus closely related to LTP [24], the only difference being that the number of coding levels is five in LQP whereas it is three in LTP. The quinary code is computed according to the following rule: \begin{align*} \varphi _{LQP}(\mathbf {a}_{p},\mathbf {a}_{c},\tau _{1},\tau _{2})=\begin{cases} +2& \text {$\mathbf {a}_{p} \geqslant \mathbf {a}_{c}+\tau _{2}$,}\\ +1& \text {$\mathbf {a}_{c}+\tau _{1} \leqslant \mathbf {a}_{p}< \mathbf {a}_{c}+\tau _{2}$,}\\ 0& \text {$\mathbf {a}_{c}-\tau _{1} \leqslant \mathbf {a}_{p}< \mathbf {a}_{c}+\tau _{1}$,}\\ -1& \text {$\mathbf {a}_{c}-\tau _{2} \leqslant \mathbf {a}_{p}< \mathbf {a}_{c}-\tau _{1}$,}\\ -2&\text {otherwise,}\\ \end{cases}\tag{4}\end{align*} where \tau _{1} and \tau _{2} are two user-specified parameters.

The LQP operator, following the LTP concept, splits each quinary pattern into four parts: LQP_{-2} , LQP_{-1} , LQP_{+1} and LQP_{+2} , which are computed as follows (cf. Eqs. 5, 6, 7 and 8): \begin{align*} LQP_{-2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}-\tau _{2}) \times 2^{p} \tag{5}\\ LQP_{-1}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \Bigl (\vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}+\tau _{2}) \times \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}-\tau _{1}) \Bigr) \times 2^{p} \tag{6}\\ LQP_{+1}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \Bigl (\vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}-\tau _{1}) \times \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}+\tau _{2}) \Bigr) \times 2^{p} \tag{7}\\ LQP_{+2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}-\tau _{2}) \times 2^{p}\tag{8}\end{align*}

The four histograms generated by the LQP_{-2} , LQP_{-1} , LQP_{+1} and LQP_{+2} operators are finally concatenated to form the final \mathbf {h}_{LQP} feature vector, as illustrated in Eq. 9. LQP generates 4\times 2^{8} possible different patterns. \begin{equation*} \mathbf {h}_{LQP}=\left \langle{ \mathbf {h}_{LQP_{-2}}, \mathbf {h}_{LQP_{-1}}, \mathbf {h}_{LQP_{+1}}, \mathbf {h}_{LQP_{+2}} }\right \rangle\tag{9}\end{equation*}
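A sketch of the quinary quantizer of Eq. 4 and its split into the four binary codes of Eqs. 5-8; the function name, the neighbor ordering (taken from Eq. 1) and the dictionary layout are our own illustrative choices:

```python
import numpy as np

def lqp_codes(patch, tau1, tau2):
    """Quantize each neighbor difference of a 3x3 patch into five
    levels (Eq. 4) and pack the non-zero levels into the four binary
    codes LQP_{-2}, LQP_{-1}, LQP_{+1}, LQP_{+2} (Eqs. 5-8)."""
    patch = np.asarray(patch, dtype=float)
    a_c = patch[1, 1]
    order = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    codes = {+2: 0, +1: 0, -1: 0, -2: 0}
    for p, (r, c) in enumerate(order):
        q = patch[r, c] - a_c          # signed difference a_p - a_c
        if q >= tau2:
            level = +2
        elif q >= tau1:
            level = +1
        elif q >= -tau1:
            level = 0                  # the zero level feeds no binary map
        elif q >= -tau2:
            level = -1
        else:
            level = -2
        if level != 0:
            codes[level] |= 1 << p     # set bit 2^p in the matching map
    return codes
```

Each of the four resulting codes is histogrammed separately and the histograms concatenated, per Eq. 9.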

C. Improved Local Quinary Patterns (ILQP)

Armi et al. [35] proposed the improved local quinary patterns (ILQP) to overcome some disadvantages of LQP and thus improve its performance. The definition of the local quinary pattern in ILQP is the same as in the LQP operator; by contrast, the thresholds \tau _{1} and \tau _{2} are defined dynamically as: \begin{equation*} \tau _{1}=median(|lmad-median(lmad)|)\tag{10}\end{equation*} where lmad=\{LocalMAD_{k}\,|\, k=1, 2, \ldots, M\times N\} , LocalMAD=median(|G-median(G)|) and G=\{\mathbf {a}_{p}\,|\, p=0, 1, \ldots, P-1\} ; \begin{equation*} \tau _{2}= \frac {1}{M\times N} \sum _{i=1}^{N} \sum _{j=1}^{M} LSV_{i,j}\tag{11}\end{equation*} where M and N are the dimensions of the input image and LSV_{i,j} is the local significant value of the neighborhood with center pixel c at coordinates (i,j), given by LSV_{c}= \frac {1}{P} \sum _{p=0}^{P-1} |\mathbf {a}_{c}-\mathbf {a}_{p}| . The quinary pattern can be divided into four local binary patterns as follows: \begin{align*} ILQP_{-2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}-\tau _{2}) \times 2^{p} \tag{12}\\ ILQP_{-1,-2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \Bigl [\Bigl (\vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}+\tau _{2}) \times \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}-\tau _{1})\Bigr) ~|~ \vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}-\tau _{2}) \Bigr] \times 2^{p} \tag{13}\\ ILQP_{+1,+2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \Bigl [\Bigl (\vartheta (\mathbf {a}_{c}-\mathbf {a}_{p}+\tau _{2})\times \vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}-\tau _{1})\Bigr) ~|~ \vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}-\tau _{2}) \Bigr] \times 2^{p} \tag{14}\\ ILQP_{+2}(\mathbf {I}_{m,n}^{3\times 3})&=\sum _{p=0}^{7} \vartheta (\mathbf {a}_{p}-\mathbf {a}_{c}-\tau _{2}) \times 2^{p}\tag{15}\end{align*}

Note that the vertical bar | and the \times symbol stand for the logical 'OR' and 'AND' operations, respectively.
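The dynamic threshold estimation of Eqs. 10-11 can be sketched as below. As an assumption of this sketch, border pixels without a full 3\times 3 neighborhood are simply skipped, whereas Eq. 11 sums over all M\times N pixels; the function name is ours:

```python
import numpy as np

def ilqp_thresholds(img):
    """Estimate the ILQP dynamic thresholds tau1 and tau2 (Eqs. 10-11)
    for a 2-D grayscale array `img`."""
    img = np.asarray(img, dtype=float)
    M, N = img.shape
    lmad, lsv = [], []
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            G = np.delete(patch.ravel(), 4)    # the 8 neighbors a_p
            c = patch[1, 1]
            # LocalMAD: median absolute deviation of the neighbors
            lmad.append(np.median(np.abs(G - np.median(G))))
            # LSV: mean absolute deviation of neighbors from the center
            lsv.append(np.mean(np.abs(c - G)))
    lmad = np.asarray(lmad)
    tau1 = float(np.median(np.abs(lmad - np.median(lmad))))  # Eq. 10
    tau2 = float(np.mean(lsv))                               # Eq. 11
    return tau1, tau2
```

On a perfectly flat image both thresholds collapse to zero, since every local deviation vanishes.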

SECTION III.

Proposed Method

A. Directional Neighborhood Topologies Based Multi-Scale Quinary Pattern

Considering that a texture image is defined by local spatial variations in pixel orientation and intensity, we design the directional neighborhood topologies based multi-scale quinary pattern (DNT-MQP) to describe the spatial variations in pixel intensities and orientations in a local neighborhood of the image. The essence of DNT-MQP is to perform local sampling and pattern encoding in the most informative directions contained within texture images. The construction process of the proposed DNT-MQP descriptor involves the following stages:

  • STAGE #1 (Neighborhood topology): The DNT-MQP descriptor considers a unit-distance radius, since the closest neighboring pixels carry the most discriminating information for local texture descriptors, thus keeping computational complexity low. The whole 3\times 3 grayscale image patch is therefore adopted to design the DNT-MQP method, which intends to explore the mutual information with respect to the adjacent neighbors. Consider the center pixel \mathbf {a}_{c} and its 8 neighboring pixels \{\mathbf {a}_{0}, \mathbf {a}_{1}, \ldots, \mathbf {a}_{7}\} . On this basis, the directional neighborhood topologies adopted by DNT-MQP, which are expected to better describe the salient local texture structure, are constructed as shown in Figure 2. The neighbors of a reference pixel \mathbf {a}_{\mathbf {k}} are categorized according to their angular position relative to that pixel. On the one hand, from the first row of Figure 2, the central pixel \mathbf {a}_{c} is sampled each time with two pixels in each of four directions \frac {\mathbf {k} \pi }{4} ; \mathbf {k}\in \{0,1,2,3\} . We consider four sets S_{\mathbf {k}};\; \mathbf {k}\in \{0,1,2,3\} of directional center-symmetric neighboring pixels with respect to directions \frac {\mathbf {k} \pi }{4} , including the central pixel \mathbf {a}_{c} . The mathematical definition of S_{\mathbf {k}} is given by Eq. 16: \begin{equation*} Direction~\frac {\mathbf {k} \pi }{4}\;:S_{\mathbf {k}} =\{\mathbf {a}_{\mathbf {k}}, \mathbf {a}_{c}, \mathbf {a}_{\mathbf {k} +4}\}\tag{16}\end{equation*}

    On the other hand, from the second row of Figure 2, each peripheral pixel is sampled with two of its sequential peripheral neighboring pixels in each direction. We consider four new sets \widetilde {S}_{\mathbf {k}};\; \mathbf {k}\in \{0,1,2,3\} of directional sequential peripheral neighbors with respect to directions \frac {(2\mathbf {k}+1)\pi }{4} . The mathematical definition of \widetilde {S}_{\mathbf {k}} is given by Eq. 17: \begin{equation*} Direction~\frac {(2\mathbf {k}+1)\pi }{4}: \widetilde {S}_{\mathbf {k}} =\{\mathbf {a}_{2\mathbf {k}}, \mathbf {a}_{2\mathbf {k} + 1}, \mathbf {a}_{2\mathbf {k} +2}\}\tag{17}\end{equation*}

    It appears from the literature that the average gray level as well as the median of the gray-scale values are widely accepted statistical parameters for texture analysis. In view of this, and aiming at enhancing the thresholding range tolerance and thus finding a code that is insensitive to noise and more robust to illumination changes, several mean and median values (denoted \mathbf {a}_{\mu } , \mathbf {a}_{\widetilde {\mu }} , \mathbf {a}_{{S_{\mathbf {k}}}} , \mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}} , \mathbf {a}_{\varrho } and \mathbf {a}_{\widetilde {\varrho }} ) are incorporated as virtual pixels in the proposed texture model. \mathbf {a}_{\mu } and \mathbf {a}_{\widetilde {\mu }} are respectively the average local and global gray levels of the whole 3\times 3 square neighborhood and the whole image I_{M\times {N} } ; \mathbf {a}_{{S_{\mathbf {k}}}} and \mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}} are the average directional gray levels of the sets S_{\mathbf {k}} and \widetilde {S}_{\mathbf {k}} ; and \mathbf {a}_{\varrho } and \mathbf {a}_{\widetilde {\varrho }} are the medians of the gray-scale values of the 3\times 3 square region and the whole image I_{M\times {N} } , respectively (cf. Eqs. 18, 19, 20 and 21). \begin{align*} \mathbf {a}_{\mu }&=\frac {1}{9}\left({\mathbf {a}_{c}+\sum _{p=0}^{P-1}\mathbf {a}_{p}}\right) \tag{18}\\ \mathbf {a}_{\widetilde {\mu }}&=\frac {1}{M\times {N}}\sum _{i=0}^{M-1}\sum _{j=0}^{N-1}\mathbf {a}_{(i,j)} \tag{19}\\ Direction~\frac {\mathbf {k} \pi }{4}\;: \mathbf {a}_{{S_{\mathbf {k}}}}&=\frac {\mathbf {a}_{\mathbf {k}} + \mathbf {a}_{c}+\mathbf {a}_{\mathbf {k} +4}}{3} \tag{20}\\ Direction~\frac {(2\mathbf {k}+1)\pi }{4}: \mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}}&=\frac {\mathbf {a}_{2\mathbf {k}} + 2 \mathbf {a}_{2\mathbf {k} + 1} + \mathbf {a}_{2\mathbf {k} +2}}{4} \tag{21}\end{align*}

    The pixel samples of both the S_{\mathbf {k}} and \widetilde {S}_{\mathbf {k}} sets as well as the considered virtual pixels are then arranged into two directional neighborhood topologies based sampling sets, denoted \mathbb {SS}_{1} and \mathbb {SS}_{2} (cf. Eqs. 22 and 23). The sampling set \mathbb {SS}_{1} is constructed from the four couples of directional center-symmetric pixels \{(\mathbf {a}_{\mathbf {k}}, \mathbf {a}_{\mathbf {k} +4});\; \mathbf {k} \in \{0,1,2,3\}\} , the two couples of average directional gray levels \{(\mathbf {a}_{S_{2\mathbf {k}}} ,\mathbf {a}_{S_{2\mathbf {k}+1}});\; \mathbf {k} \in \{0,1\}\} and the couple formed by the average local and global gray levels (\mathbf {a}_{\mu }, \mathbf {a}_{\widetilde {\mu }}) . In contrast, the sampling set \mathbb {SS}_{2} is constructed from the four couples of pixels \{(\mathbf {a}_{2\mathbf {k} +1},\mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}});\; \mathbf {k} \in \{0,1,2,3\}\} , the couple formed by the average local and global gray levels (\mathbf {a}_{\mu }, \mathbf {a}_{\widetilde {\mu }}) and the couple formed by the medians of the local and global gray-scale values (\mathbf {a}_{\varrho },\mathbf {a}_{\widetilde {\varrho }}) . \begin{align*} \mathbb {SS}_{1}&=\lbrace (\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }}),(\mathbf {a}_{S_{0}},\mathbf {a}_{S_{1}}), (\mathbf {a}_{S_{2}},\mathbf {a}_{S_{3}}),\lbrace (\mathbf {a}_{\mathbf {k}},\mathbf {a}_{\mathbf {k}+4})\rbrace \rbrace \tag{22}\\ \mathbb {SS}_{2}&=\lbrace (\mathbf {a}_{\varrho },\mathbf {a}_{\widetilde {\varrho }}), (\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }}), \lbrace (\mathbf {a}_{2\mathbf {k} +1},\mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}})\rbrace \rbrace\tag{23}\end{align*} where \mathbf {k} \in \{0,1,2,3\} .
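Stage 1 can be sketched as follows. The helper name `sampling_sets` and the wrap-around of \mathbf{a}_{2\mathbf{k}+2} to \mathbf{a}_{0} for \mathbf{k}=3 (which Eq. 17 leaves implicit) are our assumptions:

```python
import numpy as np

def sampling_sets(patch, img):
    """Build the two DNT sampling sets SS1 and SS2 (Eqs. 22-23) for one
    3x3 patch of `img`. Returns two lists of (a_x, a_y) couples."""
    patch = np.asarray(patch, dtype=float)
    img = np.asarray(img, dtype=float)
    # neighbors a_0..a_7 in the ordering of Eq. 1
    a = [patch[r, c] for (r, c) in
         [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]]
    a_c = patch[1, 1]
    a_mu = patch.mean()                  # local mean (Eq. 18)
    a_mu_g = img.mean()                  # global mean (Eq. 19)
    a_rho = np.median(patch)             # local median
    a_rho_g = np.median(img)             # global median
    a_S = [(a[k] + a_c + a[k + 4]) / 3 for k in range(4)]            # Eq. 20
    a_St = [(a[2 * k] + 2 * a[2 * k + 1] + a[(2 * k + 2) % 8]) / 4   # Eq. 21
            for k in range(4)]           # index wraps a_8 -> a_0 (assumption)
    ss1 = [(a_mu, a_mu_g), (a_S[0], a_S[1]), (a_S[2], a_S[3])] \
        + [(a[k], a[k + 4]) for k in range(4)]                       # Eq. 22
    ss2 = [(a_rho, a_rho_g), (a_mu, a_mu_g)] \
        + [(a[2 * k + 1], a_St[k]) for k in range(4)]                # Eq. 23
    return ss1, ss2
```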

  • STAGE #2 (Pattern encoding): The local texture relationship between each couple of pixels within the sampling sets \mathbb {SS}_{1} and \mathbb {SS}_{2} and the central pixel is encoded using a threshold-based, LQP-like coding scheme. The indicator function \delta _{0}(\cdot,\cdot) that converts the couple relationship into quinary form is defined as follows (cf. Eq. 24): \begin{align*} \delta _{0}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})= \begin{cases} 2, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{U}_{\tau _{2}} \\ 1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{U}_{\tau _{1}} \\ -1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{L}_{\tau _{1}} \\ -2, & \text {if} \hspace {0.1cm}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{L}_{\tau _{2}} \\ 0, & \text {otherwise} \end{cases}\tag{24}\end{align*} where \tau _{1} and \tau _{2} (\tau _{2} > \tau _{1} ) are two positive user-specified parameters (i.e., thresholds) introduced to alleviate the effects of external factors, such as noise, that destabilize the patterns. \mathbb {SR}^{U}_{\tau _{1}} , \mathbb {SR}^{U}_{\tau _{2}} , \mathbb {SR}^{L}_{\tau _{1}} and \mathbb {SR}^{L}_{\tau _{2}} are four sets of pixel relationships expressed as follows: \begin{align*} \mathbb {SR}^{U}_{\tau _{1}}&=\{(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SS}_{1} \,|\, (\mathbf {a}_{\mathbf {x}}\geqslant \mathbf {a}_{c}+\tau _{1})~\& ~(\mathbf {a}_{\mathbf {y}}\geqslant \mathbf {a}_{c}-\tau _{1})\} \tag{25}\\ \mathbb {SR}^{L}_{\tau _{1}}&=\lbrace (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SS}_{1}\,|\,(\mathbf {a}_{\mathbf {x}}\leqslant \mathbf {a}_{c}-\tau _{1})~\&\;(\mathbf {a}_{\mathbf {y}}\leqslant \mathbf {a}_{c}+\tau _{1})\rbrace \tag{26}\\ \mathbb {SR}^{U}_{\tau _{2}}&=\lbrace (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SS}_{2}\,|\, (\mathbf {a}_{\mathbf {x}}\geqslant \mathbf {a}_{c}+\tau _{2})~\&\; (\mathbf {a}_{\mathbf {y}}\geqslant \mathbf {a}_{c}-\tau _{2})\rbrace \tag{27}\\ \mathbb {SR}^{L}_{\tau _{2}}&=\lbrace (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SS}_{2}\,|\,(\mathbf {a}_{\mathbf {x}}\leqslant \mathbf {a}_{c}-\tau _{2})~\&\;(\mathbf {a}_{\mathbf {y}}\leqslant \mathbf {a}_{c}+\tau _{2})\rbrace \tag{28}\end{align*}

    It is acknowledged that related information about the local structure can be provided by positive and negative responses. Following the LQP concept, the proposed descriptor therefore splits each quinary pattern into four distinct parts, two negative (i.e., lower) and two positive (i.e., upper), to generate four binary codes. With this convention, the four thresholding functions that produce the two lower and the two upper codes are expressed as follows (cf. Eqs. 29, 30, 31 and 32): \begin{align*} \delta _{+1}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})&= \begin{cases} 1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{U}_{\tau _{1}} \\ 0, & \text {otherwise} \end{cases} \tag{29}\\ \delta _{-1}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})&= \begin{cases} 1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{L}_{\tau _{1}}\\ 0, & \text {otherwise} \end{cases} \tag{30}\\ \delta _{+2}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})&= \begin{cases} 1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{U}_{\tau _{2}} \\ 0, & \text {otherwise} \end{cases} \tag{31}\\ \delta _{-2}(\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})&= \begin{cases} 1, & \text {if} \hspace {0.1cm} (\mathbf {a}_{\mathbf {x}},\mathbf {a}_{\mathbf {y}})\in \mathbb {SR}^{L}_{\tau _{2}} \\ 0, & \text {otherwise} \end{cases}\tag{32}\end{align*}

    The local information is encoded using two upper and two lower directional binary dual-cross encoders, denoted E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} , which are based on the thresholding functions \delta _{+1} , \delta _{-1} , \delta _{+2} and \delta _{-2} and associated with the four sets of pixel relationships \mathbb {SR}^{U}_{\tau _{1}} , \mathbb {SR}^{U}_{\tau _{2}} , \mathbb {SR}^{L}_{\tau _{1}} and \mathbb {SR}^{L}_{\tau _{2}} , respectively. Formally, the codes produced by the four encoders are computed as: \begin{align*} E_{\mathbb {SR}^{U}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3})&=\delta _{+1}(\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }})\times 2^{6} +\sum _{\mathbf {k}=0}^{1} \delta _{+1}(\mathbf {a}_{S_{2\mathbf {k}}},\mathbf {a}_{S_{2\mathbf {k}+1}})\times 2^{5-\mathbf {k}} +\sum _{\mathbf {k}=0}^{3}\delta _{+1}(\mathbf {a}_{\mathbf {k}},\mathbf {a}_{\mathbf {k}+4}) \times 2^{\mathbf {k}} \tag{33}\\ E_{\mathbb {SR}^{L}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3})&=\delta _{-1}(\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }})\times 2^{6} + \sum _{\mathbf {k}=0}^{1} \delta _{-1}(\mathbf {a}_{S_{2\mathbf {k}}},\mathbf {a}_{S_{2\mathbf {k}+1}})\times 2^{5-\mathbf {k}} + \sum _{\mathbf {k}=0}^{3} \delta _{-1}(\mathbf {a}_{\mathbf {k}},\mathbf {a}_{\mathbf {k}+4}) \times 2^{\mathbf {k}} \tag{34}\\ E_{\mathbb {SR}^{U}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3})&=\delta _{+2}(\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }})\times 2^{5} +\delta _{+2}(\mathbf {a}_{\varrho },\mathbf {a}_{\widetilde {\varrho }})\times 2^{4} + \sum _{\mathbf {k}=0}^{3}\delta _{+2}(\mathbf {a}_{2\mathbf {k}+1},\mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}}) \times 2^{\mathbf {k}} \tag{35}\\ E_{\mathbb {SR}^{L}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3})&=\delta _{-2}(\mathbf {a}_{\mu },\mathbf {a}_{\widetilde {\mu }})\times 2^{5}+\delta _{-2}(\mathbf {a}_{\varrho },\mathbf {a}_{\widetilde {\varrho }})\times 2^{4} + \sum _{\mathbf {k}=0}^{3}\delta _{-2}(\mathbf {a}_{2\mathbf {k}+1},\mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}}) \times 2^{\mathbf {k}}\tag{36}\end{align*}

  • STAGE #3 (Feature extraction): After encoding each pixel of the input texture image with the four directional binary dual-cross encoders E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} , four code maps are produced. The histograms used as texture features are generated from the four code maps by the following equations: \begin{align*} \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{1}}}}(\mathbf {k}_{1})=&\sum _{\mathbf {I}_{m,n}^{3\times 3}}\widehat {\delta }(E_{\mathbb {SR}^{U}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3}),\mathbf {k}_{1}) \tag{37}\\ \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{1}}}}(\mathbf {k}_{1})=&\sum _{\mathbf {I}_{m,n}^{3\times 3}}\widehat {\delta }(E_{\mathbb {SR}^{L}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3}),\mathbf {k}_{1}) \tag{38}\\ \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{2}}}}(\mathbf {k}_{2})=&\sum _{\mathbf {I}_{m,n}^{3\times 3}}\widehat {\delta }(E_{\mathbb {SR}^{U}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3}),\mathbf {k}_{2}) \tag{39}\\ \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{2}}}}(\mathbf {k}_{2})=&\sum _{\mathbf {I}_{m,n}^{3\times 3}}\widehat {\delta }(E_{\mathbb {SR}^{L}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3}),\mathbf {k}_{2})\tag{40}\end{align*} where \mathbf {k}_{1}\in [0,2^{7}-1] indexes the E_{\mathbb {SR}^{U}_{\tau _{1}}} and E_{\mathbb {SR}^{L}_{\tau _{1}}} patterns and \mathbf {k}_{2}\in [0,2^{6}-1] indexes the E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} patterns.
\widehat {\delta } (\cdot ,\cdot ) denotes the Kronecker delta function defined as: \begin{align*} \widehat {\delta }(\boldsymbol{\alpha },\boldsymbol{\beta })= \begin{cases} 1, & \text {if} \hspace {0.1cm} \boldsymbol{\alpha }=\boldsymbol{\beta }\\ 0, & \text {otherwise} \hspace {0.1cm} \end{cases}\tag{41}\end{align*}
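Assuming the code maps are stored as integer arrays, Eqs. 37-40 amount to bin counting; a minimal NumPy sketch (array values are illustrative):

```python
import numpy as np

def code_histogram(code_map, n_bins):
    """Histogram of one encoder's code map: np.bincount sums the Kronecker
    delta of Eq. 41 over all pixels, one bin per possible pattern value."""
    return np.bincount(np.ravel(code_map), minlength=n_bins).astype(float)

# illustrative 7-bit code map for a tau1 encoder (128 possible patterns)
codes = np.array([[0, 5, 5],
                  [127, 5, 0]])
h = code_histogram(codes, 128)
```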

  • STAGE #4 (Multi-scale scheme): The histograms produced by single-scale analysis with the four directional binary dual-cross encoders E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} are, on their own, insufficient to reach the best recognition scores.

    Since different features have different capabilities to represent images and to make them robust to scale variations, richer and more detailed texture information can be captured by a multi-scale fusion operation that performs a linear combination of these features. In the current study, a novel hybrid histogram is generated by combining the information obtained by the E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} encoders into a single feature vector. This new hybrid texture description model is expected to be more effective, as it reduces noise sensitivity and improves the discrimination capability of the E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} operators through their complementary information. The histogram feature vector of the multi-scale analysis is constructed as follows: \begin{equation*} \mathbf {h}_{\text {DNT-MQP}} = \left \langle{ \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{1}}}}, \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{1}}}}, \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{2}}}}, \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{2}}}} }\right \rangle\tag{42}\end{equation*}

where \left \langle{ }\right \rangle is the concatenation operator.
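In implementation terms, the concatenation of Eq. 42 simply stacks the four histograms. Assuming 7-bit codes for the tau1 encoders and 6-bit codes for the tau2 encoders (our reading of the bit weights in Eqs. 33-36), the final descriptor would be 384-dimensional:

```python
import numpy as np

# Placeholder per-encoder histograms; in practice these come from the
# four code maps of Eqs. 37-40.
h_u1, h_l1 = np.zeros(128), np.zeros(128)  # tau1 encoders: 7-bit codes
h_u2, h_l2 = np.zeros(64), np.zeros(64)    # tau2 encoders: 6-bit codes

# Eq. 42: plain concatenation into the multi-scale DNT-MQP feature
h_dnt_mqp = np.concatenate([h_u1, h_l1, h_u2, h_l2])
```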
FIGURE 2. The adjacent neighbor relation for each of the 8 neighbors. (Top) the four sets of directional center-symmetric neighboring pixels and (bottom) the four sets of directional peripheral neighboring pixels.

B. Proposed Dynamic Thresholds for DNT-MQP

Evidently, a parametric method evaluated with its user-specified parameters optimized over each tested dataset achieves satisfactory classification results. In this paper, to ensure a meaningful and fair comparison with parameter-free state-of-the-art methods, we propose to define both parameters \tau _{1} and \tau _{2} locally and dynamically for the quinary pattern creation process of our method. To this end, we consider a local image patch of size 3\times 3 and first calculate the neighbor-to-center differences denoted df_{3\times 3} (cf. Eq. 43). Then, the mean of all negative difference values df^{mean^{-}}_{3\times 3} and the mean of all positive difference values df^{mean^{+}}_{3\times 3} are computed from the df_{3\times 3} set using Eq. 44. \begin{align*} df_{3\times 3}=&[\mathbf {a}_{0}-\mathbf {a}_{c},\mathbf {a}_{1}-\mathbf {a}_{c},\ldots,\mathbf {a}_{7}-\mathbf {a}_{c}] \tag{43}\\ df^{mean^{+}}_{3\times 3}=&\frac {1}{pv} \sum _{k=1}^{pv} df_{k}^{+} \hspace {1cm} df^{mean^{-}}_{3\times 3}=\frac {1}{nv} \sum _{k=1}^{nv} |df_{k}^{-}| \\{}\tag{44}\end{align*} where df_{k}^{+} and df_{k}^{-} are, respectively, the positive (i.e., \mathbf {a}_{k}-\mathbf {a}_{c}\geq 0 ) and negative (i.e., \mathbf {a}_{k}-\mathbf {a}_{c} < 0 ) difference values in the df_{3\times 3} set, pv is the number of df_{k}^{+} elements and nv is the number of df_{k}^{-} elements (pv + nv = P). Finally, both parameters \tau _{1} and \tau _{2} are computed using the following equation (cf. Eq. 45): \begin{align*} \tau _{1}=\frac {|df^{mean^{+}}_{3\times 3}-df^{mean^{-}}_{3\times 3}|}{max(df^{mean^{+}}_{3\times 3},df^{mean^{-}}_{3\times 3})} \hspace {0.05cm} \tau _{2}=\frac {df^{mean^{+}}_{3\times 3}+df^{mean^{-}}_{3\times 3}}{min(df^{mean^{+}}_{3\times 3},df^{mean^{-}}_{3\times 3})} \\{}\tag{45}\end{align*}
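A minimal sketch of this dynamic threshold estimation per 3x3 patch (the epsilon guard for flat or one-sided patches is our addition; the paper leaves that case unspecified):

```python
import numpy as np

def dynamic_thresholds(patch):
    """Estimate (tau1, tau2) from a 3x3 patch following Eqs. 43-45.
    patch: 3x3 array whose centre pixel is patch[1, 1]."""
    patch = np.asarray(patch, dtype=float)
    center = patch[1, 1]
    # the 8 neighbour-to-centre differences (centre removed from the window)
    df = np.delete(patch.ravel(), 4) - center
    pos, neg = df[df >= 0], df[df < 0]
    # mean positive difference and mean absolute negative difference
    m_pos = pos.mean() if pos.size else 0.0
    m_neg = np.abs(neg).mean() if neg.size else 0.0
    eps = 1e-12  # guards division by zero on degenerate patches
    tau1 = abs(m_pos - m_neg) / (max(m_pos, m_neg) + eps)
    tau2 = (m_pos + m_neg) / (min(m_pos, m_neg) + eps)
    return tau1, tau2
```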

The pseudo-code of the proposed DNT-MQP descriptor is illustrated in algorithm 1.

Algorithm 1 Computing the DNT-MQP descriptor

Require: I \leftarrow input grayscale image I_{M\times {N}} .

Output: \mathbf {h}_{\text {DNT-MQP}} \leftarrow the multi-scale histogram feature.

1: Calculate the global average gray level \mathbf {a}_{\widetilde {\mu }} of the whole image I_{M\times {N}} using Eq. 19.

2: Calculate the median gray level \mathbf {a}_{\widetilde {\varrho }} of the whole image I_{M\times {N}} .

3: for each image pixel \mathbf {a}_{c} of I_{M\times {N}} do

4: Consider a local square window \mathbf {I}_{m,n}^{3\times 3} of dimension 3\times 3 around \mathbf {a}_{c} .

5: Calculate the neighbor-to-center differences df_{3\times 3} and then both dynamic thresholds \tau _{1} and \tau _{2} using Eq. 45.

6: Calculate the average directional gray levels \mathbf {a}_{{S_{\mathbf {k}}}} and \mathbf {a}_{{\widetilde {S}_{\mathbf {k}}}} according to the sets S_{\mathbf {k}} and \widetilde {S}_{\mathbf {k}} (cf. Eqs. 16 and 17) using Eqs. 20 and 21.

7: Calculate (using Eqs. 33, 34, 35 and 36, respectively):

  • E_{\mathbb {SR}^{U}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3}) \leftarrow the upper directional binary code based on thresholding function \delta _{+1} and associated with the pixel relationship set \mathbb {SR}^{U}_{\tau _{1}} .

  • E_{\mathbb {SR}^{L}_{\tau _{1}}}(\mathbf {I}_{m,n}^{3\times 3}) \leftarrow the lower directional binary code based on thresholding function \delta _{-1} and associated with the pixel relationship set \mathbb {SR}^{L}_{\tau _{1}} .

  • E_{\mathbb {SR}^{U}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3}) \leftarrow the upper directional binary code based on thresholding function \delta _{+2} and associated with the pixel relationship set \mathbb {SR}^{U}_{\tau _{2}} .

  • E_{\mathbb {SR}^{L}_{\tau _{2}}}(\mathbf {I}_{m,n}^{3\times 3}) \leftarrow the lower directional binary code based on thresholding function \delta _{-2} and associated with the pixel relationship set \mathbb {SR}^{L}_{\tau _{2}} .

8: end for

9: Calculate (using Eqs. 37, 38, 39 and 40, respectively):

  • \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{1}}}} \leftarrow histogram feature of the E_{\mathbb {SR}^{U}_{\tau _{1}}} code map.

  • \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{1}}}} \leftarrow histogram feature of the E_{\mathbb {SR}^{L}_{\tau _{1}}} code map.

  • \mathbf {h}_{E_{\mathbb {SR}^{U}_{\tau _{2}}}} \leftarrow histogram feature of the E_{\mathbb {SR}^{U}_{\tau _{2}}} code map.

  • \mathbf {h}_{E_{\mathbb {SR}^{L}_{\tau _{2}}}} \leftarrow histogram feature of the E_{\mathbb {SR}^{L}_{\tau _{2}}} code map.

10: Calculate the multi-scale histogram feature \mathbf {h}_{\text {DNT-MQP}} using Eq. 42.

11: return \mathbf {h}_{\text {DNT-MQP}}

C. Advantages of DNT-MQP and Discussion

As a variant of the LBP and LQP texture operators, the designed DNT-MQP descriptor preserves their merits of invariance to monotonic lighting changes and low computational complexity. Furthermore, DNT-MQP presents other merits, which are discussed below.

Compared to some recent state-of-the-art texture descriptors like LQPAT [47], LCCMSP [49] and ARCSLBP [63], which suffer from an inborn defect of LBP, the DNT-MQP descriptor can better describe the local texture characteristics of the image with lower computational complexity. To visually show the effectiveness of the coding strategy of DNT-MQP, Figure 3 illustrates histogram-based matching of two sample texture images, selected from the USPTex database and belonging to two distinct texture classes, using DNT-MQP, LQPAT, LCCMSP and ARCSLBP. It is easy to observe that DNT-MQP carries more information than the other descriptors, which gives DNT-MQP better discriminative ability than LQPAT, LCCMSP and ARCSLBP. Moreover, the two histograms produced by DNT-MQP are clearly dissimilar, since the L1 city-block distance between the two samples is 1.1298, whereas the distances measured with LQPAT, LCCMSP and ARCSLBP, respectively 0.6173, 0.6924 and 0.7119, are lower and thus indicate greater (undesired) similarity between the two classes. To further highlight its effectiveness, DNT-MQP is applied to various challenging representative texture datasets in Section IV.
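The histogram matching underlying Figure 3 can be sketched as follows; the unit-mass normalization before comparison is our assumption:

```python
import numpy as np

def l1_distance(h1, h2):
    """City-block (L1) distance between two descriptor histograms,
    each normalised to unit mass before comparison."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(np.abs(h1 / h1.sum() - h2 / h2.sum()).sum())
```

A larger distance between histograms of images from different classes indicates better class separability, which is the comparison reported in Figure 3.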

FIGURE 3. Comparison of DNT-MQP, LCCMSP, ARCSLBP and LQPAT features on two sample texture images representing two distinct texture classes from the USPTex dataset. The left and right columns present, respectively, the histograms of images (a) and (b) using the different descriptors.

SECTION IV.

Experimental Results and Discussion

In this section, we show the effectiveness and validate the performance stability of the proposed DNT-MQP operator with comprehensive tests on sixteen representative, widely used texture databases. Furthermore, DNT-MQP is compared to a large number of recent promising state-of-the-art texture descriptors and to several CNN-based features to highlight the performance improvement it provides. The evaluated methods are summarized in Table 1. Note that some of these methods are implemented using the originally available codes, while for the others we used our own implementations according to their respective papers, tuned to reproduce the results given in the published papers. The experiments follow the standard evaluation protocol for each tested dataset (i.e., split-sample validation), where 50% of the samples are randomly selected as the training set and the remaining 50% are used as the testing set. The test samples are then classified with the parameter-free nearest-neighbor rule (1-NN) using the L1 city-block distance. Each experiment is repeated 100 times to remove any bias related to the division of the dataset, and the averaged results are reported as estimated accuracies. In what follows, the texture datasets considered in the experiments are first presented and the obtained experimental results are then discussed.
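The evaluation protocol above can be sketched as follows (feature and label arrays are placeholders; the brute-force distance matrix favors clarity over speed):

```python
import numpy as np

def one_nn_l1_accuracy(features, labels, n_trials=100, seed=0):
    """Split-sample protocol used in the experiments: random 50/50
    train/test split, 1-NN classification with the L1 (city-block)
    distance, accuracy averaged over n_trials repetitions."""
    rng = np.random.default_rng(seed)
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    n = len(y)
    accs = []
    for _ in range(n_trials):
        idx = rng.permutation(n)
        tr, te = idx[: n // 2], idx[n // 2:]
        # brute-force L1 distance matrix: test samples x training samples
        d = np.abs(X[te, None, :] - X[None, tr, :]).sum(axis=2)
        pred = y[tr][d.argmin(axis=1)]
        accs.append(float((pred == y[te]).mean()))
    return float(np.mean(accs))
```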

TABLE 1 Texture Descriptors Tested and Compared With Our Proposed Descriptor

A. Texture Datasets

To verify the effectiveness and performance stability of DNT-MQP, we carried out extensive tests on sixteen well-known texture databases: the Jerry Wu, CUReT, KTH-TIPS, USPTex, Brodatz, KTH-TIPS2b, Bonn BTF, MondialMarmi, TC-00, TC-01, TC-13, Kylberg and XUH databases (the same datasets used in [52], [63]), and the NewBarkTex and MBT databases. These well-known texture classification (recognition) datasets were selected to cover various characteristics in terms of number of samples, number of classes and the specific challenges posed by each texture dataset. More information about each dataset is summarized in Table 2. As can be seen from Table 2, each texture dataset has its own specific challenges in terms of rotation, scale, translation, illumination, view angle and other variations, which allows the performance of the proposed descriptor as well as that of the evaluated state-of-the-art methods to be assessed against these factors.

TABLE 2 Image Datasets Used in This Study. The Table Presents the Properties of Each Dataset, Including the Number of Classes and Variety of Samples in View Point, Scale, Illumination Changes, Rotation, Etc

B. Effects of Different Component Combinations

Generally, it is difficult for a single encoder to fulfill several qualities simultaneously. Combining several texture representations generated by different encoders therefore seems the best way forward. Indeed, such a combination makes it possible to exploit the advantages of each encoder, reduce the problems that arise in each individual method and thus improve the classification results. There are four main components in the designed DNT-MQP descriptor, namely the E_{\mathbb {SR}^{U}_{\tau _{1}}} , E_{\mathbb {SR}^{L}_{\tau _{1}}} , E_{\mathbb {SR}^{U}_{\tau _{2}}} and E_{\mathbb {SR}^{L}_{\tau _{2}}} encoders (denoted as single-scale DNT-QP1, DNT-QP2, DNT-QP3 and DNT-QP4, respectively, in Figure 4), whose combination gives rise to the DNT-MQP descriptor (DNT-MQP= \big \langle \text {DNT-QP}1, \text {DNT-QP}2, \text {DNT-QP}3, \text {DNT-QP}4 \big \rangle ). In this subsection, we evaluate their effects and impact on the overall performance when applied alone, combined two by two (DNT-QP12= \big \langle \text {DNT-QP}1, \text {DNT-QP}2 \big \rangle and DNT-QP34= \big \langle \text {DNT-QP}3, \text {DNT-QP}4 \big \rangle ) and three by three (DNT-QP123 = \big \langle \text {DNT-QP}1, \text {DNT-QP}2, \text {DNT-QP}3 \big \rangle and DNT-QP124 = \big \langle \text {DNT-QP}1, \text {DNT-QP}2, \text {DNT-QP}4 \big \rangle ), against the full DNT-MQP descriptor, using the USPtex, MondialMarmi and NewBarkTex databases. It can be seen from Figure 4 that the classification accuracy improves every time a new component is added, demonstrating the effectiveness of each component and thus justifying the combination of the different components. The DNT-MQP descriptor, a hybrid texture description model that combines all four components, achieves the best classification results as it extracts complementary texture information from the fused components.

FIGURE 4. Classification results of single-scale image descriptors and their combinations.

C. Comparative Assessment of Performance

1) Experiment #1: Investigation on Performance Stability

Table 3 depicts the average classification scores (i.e., over the 100 subdivisions) of each tested method and for each texture database as well as the global average performance (GAP) and the mean of standard deviation (mean Std) of each method over all the datasets. According to the results reported in Table 3, we can readily make the following observations:

  • It is easy to observe that descriptors like LDENP, LDV, DC and EUL_{\text {LTP}} present the lowest performance on almost all the used datasets. Indeed, these methods are often found at the bottom of the ranking, as their classification performance is mostly poor or at least lower than that of the other evaluated descriptors.

  • Results indicate that, except for operators like LDENP, LDV, DC and EUL_{\text {LTP}} which perform poorly, all the other evaluated methods show promising results on the Brodatz database (dataset 2 in Table 3), where their scores are above 96%. Descriptors like LDZP, KLBP, ILQP, RALBGC, LCCMSP, ARCS-LBP, LDTP and MNTCDP, as well as the designed DNT-MQP operator, manage to differentiate all classes perfectly (score of 100%) on the Brodatz database, leaving essentially no room for improvement. The same holds for the KTH-TIPS database (dataset 9 in Table 3), where DNT-MQP as well as several evaluated state-of-the-art descriptors differentiate all classes perfectly.

  • There is a notable performance drop for all the tested operators on several datasets, including USPtex, MondialMarmi, KTH-TIPS2b, TC-13, NewBarkTex and MBT (datasets 1, 3, 4, 7, 10, 13, and 16 in Table 3). The overall classification accuracies achieved on TC-13, NewBarkTex and MBT are below 90%. It is worth mentioning that in this study the 1-NN classifier was used for classification; more sophisticated machine learning algorithms such as the support vector machine (SVM) and the extended nearest neighbor (ENN) may improve the overall performance.

  • It is evident from the results reported in Table 3 that none of the evaluated methods ensures satisfactory classification results over all sixteen tested datasets. Considering for example the DC descriptor, it performs well on the Brodatz, KTH-TIPS (where it reaches a score of 100%) and Kylberg datasets but shows poor performance on the other tested datasets. The same remark applies to many other evaluated methods. The LETRIST descriptor, ranked fourth best as will be shown later, achieves good classification results on the majority of the considered datasets, but its weakness appears in classifying the images of the NewBarkTex dataset, where its score drops dramatically to about 63% vs 86.98% (a difference of around 23.86%) realized by the top descriptor (i.e., the proposed DNT-MQP descriptor). The same comment holds for many other methods such as CSQP, LOOP, LDEBP and so on.

  • It can also be seen in Table 3 that DNT-MQP achieves performance that is competitive with or better than all the evaluated state-of-the-art descriptors. Furthermore, DNT-MQP is among the best descriptors in terms of overall accuracy, as it achieves outstanding classification outcomes on nearly all the selected datasets. Its striking performance is confirmed by the fact that it realizes the highest global average performance (GAP) of all the evaluated methods.

  • Remarkably, DNT-MQP achieves superior classification results compared to the SSN (Spatio-Spectral Networks) descriptor, which was originally designed for color-texture analysis, on the USPtex, TC-13, NewBarkTex and MBT color texture datasets. The SSN method, like the majority of methods oriented to color texture analysis, tends to be more sensitive to illumination and resolution [71]. It also either largely ignores spatial relationships between the pixels in the image or assigns them low weights, so that the dominant color drives the feature distribution, which may lead to lower accuracy. In contrast, DNT-MQP has several advantages, including invariance to monotonic lighting changes, low complexity and a low-dimensional feature representation.

  • Noticeably, considering the ranking of the tested descriptors within each used dataset, DNT-MQP stands out as the best descriptor, as it performs consistently and significantly the best on ten texture datasets: USPtex, Brodatz, KTH-TIPS2b, MondialMarmi, TC-00, TC-01, TC-13, KTH-TIPS, NewBarkTex and Kylberg (1-7, 9, 10, and 15, respectively). Furthermore, it is in the top 3 methods on four of the tested datasets. It is interesting to note that even when DNT-MQP does not achieve the highest score (i.e., it is not the top texture descriptor), it provides competitive average performance compared to the score yielded by the top texture operator. Considering for example the Bonn BTF dataset (number 12), where DNT-MQP has the sixth-highest average performance, it still reaches a score of 99.25%, a very satisfactory classification result (very close to the 100% of the top-ranked descriptor).

  • The outstanding results on KTH-TIPS2b, MondialMarmi, TC-00, TC-01, TC-13, Jerry Wu and UMD verify that DNT-MQP can resist rotation variations. Note that DNT-MQP shows better tolerance to rotation than LETRIST and SBP2, which were originally conceived for rotation-invariant texture classification. In particular, DNT-MQP achieves 95.12%, 93.64%, 99.70% and 86.59% on KTH-TIPS2b, MondialMarmi, TC-01 and TC-13, respectively, where the performance improvements over SBP2 and LETRIST on these datasets are (4.23%, 5.04%), (3.93%, 6.65%), (2.05%, 1.19%) and (2.35%, 2.94%). Furthermore, DNT-MQP has good tolerance to illumination changes, as it provides the best average performance on the KTH-TIPS2b, TC-00, TC-01, TC-13, KTH-TIPS, NewBarkTex and UMD datasets and competitive classification results on the CUReT, Jerry Wu, Bonn BTF and Kylberg datasets. In addition, the superior or competitive results obtained on UMD, TC-13 and KTH-TIPS2b on the one hand, and on UMD and XUH on the other hand, indicate that DNT-MQP has good tolerance to scale changes and viewpoint changes, respectively.

Considering the above discussions, the designed DNT-MQP operator shows significant performance stability compared to the evaluated state-of-the-art texture methods on almost all the considered datasets. Indeed, DNT-MQP exhibits less performance oscillation, keeping consistent performance across the various texture databases while most of the literature methods fluctuate. The performance stability is also confirmed by the fact that DNT-MQP achieves the lowest mean standard deviation, corroborating its robustness against any bias related to the division of the datasets. These results indicate that the combination of single-scale DNT-QP features helped to construct a texture operator that performs well on a wide selection of texture datasets.
TABLE 3 Average Classification Accuracy. The Last Two Columns Represent, Respectively, the Global Average Performance (GAP) and the Mean Standard Deviation of Each Method Over All the Datasets. The SSN Operator as a Color Image Descriptor is Only Evaluated on Color Texture Datasets

2) Experiment #2: Statistical Significance of the Obtained Results in Terms of Accuracy Improvement

The major reason for carrying out this experiment is to further statistically validate the classification results obtained by the designed method vs the evaluated state-of-the-art descriptors, using the Wilcoxon signed-rank test based ranking technique proposed in [49]. The technique is applied to all pairwise combinations of the 24 evaluated texture descriptors on the sixteen used texture datasets. Table 4 gathers the ranking results according to the normalized number of victories (number of wins / (number of tested datasets × (number of tested descriptors − 1))) achieved by each evaluated method over all the tested datasets. It is easy to observe from Table 4 that the designed DNT-MQP operator is clearly the best performing descriptor in comparison with the evaluated state-of-the-art methods, which corroborates the analysis extracted from Table 3. Remarkably, the normalized number of victories achieved by DNT-MQP is 0.80, vs. 0.71 for LETRIST (2nd), 0.68 for LCCMSP (3rd), 0.67 for ARCS-LBP (4th), etc. In particular, taking the classification performance of LETRIST (the second-best descriptor) as a baseline, the DNT-MQP texture operator provides about 11.5% improvement over the sixteen tested texture datasets.
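A sketch of this victory-counting scheme, assuming the per-split accuracies are retained for each descriptor and dataset (SciPy's wilcoxon implements the signed-rank test; the significance level alpha is our assumption):

```python
import numpy as np
from scipy.stats import wilcoxon

def normalized_victories(runs, alpha=0.05):
    """runs: dict mapping descriptor name -> array of shape
    (n_datasets, n_repeats) holding per-split accuracies. For each
    dataset, every ordered pair of descriptors is compared with a
    Wilcoxon signed-rank test over the repeated runs; a significant
    difference in favor of the higher mean counts as one win. Wins are
    normalized by n_datasets * (n_descriptors - 1), as in the ranking."""
    names = list(runs)
    n_datasets = next(iter(runs.values())).shape[0]
    wins = dict.fromkeys(names, 0)
    for d in range(n_datasets):
        for a in names:
            for b in names:
                if a == b:
                    continue
                sa, sb = runs[a][d], runs[b][d]
                if np.allclose(sa, sb):
                    continue  # identical runs: test undefined, no winner
                _, p = wilcoxon(sa, sb)
                if p < alpha and sa.mean() > sb.mean():
                    wins[a] += 1
    denom = n_datasets * (len(names) - 1)
    return {name: wins[name] / denom for name in names}
```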

TABLE 4 Ranking Outcomes Based on the Normalized Number of Victories Achieved by Each Applied Method on All the Used Datasets After Applying the Wilcoxon-Based Ranking Test

3) Experiment #3: Comparisons With CNN-Based Features

To evaluate the effectiveness of DNT-MQP in depth, we also compared its performance to that of well-known CNN-based features extracted from the pre-trained deep learning models ResNet50, ResNet101, AlexNet, VGG16 and VGG19, with features taken from different layers in some of them. Note that the outputs of the evaluated CNN models are treated as feature vectors and exploited in the same manner as handcrafted features. Table 5 summarizes the comparison results. Noticeably, DNT-MQP reaches the highest average performance compared to all tested CNN-based features on 10 datasets out of 16, as it does against the local descriptors. When DNT-MQP is not ranked first, it still provides an interesting average performance, as shown in Table 5, achieving accuracies similar to the CNNs ranked before it, with the advantage of being conceptually much easier to implement and training-free. Note that CNNs are more suitable when there is high intraclass variability, whereas local descriptors work better on homogeneous, fine-grained textures with low intraclass variability. Indeed, when the images have similar contents and few patterns, deep models tend to extract redundant feature values, which makes the classifier perform badly on these features and makes different models provide nearly the same performance. On the other hand, when the images have complex contents and varied patterns, as in the USPtex dataset, deep features provide a better representation than local descriptors [70].

TABLE 5 Performance Comparison of DNT-MQP and CNN-Based Features. The Last Column Represents the Global Average Performance (GAP) of Each Method Over All the Datasets

D. Research Implementation

A laptop equipped with a 2.10 GHz Core i7 CPU and 8 GB of RAM, running Ubuntu 14.04, was used in the experiments. All methods were implemented in MATLAB R2013a. Running all the evaluated methods required approximately 97 hours. Figure 5 reports the processing time (in minutes), covering feature extraction, distance computation and 1-NN classification, for the 23,940 images of the 16 datasets used in this study. According to Figure 5, the proposed DNT-MQP method is faster than the two best performing descriptors ranked after it, which confirms that DNT-MQP strikes a reasonable tradeoff between speed and classification performance.

FIGURE 5. Processing time (in minutes) of the top twelve tested texture descriptors over all considered datasets.

SECTION V.

Conclusion

Herein, motivated by several state-of-the-art LBP-like methods, a conceptually simple, easy-to-implement yet highly discriminative texture operator called Directional Neighborhood Topologies based Multi-scale Quinary Pattern (DNT-MQP) is designed for texture description and classification. Thanks to the use of several directional neighborhood topologies, the DNT-MQP operator captures richer, more detailed and complementary texture information. As the experimental results show, DNT-MQP differentiates with high precision the classes of a large number of benchmark texture datasets, while enjoying a low-dimensional representation. Furthermore, the discriminative power of DNT-MQP is demonstrated against 34 recent state-of-the-art methods, indicating that DNT-MQP is a strong candidate for texture modeling. Future work will investigate the use of other well-known classifiers to increase the classification performance of the proposed model. In addition, the proposed texture operator holds potential for deployment in challenging high-level applications related to texture classification, including dynamic texture classification, background subtraction in complex scenes, object recognition and video-based face analysis.

