Fractal image compression (FIC) is an image coding technology based on the local similarity of image structure. It was proposed by Michael F. Barnsley in 1988 [1], and improved by Arnaud E. Jacquin in 1992 [2]. Jacquin's FIC scheme is called the baseline fractal image compression (BFIC), which uses the partitioned iterated function system (PIFS) to search the matching block pairs without human-computer interactions. Since then, FIC has a quick development and numerous FIC schemes have been published.
Traditional image coding technologies encode an image by pixel-based and statistical methods, and FIC is an interesting attempt at the structure-based image coding. FIC has been used not only in image coding, but also in some interesting image problems [3]–[9] and pattern recognition problems such as facial recognition [10]. However, compared with traditional image coding technologies, fractal compression suffers from the high computational complexity in encoding.
Generally, one image is firstly partitioned into some square blocks in FIC, and then these square blocks compose a set called the pool. According to two kinds of size, an image is partitioned into two different pools. The pool composed by the blocks with larger size is called the domain pool, and the other pool is called the range pool. The cells in the range pool are the blocks to be encoded. The blocks in the domain pool are contracted to the same size as the range blocks, and then FIC takes the domain pool as a virtual codebook. The matching domain block for each range block needs to be searched. Exhaustive search of the matching block pairs costs too much time, which is one of the major difficulties in BFIC.
Lots of works had been conducted to speed up the process of searching the matching pairs in FIC in recent years [11]–[22]. Some schemes were proposed based on a selected part of the domain pool to reduce the search cost for a range block [12]–[14]. For example, based on the spatial correlation on the range and domain blocks, many fast FIC schemes were proposed by searching the matching domain block from the adjacent domain block of the current range block [13] or from the adjacent domain blocks of the domain block which is the matching block for the adjacent range block of the current range block [14]. Classification was also applied in some fast FIC schemes [15]–[18] to classify the domain blocks into some different classes, then the matching block for a range block can be searched from several classes closely associated with the range block. Fisher's fast FIC scheme proposed in 1992 [15] is a typical classification scheme. Some other strategies, such as synthesis with other technologies [19], and sorting with some features [20], [21], were applied to speed up encoding in FIC.
Although numerous schemes were proposed to speed up encoding in FIC, the encoding time is still too long so far. For example, the encoding time for a 512×512 image with 4×4 range blocks is more than 20 seconds in the DRDC scheme proposed by Riccardo Distasi et al. in 2006 [18], and the encoding time for a 256×256 image is more than 2.8 seconds in the DUFC scheme proposed by Yi-Ming Zhou et al. in 2009 [19]. Of course, it should be considered that the number of pairwise block-comparisons for a 512×512 image is 16 times more than the number of pairwise comparisons for a 256×256 image in BFIC with the same partition. Although these schemes should not be compared without the same computer environment, they still show that there is a lot of work to do for speeding up encoding in FIC. Moreover, some schemes, such as the variance-based block-sorting scheme proposed by He et al. in 2004 [20] and the Fisher's 72 classes scheme [15], are more efficient in encoding, but the reconstructed image quality in these schemes cannot be well preserved. Hence, a fast FIC scheme is required to speed up encoding more efficiently and, to preserve the reconstructed image quality better.
In this paper, a novel fractal compression scheme is proposed to meet both the efficiency and the reconstructed image quality requirements. This scheme is based on the fact that the affine similarity between two image blocks is equivalent to the absolute value of Pearson's correlation coefficient (APCC) between them [22], [23]. Firstly, all the domain blocks are classified into 3 classes according to the classification method proposed by Fisher in 1992 [15]. We will prove the Fisher's 3 classes method is an APCC-based method in Section III in this paper. Secondly, the domain blocks are sorted in each class by APCCs between these domain blocks and a preset block. Then the matching domain block for a range block can be searched from the APCC interval in which these domain blocks have the closer APCCs with APCC the range block has (all the APCCs are computed with the preset block). Since both the steps in our scheme are based on APCC which is equivalent to the affine similarity in FIC, the reconstructed image quality is well preserved. Moreover, the encoding time is significantly reduced in our APCC-based FIC scheme.
The rest of this paper is structured as follows. The concept of fractal image compression and the relationship between affine similarity and APCC are briefly introduced in Section II. In Section III, the strategy of choosing and classifying the domain blocks is introduced. In Section IV, we focus on the APCC-based domain block sorting algorithm and the method of searching the block pairs. The preset blocks are trained offline in Section V. Several optimization methods involved in the proposed scheme are introduced in Section VI. The experiment results are shown in Section VII. Finally, the conclusions are drawn in Section VIII.
SECTION II.
BFIC and Affine Similarity
Jacquin's FIC scheme is regarded as the baseline fractal image compression (BFIC), in which the self-similarity of an image is measured by the affine similarity in the partitioned iterated function system (PIFS).
A. The Baseline FIC
As introduced in Section I, an image is firstly partitioned into two pools according to two different sizes $n\times n$ and $2n\times 2n$ in BFIC, one is the range pool filled with the non-overlapping blocks with size $n\times n$, and the other is the domain pool. All domain cells are then contracted into size $n\times n$ by averaging four pixels to one pixel. Then the range cells are the blocks to be encoded, and FIC takes the domain pool as a virtual codebook. The domain pool is called the “virtual codebook” because it is only used during encoding but not during the decoding process.
To improve the reconstructed image quality, eight transformations are applied to all domain blocks to octuple the domain block number. These transformations are $T_{0}$, $T_{1}$, $T_{2}$, $T_{3}$, $T_{4}$, $T_{5}$, $T_{6}$, and $T_{7}$, respectively, as introduced below [24]:
$T_{0}$: Identity.
$T_{1}$: Orthogonal reflection about mid-vertical axis.
$T_{2}$: Orthogonal reflection about mid-horizontal axis.
$T_{3}$: Orthogonal reflection about first diagonal.
$T_{4}$: Orthogonal reflection about second diagonal.
$T_{5}$: Rotation around center of block, through ${+}{\rm 90}^{\circ}$.
$T_{6}$: Rotation around center of block, through ${+}{\rm 180}^{\circ}$.
$T_{7}$: Rotation around center of block, through $+270^{\circ}$.
Subsequently, for an arbitrary range block ${\bf{R}}$, a domain block ${\bf{D}}$ has to be searched so that one affine transformation $s{\bf{D}}{+}o{{\bf 1}}$ exists to minimize the squared $L^{2}$ distance with ${\bf{R}}$, where $s$ and $o$ are the affine scalar parameters and ${{\bf 1}}$ is a block with size $n\times n$ in which all pixels equal to 1, then $${\rm{MSE}}({\bf{R}},s{\bf{D}}+o{{\bf 1}})={\left\Vert{{\bf{R}}-(s{\bf{D}}+o{{\bf 1}})}\right\Vert_{2}}\eqno{\hbox{(1)}}$$View Source
{\rm{MSE}}({\bf{R}},s{\bf{D}}+o{{\bf 1}})={\left\Vert{{\bf{R}}-(s{\bf{D}}+o{{\bf 1}})}\right\Vert_{2}}\eqno{\hbox{(1)}} where ${\left\Vert\cdot\right\Vert_{2}}$ is the two-norm. Finally, the combination of ($s$, $o$, index of ${\bf{D}}$ in the domain pool) constructs the PIFS subsystem of ${\bf{R}}$ in BFIC, and all the subsystems of range cells group into the PIFS of the original image.
Minimizing MSE in (1) by the least-squares method, we can obtain the values of parameters $s$, $o$, and the simplified energy function $H$ as follows [23]:$$\eqalignno{{{\mathop{\max}\limits_{\bf{D}}{\rm H}}}=&\,{\sigma_{\bf{RD}}^{2}\over{\sigma_{\bf{D}}^{2}}}\cr s=&\,{{\sigma_{\bf{RD}}}\over{\sigma_{\bf{D}}^{2}}}\cr o=&\,{\mu_{\bf{R}}}-s{\mu_{\bf{D}}}&{\hbox{(2)}}}$$View Source
\eqalignno{{{\mathop{\max}\limits_{\bf{D}}{\rm H}}}=&\,{\sigma_{\bf{RD}}^{2}\over{\sigma_{\bf{D}}^{2}}}\cr s=&\,{{\sigma_{\bf{RD}}}\over{\sigma_{\bf{D}}^{2}}}\cr o=&\,{\mu_{\bf{R}}}-s{\mu_{\bf{D}}}&{\hbox{(2)}}} where $\mu_{\bf{R}}$ and $\mu_{\bf{D}}$ are the mean intensity of the blocks ${\bf{R}}$ and ${\bf{D}}$, respectively, ${\sigma_{\bf{D}}}$ is the standard deviation of the block ${\bf{D}}$, and ${\sigma_{\bf{RD}}}$ is the covariance between the blocks ${\bf{R}}$ and ${\bf{D}}$. For the range block ${\bf{R}}$, maximizing H in the domain pool is equivalent to minimizing MSE in (1). The three sub-equations in (2) construct the expressions of the BFIC.
The reconstruction of the image is a process of iterative computation. It achieves the attractor of the PIFS, which is a fractal representation of the original image.
B. The Affine Similarity in BFIC
In (2), a hypothesis of ${\sigma_{\bf{D}}}\ne 0$ is made. In fact, if ${\sigma_{\bf{D}}}=0$, the block ${\bf{D}}$ is directly proportional to the block ${{\bf 1}}$. Then the scalar multiplier $s$ in $s{\bf{D}}+o{{\bf 1}}$ can be added to the offset $o$ and set $s=0$. Similarly, if ${\sigma_{\bf{R}}}=0$, the block ${\bf{R}}$ is directly proportional to the block ${{\bf 1}}$, and then the block ${\bf{R}}$ can be losslessly expressed by the block ${{\bf 1}}$. Hence, the parameter $s$ in $s{\bf{D}}+o{{\bf 1}}$ can also be set to 0.
If ${\sigma_{\bf{R}}}\ne 0$ and ${\sigma_{\bf{D}}}\ne 0$, it has been proved in some papers [22], [23] that the affine similarity between two image blocks in FIC is equivalent to the absolute value of the Pearson's correlation coefficient (APCC) between them whether the image measurement is MSE or SSIM [25], [26]. In fact, as in (1), MSE between a range block ${\bf R}$ and a linear expression of the domain block ${\bf{D}}$ can be computed as following [22]:$$\eqalignno{&{\rm{MS}}{{\rm{E}}^{2}}({\bf{R}},{\rm{s}}{\bf{D}}+o{{\bf 1}})=\left\Vert{{\bf{R}}-(s{\bf{D}}+o{{\bf 1}})}\right\Vert_{2}^{2}\cr &=m^{2}(\sigma_{\bf{R}}^{2}-{{\sigma_{\bf{RD}}^{2}}\over{\sigma_{\bf{D}}^{2}}})=m^{2}\sigma_{\bf{R}}^{2}(1-\rho_{\bf{RD}}^{2}) &{\hbox{(3)}}}$$View Source
\eqalignno{&{\rm{MS}}{{\rm{E}}^{2}}({\bf{R}},{\rm{s}}{\bf{D}}+o{{\bf 1}})=\left\Vert{{\bf{R}}-(s{\bf{D}}+o{{\bf 1}})}\right\Vert_{2}^{2}\cr &=m^{2}(\sigma_{\bf{R}}^{2}-{{\sigma_{\bf{RD}}^{2}}\over{\sigma_{\bf{D}}^{2}}})=m^{2}\sigma_{\bf{R}}^{2}(1-\rho_{\bf{RD}}^{2}) &{\hbox{(3)}}} where ${\sigma_{\bf{R}}}$ is the standard deviation of the block ${\bf{R}}$, ${\rho_{\bf{RD}}}={{\sigma_{\bf{RD}}}\over{{\sigma_{\bf{R}}}{\sigma_{\bf{D}}}}}$ is the Pearson's correlation coefficient between ${\bf R}$ and ${\bf{D}}$, and $m=n\times n$ is the number of pixels in the block. In the process of searching a matching domain block for the range block ${\bf R}$, the values of $m$ and $\sigma_{\bf{R}}^{2}$ are invariant. Hence, the most affine similar block for ${\bf R}$ is the block having the largest absolute value of Pearson's correlation coefficient with ${\bf R}$.
SECTION III.
APCC-Based Block Classification
The absolute value of the Pearson's correlation coefficient (APCC) should be an efficient feature to speed up encoding because the affine similarity between two blocks in FIC is equivalent to APCC between them.
If the block numbers in the range pool and domain pool are ${{\bf{R}}_{no}}$ and ${{\bf{D}}_{no}}$, respectively, the number of pairwise comparisons is ${{\bf{R}}_{no}}\times{{\bf{D}}_{no}}$ in the exhausting search in BFIC.
In BFIC, the original domain blocks with size $2n\times 2n$ are firstly contracted as the same size as the range blocks. Then eight transformations are applied to them to octuple the block number in the domain pool. According to the above analysis, octupling the block number in the domain pool leads to octuple the computational complexity in searching the matching pairs. Hence, there is great potential to speed up encoding by reducing the scale of the domain pool.
In 1992, Fisher proposed a FIC classification scheme to speed up encoding, in which a method existed to cut short the domain pool with classifying the domain blocks into 3 classes [15]. We call it as the Fisher's 3 classes method in this paper. In Fisher's scheme, every range or domain block is firstly divided into four sub-blocks according to the mid-horizontal axis and the mid-vertical axis, then all the range and domain blocks are classified into 3 classes according to the means of luminance in the four sub-blocks, or 24 classes according to the order of the variances in four sub-blocks, or 72 classes by combining the 3 classes method and the 24 classes methods. In our scheme, only the Fisher's 3 classes method is used.
In Fisher's 3 classes method, for an arbitrary block in the range or domain pool, suppose the sums of luminance in its four sub-blocks are $a_{1}$, $a_{2}$, $a_{3}$, and $a_{4}$, from the upper left to the lower right, respectively. We have known that every block ${\bf{D}}$ in the domain pool is transformed to eight blocks, ${\bf{D}}$, ${\bf{D}}_{1}$, ${\bf{D}}_{2}$, ${\bf{D}}_{3}$, ${\bf{D}}_{4}$, ${\bf{D}}_{5}$, ${\bf{D}}_{6}$, and ${\bf{D}}_{7}$ by eight transformations $T_{0}$, $T_{1}$, $T_{2}$, $T_{3}$, $T_{4}$, $T_{5}$, $T_{6}$, and $T_{7}$ in BFIC. If $a_{1}$, $a_{2}$, $a_{3}$, and $a_{4}$ vary each other in ${\bf{D}}$, there is one and only one block derived form ${\bf{D}}$ meeting one of the three conditions in (4):$$\eqalignno{{a_{1}}\geq&\,{a_{2}}\geq{a_{3}}\geq{a_{4}}\hfill\cr{a_{1}}\geq&\,{a_{2}}\geq{a_{4}}\geq{a_{3}}\cr{a_{1}}\geq&\,{a_{4}}\geq{a_{2}}\geq{a_{3}}.&{\hbox{(4)}}}$$View Source
\eqalignno{{a_{1}}\geq&\,{a_{2}}\geq{a_{3}}\geq{a_{4}}\hfill\cr{a_{1}}\geq&\,{a_{2}}\geq{a_{4}}\geq{a_{3}}\cr{a_{1}}\geq&\,{a_{4}}\geq{a_{2}}\geq{a_{3}}.&{\hbox{(4)}}} Then block ${\bf{D}}$ is classified into one of the three classes in accordance with the condition which one of its generated block meets in (4), and the corresponding transformation ${T_{\bf{D}}}$ which transforms ${\bf{D}}$ into this class can also be obtained. If some values are kept the same in $a_{1}$, $a_{2}$, $a_{3}$, and $a_{4}$, two or more of the generated blocks from ${\bf{D}}$ may meet two or all of the conditions in (4). In this case, the block can be classified into two or all classes.
Then we have the Lemma 1 as following:
Lemma 1
The Fisher's 3 classes method is an APCC-based block classification method.
Proof
For a block ${\bf{D}}$ in one of the three classes expressed in (4), now we prove that an APCC-based block classification method exists to classify ${\bf{D}}$ into the same class.
Construct three matrixes $C_{1}$, $C_{2}$, and $C_{3}$ as following:$${C_{1}}=\left[{\matrix{{\bf 2}&{\bf 1}\cr{\bf 0}&-{\bf 3}}}\right],{C_{2}}=\left[{\matrix{{\bf 2}&{\bf 1}\cr-{\bf 3}&{\bf 0}}}\right],{C_{3}}=\left[{\matrix{{\bf 2}&{\bf 0}\cr-{\bf 3}&{\bf 1}}}\right]$$View Source
{C_{1}}=\left[{\matrix{{\bf 2}&{\bf 1}\cr{\bf 0}&-{\bf 3}}}\right],{C_{2}}=\left[{\matrix{{\bf 2}&{\bf 1}\cr-{\bf 3}&{\bf 0}}}\right],{C_{3}}=\left[{\matrix{{\bf 2}&{\bf 0}\cr-{\bf 3}&{\bf 1}}}\right]
Each number in the matrixes $C_{1}$, $C_{2}$, and $C_{3}$ denotes a square sub-matrix filling with the number. Suppose the standard deviation is ${\sigma_{\bf{C}}}$ for $C_{1}$, $C_{2}$, and $C_{3}$, and the luminance sums for the four sub-blocks in the block ${\bf{D}}$ are $a_{1}$, $a_{2}$, $a_{3}$, and $a_{4}$, respectively, and the standard deviation of ${\bf{D}}$ is ${\sigma_{\bf{D}}}$.
Next we classify block ${\bf{D}}$ by the APCC-based block classification method in which ${\bf{D}}$ is classified into the class related to the matrix in $C_{1}$, $C_{2}$, and $C_{3}$ which has the maximum APCC with ${\bf{D}}$. Suppose the APCCs between ${\bf{D}}$ and three matrixes are $ACC_{1}$, $ACC_{2}$, and $ACC_{3}$, respectively, then we have$$\eqalignno{AC{C_{1}}=&\,\vert 2{a_{1}}+{a_{2}}-3{a_{4}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\cr AC{C_{2}}=&\,\vert 2{a_{1}}+{a_{2}}-3{a_{3}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\cr AC{C_{3}}=&\,\vert 2{a_{1}}+{a_{4}}-3{a_{3}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}}).&{\hbox{(5)}}}$$View Source
\eqalignno{AC{C_{1}}=&\,\vert 2{a_{1}}+{a_{2}}-3{a_{4}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\cr AC{C_{2}}=&\,\vert 2{a_{1}}+{a_{2}}-3{a_{3}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\cr AC{C_{3}}=&\,\vert 2{a_{1}}+{a_{4}}-3{a_{3}}\vert/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}}).&{\hbox{(5)}}}
For ${a_{1}}\geq{a_{2}}\geq{a_{3}}\geq{a_{4}}$, we have $$AC{C_{1}}-AC{C_{2}}=3({a_{3}}-{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(6)}}$$View Source
AC{C_{1}}-AC{C_{2}}=3({a_{3}}-{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(6)}}
If $2{a_{1}}+{a_{4}}-3{a_{3}}\geq 0$, $$AC{C_{1}}-AC{C_{3}}=({a_{2}}+3{a_{3}}-4{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(7)}}$$View Source
AC{C_{1}}-AC{C_{3}}=({a_{2}}+3{a_{3}}-4{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(7)}}
If $2{a_{1}}+{a_{4}}-3{a_{3}}\leq 0$, $$AC{C_{1}}-AC{C_{3}}=(4{a_{1}}+{a_{2}}-3{a_{3}}-2{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(8)}}$$View Source
AC{C_{1}}-AC{C_{3}}=(4{a_{1}}+{a_{2}}-3{a_{3}}-2{a_{4}})/({\sigma_{_{\bf{C}}}}{\sigma_{\bf{D}}})\geq0.\eqno{\hbox{(8)}}
Hence, for ${a_{1}}\geq{a_{2}}\geq{a_{3}}\geq{a_{4}}$, whether $2{a_{1}}+{a_{4}}-3{a_{3}}$ is greater than zero or not, we have $$\eqalignno{AC{C_{1}}\geq&\, AC{C_{2}}\cr AC{C_{1}}\geq&\,AC{C_{3}}.&{\hbox{(9)}}}$$View Source
\eqalignno{AC{C_{1}}\geq&\, AC{C_{2}}\cr AC{C_{1}}\geq&\,AC{C_{3}}.&{\hbox{(9)}}}
Similarly, for ${a_{1}}\geq{a_{2}}\geq{a_{4}}\geq{a_{3}}$, whether $2{a_{1}}+{a_{4}}-3{a_{3}}$ is greater than zero or not, we have $$\eqalignno{AC{C_{2}}\geq&\, AC{C_{1}}\cr AC{C_{2}}\geq&\,AC{C_{3}}.&{\hbox{(10)}}}$$View Source
\eqalignno{AC{C_{2}}\geq&\, AC{C_{1}}\cr AC{C_{2}}\geq&\,AC{C_{3}}.&{\hbox{(10)}}}
And, for ${a_{1}}\geq{a_{4}}\geq{a_{2}}\geq{a_{3}}$, whether $2{a_{1}}+{a_{2}}-3{a_{4}}$ is greater than zero or not, we have $$\eqalignno{AC{C_{3}}\geq&\, AC{C_{1}}\cr AC{C_{3}}\geq&\,AC{C_{2}}.&{\hbox{(11)}}}$$View Source
\eqalignno{AC{C_{3}}\geq&\, AC{C_{1}}\cr AC{C_{3}}\geq&\,AC{C_{2}}.&{\hbox{(11)}}}
Synthetic analysis bout the inequations (9), (10), and (11) with considering the conditions that the equality holds, the class of block ${\bf{D}}$ classified by the APCC-based block classification method related to $C_{1}$, $C_{2}$, and $C_{3}$ is kept the same as the class classified by the Fisher's 3 classes method.
Because Fisher's 24 classes method and 72 classes method are not APCC-based classification method, only the Fisher's 3 classes method is used in our scheme.
In this paper, the domain pool without enlarging with the eight transformations $T_{0}$, $T_{1}$, $T_{2}$, $T_{3}$, $T_{4}$, $T_{5}$, $T_{6}$, and $T_{7}$ is called the crude domain pool, and the domain pool enlarged with these transformations is called the BFIC domain pool. In addition, we make the codes generated in the encoder of the proposed scheme can be directly used in BFIC decoder.
In our scheme, for simplicity, every block is classified into only one class according to priority order of class 1, class 2, and class 3 expressed in (4) even if some sub-blocks of the block have the same sum of luminance. Then the index of the block ${\bf{D}}$ in the crude domain pool, the transformation on ${\bf{D}}$ to the class it belongs to, and three domain classes filled with the transformed domain blocks are retained to encode an image. Other conditions, such as the redundant generated blocks from the block ${\bf{D}}$, are useless. Each range block is also classified with the same method, and its matching domain block is searched in the same class.
By this step, the total domain block number in the three domain classes is only one eighth of the domain block number in BFIC, which can greatly reduce the pairwise comparisons between the range blocks and domain blocks. Moreover, all the domain and range blocks are classified into 3 classes to reduce the pairwise comparisons further. More importantly, the reconstructed image quality is well preserved because the Fisher's 3 classes method is APCC-based.
SECTION IV.
Sorting Domain Blocks with APCC
To further reduce the computational cost in FIC, a new method to sort the domain blocks with respect to APCC is discussed below. In this paper, ${{\bf{I}}^{{\bf{n}}\times{\bf{n}}}}$ is the symbol of $n\times n$ dimension image space. For example, for the grayscale images, ${{\bf{I}}^{{\bf{n}}\times{\bf{n}}}}$ is the discrete $n\times n$ dimension space with the integer pixels lying in $[0, 255]$. Then a lemma is given as following:
Lemma 2
Suppose ${\bf{R}},{\bf{D}}\in{{\bf{I}}^{{\bf{n}}\times{\bf{n}}}}$. For an arbitrary non-zero variance vector ${\bf{B}}\in{{\bf{I}}^{{\bf{n}}\times{\bf{n}}}}$, if $\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\to 1$, then $\left\vert{\rho({\bf{R}},{\bf{B}})}\right\vert\to\left\vert{\rho ({\bf{D}},{\bf{B}})}\right\vert$. $\rho (\cdot,\cdot)$ is the Pearson's correlation coefficient.
Proof
If APCC of a pair of ${\bf R}$, ${\bf{D}}$, and ${\bf{B}}$ equals to 1, the lemma is obviously correct. If not, take ${\bf R}$, ${\bf{D}}$, and ${\bf{B}}$ as points in m-dimension space $(m=n\times n)$, and suppose the vectors connecting the points ${\bf R}$, ${\bf{D}}$, and ${\bf{B}}$ with the origin are $\mathop{r\kern0pt}\limits^{\rightharpoonup}$, $\mathop{d\kern0pt}\limits^{\rightharpoonup}$ and $\mathop{b\kern0pt}\limits^{\rightharpoonup}$, respectively. We have $$\vert\rho ({\bf{R}},{\bf{D}})\left\vert=\right\vert\rho (-{\bf{R}},{\bf{D}})\left\vert=\right\vert\rho ({\bf{R}},-{\bf{D}})\vert,\eqno{\hbox{(12)}}$$View Source
\vert\rho ({\bf{R}},{\bf{D}})\left\vert=\right\vert\rho (-{\bf{R}},{\bf{D}})\left\vert=\right\vert\rho ({\bf{R}},-{\bf{D}})\vert,\eqno{\hbox{(12)}} so $\vert\rho (\cdot,\cdot)\vert$ is an even function for its variables. Because $\rho({\bf{R}},{\bf{D}})$ equals to the cosine of the angle between $\mathop r\limits^{\rightharpoonup}$ and $\mathop d\limits^{\rightharpoonup}$, $\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert$ then equals to the cosine of the minimum angle $\theta$ in both the angles: one is the angle between $\mathop r\limits^{\rightharpoonup}$ and $\mathop d\limits^{\rightharpoonup}$, and the other is between $\mathop r\limits^{\rightharpoonup}$ and ${-}\mathop d\limits^{\rightharpoonup}$. And let $\psi$ be the minimum angle in both the angles: one is the angle between $\mathop r\limits^{\rightharpoonup}$ and $\mathop{b\kern0pt}\limits^{\rightharpoonup}$, and the other is between $\mathop r\limits^{\rightharpoonup}$ and ${-}\mathop{b\kern0pt}\limits^{\rightharpoonup}$, and let $\omega$ be the minimum angle in both the angles: one is the angle between $\mathop d\limits^{\rightharpoonup}$ and $\mathop{b\kern0pt}\limits^{\rightharpoonup}$, and the other is between $\mathop d\limits^{\rightharpoonup}$ and ${-}\mathop{b\kern0pt}\limits^{\rightharpoonup}$, as shown as in Fig. 1. Then we have $\theta$, $\psi$, $\omega\in ({0^{\circ}},{90^{\circ}}]$ and $$\eqalignno{\cos\theta=&\,\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\hfill\cr\cos\psi=&\,\left\vert{\rho ({\bf{R}},{\bf{B}})}\right\vert\cr\cos\omega=&\,\left\vert{\rho ({\bf{D}},{\bf{B}})}\right\vert.&{\hbox{(13)}}}$$View Source
\eqalignno{\cos\theta=&\,\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\hfill\cr\cos\psi=&\,\left\vert{\rho ({\bf{R}},{\bf{B}})}\right\vert\cr\cos\omega=&\,\left\vert{\rho ({\bf{D}},{\bf{B}})}\right\vert.&{\hbox{(13)}}} According to the relationship between $\theta$, $\psi$ and $\omega$, we have $$\vert\theta-\psi\vert\leq\omega\leq{\rm{min}}({\rm{90}}^{\circ},\theta+\psi).\eqno{\hbox{(14)}}$$View Source
\vert\theta-\psi\vert\leq\omega\leq{\rm{min}}({\rm{90}}^{\circ},\theta+\psi).\eqno{\hbox{(14)}}For cosine is a monotonically decreasing function in $[{0^{\circ}},{90^{\circ}}]$, $${\max}(0,{\cos}(\theta+\psi))\leq{\cos}\omega\leq{\cos}(\theta-\psi).\eqno{\hbox{(15)}}$$View Source
{\max}(0,{\cos}(\theta+\psi))\leq{\cos}\omega\leq{\cos}(\theta-\psi).\eqno{\hbox{(15)}}
When $\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\to 1$, we have $\cos\theta\to 1$, $\sin\theta\to 0$, then $$\eqalignno{\mathop{\lim}\limits_{\cos\theta\to 1}{\cos}(\theta+\psi)=&\,\mathop{\lim}\limits_{\cos\theta\to1}{(\cos}\theta{\cos}\psi-{\sin}\theta{\sin}\psi)={\cos}\psi\cr\mathop{\lim}\limits_{\cos\theta\to1}{\cos}(\theta-\psi)=&\,\mathop{\lim}\limits_{\cos\theta\to 1}{(\cos}\theta{\cos}\psi+{\rm{sin}}\theta{\rm{sin}}\psi)={\cos}\psi.&{\hbox{(16)}}}$$View Source
\eqalignno{\mathop{\lim}\limits_{\cos\theta\to 1}{\cos}(\theta+\psi)=&\,\mathop{\lim}\limits_{\cos\theta\to1}{(\cos}\theta{\cos}\psi-{\sin}\theta{\sin}\psi)={\cos}\psi\cr\mathop{\lim}\limits_{\cos\theta\to1}{\cos}(\theta-\psi)=&\,\mathop{\lim}\limits_{\cos\theta\to 1}{(\cos}\theta{\cos}\psi+{\rm{sin}}\theta{\rm{sin}}\psi)={\cos}\psi.&{\hbox{(16)}}} Hence, $$\mathop{\lim}\limits_{\left\vert{\rho(\bf{R},\bf{D})}\right\vert\to 1}\left\vert{\rho (\bf{R},{\bf{B}})}\right\vert=\mathop{\lim}\limits_{\left\vert{\rho (\bf{R},\bf{D})}\right\vert\to 1}\left\vert{\rho (\bf{D},{\bf{B}})}\right\vert.\eqno{\hbox{(17)}}$$View Source
\mathop{\lim}\limits_{\left\vert{\rho(\bf{R},\bf{D})}\right\vert\to 1}\left\vert{\rho (\bf{R},{\bf{B}})}\right\vert=\mathop{\lim}\limits_{\left\vert{\rho (\bf{R},\bf{D})}\right\vert\to 1}\left\vert{\rho (\bf{D},{\bf{B}})}\right\vert.\eqno{\hbox{(17)}}
According to Lemma 2, if the domain block ${\bf{D}}$ is a high-quality matching block for the range block R, for an arbitrary block ${\bf{B}}$ with non-zero variance, APCC between ${\bf{R}}$ and ${\bf{B}}$ is close to APCC between ${\bf{D}}$ and ${\bf{B}}$. Moreover, it can be seen from Fig. 2 that the APCCs of about 70% of the matching blocks are more than 0.9 in the PIFSs of 10 test images with range size 4×4, and the APCCs of about 88.7% of the matching blocks are more than 0.85, and the APCCs of only 2.9% of the matching blocks are less than 0.8. There is no occurrence with APCC less than 0.66 in Fig. 2. Based on Lemma 2 and Fig. 2, a new APCC-based domain block sorting method is proposed: a preset block ${\bf{B}}$ is used to sort the domain blocks by APCCs computed between each domain block and the preset block ${\bf{B}}$. Then for a range block ${\bf{R}}$, its corresponding domain block can be searched in a set of domain blocks in which the APCCs of these domain blocks are close to APCC between ${\bf{R}}$ and ${\bf{B}}$.
Since the domain pool has been simplified and classified into 3 classes by Fisher's 3 classes method, now we need only sort each domain class independently. Then for a range block ${\bf{R}}$, the class to which ${\bf{R}}$ belongs should be computed. Because APCC is an absolute value, both the blocks ${\bf{R}}$ and ${-}{\bf R}$ need to be taken into consideration. Suppose the block ${\bf{R}}$ is transformed into ${{\bf{R}}^{\ast}}$ to match one class defined in (4), and block ${-}{\bf R}$ is transformed into ${(-{\bf R})^{\ast}}$ to match the same class. In fact, ${{\bf{R}}^{\ast}}$ and ${(-{\bf R})^{\ast}}$ always match the same class, but the transformation from ${\bf{R}}$ to ${{\bf{R}}^{\ast}}$ and the transformation from ${-}{\bf R}$ to ${(-{\bf R})^{\ast}}$ are generally different.
Take ${{\bf{R}}^{\ast}}$ as an example, suppose $acc$ is APCC between ${{\bf{R}}^{\ast}}$ and the preset block ${\bf{B}}$. Then a set of domain blocks is chosen for ${{\bf{R}}^{\ast}}$ to search its matching domain block. The set of domain blocks is chosen in the ordered domain class to make APCCs between the preset block ${\bf{B}}$ and these blocks in this set all around $acc$. That is to say, these blocks in this set are selected from an APCC interval centered by $acc$ in this ordered domain class. Then the pairwise comparisons are performed between ${{\bf{R}}^{\ast}}$ and all the blocks in the set to search the corresponding domain block for ${{\bf{R}}^{\ast}}$.
If the matching block for ${{\bf{R}}^{\ast}}$ is ${{\bf{D}}^{\ast}}$ in this set and ${T_{\bf{R}}}({\bf{R}})={{\bf{R}}^{\ast}}$, ${T_{\bf{D}}}({\bf{D}})={{\bf{D}}^{\ast}}$ in which ${\bf{D}}$ is one block in the crude domain pool and ${{\bf{D}}^{\ast}}$ is derived from ${\bf{D}}$, then ${({T_{\bf{R}}})^{-1}}{T_{\bf{D}}}({\bf{D}})$ is the matching domain block for the block ${\bf{R}}$. Because the eight transformations are closed for the inverse operations and compound operations, ${T_{\bf{R}}}$, ${T_{\bf{D}}}$, ${({T_{\bf{R}}})^{-1}}{T_{\bf{D}}}\in\{T_{0}, T_{1}, T_{2}, T_{3}, T_{4},T_{5}, T_{6}, T_{7}\}$. If the index of the block ${\bf{D}}$ in the crude domain pool is $t$, the index of ${({T_{\bf{R}}})^{-1}}{T_{\bf{D}}}({\bf{D}})$ in the BFIC domain pool can be calculated. Suppose it is $index(t,{T_{\bf{D}}}{({T_{\bf{R}}})^{-1}})$. Then the parameters $s$ and $o$ computed in (2), and $index(t,{T_{\bf{D}}}{({T_{\bf{R}}})^{-1}})$ compose the PIFS subsystem for the block ${\bf{R}}$. That is, $(s, o, index(t,{T_{\bf{D}}}{({T_{\bf{R}}})^{-1}}))$.
Because the PIFS subsystem in the proposed scheme has the same form with the PIFS subsystem of BFIC introduced in Subsection A in Section II, the proposed FIC scheme does not change the bit rates and compression ratios involved in the compression of each image compared with BFIC.
In the above analysis we take ${{\bf{R}}^{\ast}}$ as an example. Certainly, ${(-{\bf R})^{\ast}}$ also needs to be considered because APCC is an absolute value. The final matching block for ${\bf{R}}$ should be chosen from both the matching blocks for ${{\bf{R}}^{\ast}}$ and ${(-{\bf R})^{\ast}}$. The flowchart of this scheme is shown in Fig. 3.
For the block ${{\bf{R}}^{\ast}}$, a set of domain blocks is required to search its matching block in the corresponding domain class. In this paper, a simple strategy is used to obtain the set. For each block ${{\bf{R}}^{\ast}}$, we firstly find the block having the closest APCC with the APCC ${{\bf{R}}^{\ast}}$ has in the ordered domain class (all APCCs are computed with the preset block ${\bf{B}}$), then $k$ blocks near the block are selected for the final comparison.
SECTION V.
Training the Preset Block
In the APCC-based block sorting method proposed in this paper in Section IV, a preset block ${\bf{B}}$ is required to compute APCCs with other blocks. In the proposed scheme, the preset blocks are trained offline, and then they are directly applied in the encoding process as auxiliary blocks.
From Lemma 2 in Section IV, each block in ${\bf{I}}^{{\bf{n}}\times{\bf{n}}}$ with non-zero variance can be chosen as ${\bf{B}}$ in theory. However, experiment results indicate that it has a great impact on the visual quality of the reconstructed image to choose different blocks as the preset block ${\bf{B}}$. Because the block ${\bf{D}}$ satisfying $\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\to{1}$ is usually hard to search for ${\bf R}$, it is important to choose a proper block as the preset block ${\bf{B}}$ to search the best approximate ${\bf{D}}$. Moreover, the domain pool is classified into 3 classes and the search of the matching pairs in different classes is absolutely independent. Hence, we can choose different blocks as the preset blocks for different classes.
For a preset block ${\bf{B}}$, to search the matching block for a range block ${\bf R}$, we firstly find these blocks ${\bf{D}}$ meeting the (18) in this domain class:$$\left\vert{\rho (\bf{R},{\bf{B}})}\right\vert\approx\left\vert{\rho ({\bf{D}},{\bf{B}})}\right\vert\eqno{\hbox{(18)}}$$View Source
\left\vert{\rho (\bf{R},{\bf{B}})}\right\vert\approx\left\vert{\rho ({\bf{D}},{\bf{B}})}\right\vert\eqno{\hbox{(18)}}${\bf{B}}$ and ${\bf R}$ are known in the encoding process, then the set of ${\bf{D}}$ meeting (18) forms a structure $C$ of ${\bf R}$ and ${\bf{B}}$ which is a neighborhood of the surface of a double cone whose cone-axis is the line through the origin and the preset block ${\bf{B}}$, and cone-angle is twice the arc cosine of APCC between the blocks ${\bf R}$ and ${\bf{B}}$. However, the blocks ${\bf{D}}$ satisfying (19) only form a structure $l$ of ${\bf R}$ and ${\bf{B}}$ which is a neighborhood of one element of the double cone through the origin and ${\bf R}$. Fig. 4 shows the structures $C$ and $l$ constructed by (18) and (19), respectively.$$\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\to1.\eqno{\hbox{(19)}}$$View Source
\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert\to1.\eqno{\hbox{(19)}}
In Section IV, a set containing $k$ domain blocks is selected to search the final matching block for block ${{\bf{R}}^{\ast}}$. The blocks in this set are likely to belong to the structure $C$ of ${{\bf{R}}^{\ast}}$ and ${\bf{B}}$ because they are selected with APCCs close to APCC between ${{\bf{R}}^{\ast}}$ and ${\bf{B}}$. Generally, because the set is selected with a few blocks, the precondition of one block to enter the selected small set is that this block belongs to $C$. However, the blocks in $l$ are just what we need, and $l$ is merely a part of $C$.
If the number of the domain blocks in $l$ is invariable, the opportunity of these blocks in $l$ to enter the selected domain set may reduce when the block number in $C$ increases. Hence, it may need to enlarge the set to search the matching block in $l$, which brings high computational complexity. If the set is not enlarged, the searched matching block may not meet the needs. Since there are usually lots of blocks beside the center block or barycenter block of the domain class, which can lead to a large structure $C$ if we use them as ${\bf{B}}$, both the center or barycenter are not the appropriate blocks to be ${\bf{B}}$, and ${\bf{B}}$ should even be far away from them.
From another view, for its difficulty to search the exact ${\bf{D}}$ to meet (19), the block ${\bf{B}}$ should be chosen to easy to find a not accuracy but acceptable ${\bf{D}}$ from $C$. In this case, $\left\vert{\rho ({\bf{R}},{\bf{D}})}\right\vert<1$ and there is a non-zero angle $\theta$ between the line through the origin and ${\bf R}$ and the line through the origin and ${\bf{D}}$. If block ${\bf{B}}$ lies on the super-plane which passes through the origin to separate the blocks ${\bf R}$ and ${\bf{D}}$ on its two sides and has the same angle $\theta/2$ with the line through the origin and ${\bf R}$ and the line through the origin and ${\bf{D}}$, the block ${\bf{D}}$ can be easily searched. Hence, statistically, block B should be chosen as the barycenter of the domain blocks from this view.
The above discussion draws two opposite conclusions. One asks for the preset block to get away from the barycenter of the domain class, and the other asks for taking the barycenter as the preset block. Finally, we have no any choices but to obtain the preset block ${\bf{B}}$ by training.
The target of training ${\bf{B}}$ is obvious: Find a block ${\bf{B}}$ in a sample set $S$ to maximize the sum of APCCs between each original block and its matching block searched in an interval generated by APCCs with ${\bf{B}}$ except the original block itself. Because the three classes in (4) are absolutely independent in the search process, their preset blocks can also be trained independently. Table I shows the procedure of training the preset block ${\bf{B}}_{1}$ for the first class.
All the data in Table I are defined except for $q$. In our experiment, although it can generate different preset blocks with different $q$, these blocks have almost the same efficiency for encoding. The best preset block trained with $q=200$ is also one of the best preset blocks with $q=300$.
We use a sample set containing 36633 sample blocks with size 4×4 and a sample set containing 77805 sample blocks with size 8×8 to train the preset blocks. The preset blocks trained as in Table I for three classes expressed in (4) with different sizes 4×4 and 8×8 are shown in Fig. 5, respectively.
SECTION VI.
Algorithm Optimization
In Sections III and IV, the novel APCC-based FIC scheme is proposed by two steps. In this section, we introduce several optimization methods in the proposed scheme.
A. Optimization of the Energy Function
For a range block ${\bf R}$, APCC between ${\bf R}$ and a domain block ${\bf{D}}$ can be computed as (20):$$\left\vert{\rho_{\bf{RD}}}\right\vert=\left\vert{{\sigma_{\bf{RD}}}\over{\sigma_{\bf{R}}\sigma_{\bf{D}}}}\right\vert\eqno{\hbox{(20)}}$$View Source
\left\vert{\rho_{\bf{RD}}}\right\vert=\left\vert{{\sigma_{\bf{RD}}}\over{\sigma_{\bf{R}}\sigma_{\bf{D}}}}\right\vert\eqno{\hbox{(20)}}
To search the corresponding domain block for block ${\bf R}$, these APCCs between ${\bf R}$ and some domain blocks need to be computed. In the process, ${\sigma_{\bf{R}}}$ is kept invariant. Hence, the energy function as (20) can be written as (21) to reduce the computational cost:$$H({\bf{R}},{\bf{D}})={{\sigma_{\bf{RD}}^{2}}\over{\sigma_{\bf{D}}^{2}}}\eqno{\hbox{(21)}}$$View Source
H({\bf{R}},{\bf{D}})={{\sigma_{\bf{RD}}^{2}}\over{\sigma_{\bf{D}}^{2}}}\eqno{\hbox{(21)}} If the domain block ${\bf{D}}$ is standardized to make its standard deviation equal to 1, then the maximization of APCC between ${\bf R}$ and ${\bf{D}}$ is just equivalent to maximizing the absolute value of the inner product between them. Here it is unnecessary to standardize the range block ${\bf R}$.
B. Optimization of Block Classification
In the Fisher's 3 classes method, each range or domain block needs to be classified into one of the three classes expressed in (4). For one given block, such as the range block ${\bf R}$, its corresponding class, and the transformation $T_{-{\bf R}}$ which transforms ${\bf R}$ to ${\bf{R}}^{\ast}$, and the transformation $T_{-{\bf R}}$ which transforms ${-}{\bf R}$ to $(-{\bf R})^{\ast}$ need to be considered, in which ${\bf{R}}^{\ast}$ and $(-{\bf R})^{\ast}$ are kept the same meanings as in Section IV.
To find the class to which ${\bf R}$ belongs, and the transformations $T_{\bf{R}}$ and $T_{-{\bf R}}$, the general method is to test the eight transformations introduced in Subsection A in Section II on the block ${\bf R}$ one by one, if one transformation transforms ${\bf R}$ into one class expressed in (4), then this class and the transformation are just what we need. This general method is easy to implement but time-consuming to compute.
According to (4), the class ${\bf R}$ belonged to, the transformation $T_{\bf{R}}$, and the transformation $T_{-{\bf R}}$ are all determined by the luminance sums of the four sub-blocks of ${\bf R}$. In the proposed scheme, we compute them offline and then directly apply these results to classify blocks and find their corresponding transformations. As mentioned in Section IV, we find that ${{\bf{R}}^{\ast}}$ and ${(-{\bf R})^{\ast}}$ always match the same class. If the luminance sums of the four sub-blocks of ${\bf R}$ vary each other, $T_{\bf{R}}$ and $T_{-{\bf R}}$ are always different.
Suppose the luminance sums of the four sub-blocks of ${\bf R}$ are $r_{1}$, $r_{2}$, $r_{3}$, and $r_{4}$, respectively. Some examples about the class, the transformation $T_{\bf{R}}$, and the transformation $T_{-{\bf R}}$ related to different orders of $r_{1}$, $r_{2}$, $r_{3}$, and $r_{4}$ are listed in Table II, in which the class 1, class 2, and class 3 are defined according to (4), respectively. There are 24 cases in total according to the order of $r_{1}$, $r_{2}$, $r_{3}$, and $r_{4}$.
C. The Compound Transformations
In Section IV, if the matching block for ${{\bf{R}}^{\ast}}$ is ${{\bf{D}}^{\ast}}$ and ${T_{\bf{R}}}({\bf{R}})={{\bf{R}}^{\ast}}$, ${T_{\bf{D}}}({\bf{D}})={{\bf{D}}^{\ast}}$, then ${({T_{\bf{R}}})^{-1}}{T_{\bf{D}}}({\bf{D}})$ is the matching domain block for the block ${\bf{R}}$. The compound transformation ${({T_{\bf{R}}})^{-1}}{T_{\bf{D}}}\in\{T_{0}, T_{1}, T_{2}, T_{3}, T_{4},T_{5}, T_{6}, T_{7}\}$ can also be computed offline. Then the results are directly applied in the encoding process. There are 64 cases in total for $({{T_{\bf{R}}})^{-1}}{T_{\bf{D}}}$, with 8 cases in $T_{\bf{R}}$ and 8 cases in ${T_{\bf{D}}}$.
SECTION VII.
Experiment Results and Analysis
To verify the performance of the proposed APCC-based block classification and sorting FIC scheme, the experiments are performed on 16 standard test images with size 512×512, 2 standard test images with size 1024×1024, and a standard test image with size 256×256. The peak signal-to-noise ratio (PSNR) is used to measure the reconstructed image quality for different schemes with different partitions. All the demos in this paper are implemented using C++ language, and the experiments are run on the computer environment: CPU I3–2100 and windows XP OS.
A. Experiment on the Preset Blocks
In Section V, the training method for the preset block is introduced, and the preset blocks for three classes with different sizes are shown in Fig. 5. In our experiment, compared with the center blocks which are the average blocks in each domain class, and the barycenter blocks which are the blocks with the maximum first-moment of APCC in each domain class, and the farthest blocks from the barycenter blocks which have the minimum first-moment of APCC in each domain class, the trained preset blocks offer an overwhelming advantage in generating the reconstructed images with highest PSNR values. Even if some reconstructed images with highest PSNR values are not generated by the trained preset blocks, their PSNR values are very close to PSNR values of the reconstructed images generated by the trained preset blocks. The comparison with different preset blocks in several 512×512 images are shown in Table III, in which the range size is 8×8, and the domain step-length is 8 pixels, and $k=100$.
B. Analysis of Computational Complexity
In the proposed scheme, the executing time in entire encoding process includes four parts. The first part is kept almost the same for natural images with the same size and image partition. It includes the time to construct the range pool and domain pool, and the time to compute the means and variances of domain blocks, and the time to standardize all the domain blocks. The second part is the time to classify and sort the domain blocks in each class. The third part for each range block is the time to compute its mean and variance, and the time of classification, and the time to select its domain set. The last part is the time to perform pairwise comparisons to search the final matching block for each range block.
As an example, the 512×512 image Lena with range size 4×4 and $dstep=8$ is used to discuss the time involved in the encoding process.
The domain pool in the proposed scheme is the crude domain pool defined in Section III, which only has one eighth of the block number of the BFIC domain pool. Hence, the first part of encoding time related to domain blocks is about one eighth of the corresponding time in BFIC. The first part of encoding time related to range blocks is kept the same as BFIC because the range pool in the proposed scheme is as same as in BFIC. For the image Lena mentioned above, the first part of encoding time is about 2.5 ms.
Since the results of classifying blocks with different order of luminance sums in the sub-blocks have been obtained offline as in Table II, the classification of domain blocks needs only compute their luminance sums in the four sub-blocks for each block, which spends little time. In this paper, the quicksort method [27] is used to sort the domain blocks in each class. For the image Lena mentioned above, the second part of encoding time is about 1.5 ms.
The third part encoding time is kept the same for one range block whether how many blocks are chosen in its domain set. The reason is that we only find the beginning index and the end index in the ordered domain class to construct the domain set. For the image Lena mentioned above, this part of encoding time is about 6.9 ms, in which the runtime to construct the domain set by the binary search algorithm is about 3.3 ms.
The main encoding time is still appeared in the last part. We have to perform the pairwise comparisons to search the final matching domain block for a range block. The number of pairwise block-comparisons in which one range block involved is $2k$ with the selected domain set containing $k$ blocks: $k$ for ${\bf R}$ and the other $k$ for ${-}{\bf R}$. Compared with BFIC in which one range block is compared with all the blocks in the BFIC domain pool, the proposed scheme can significantly reduce the computational complexity. For the image Lena mentioned above, the block number in the BFIC domain pool is 32768. If we choose $k$ equal to 20 in the proposed scheme, the number of pairwise block-comparisons in BFIC is as 32768/$40=819.2$ times as that of the proposed scheme. If $k$ is set to 100, the number of pairwise comparisons in BFIC is as $32768/200=163.84$ times as that of the proposed scheme. The last part of encoding time for encoding the image Lena is 15.6 ms with $k=20$, and 65.7 ms with $k=100$.
C. The Relationship between Different Parameters
In this paper, a domain set with $k$ blocks is selected for each range block during encoding. Larger $k$ enlarges the domain set, which can improve the reconstructed image quality, but the computational complexity for encoding an image is also increased. Hence, we must find a balance between the computational cost and the reconstructed image quality.
The domain step-length $dstep$ between two adjacent domain blocks in the image partition also has an important impact on the computational cost in the encoding process. In BFIC, it generates more domain blocks with a smaller domain step-length, which can improve the visual quality of the reconstructed images with increasing the computational cost in searching the matching pairs. However, in the proposed scheme, it can not be directly drawn the same conclusion.
The relationship between the reconstructed image quality, parameter $k$, and the domain step-length is shown in Fig. 6(a). The curves shown in Fig. 6(a) depict the PSNR values of the reconstructed image against the block number $k$ in the domain set for image pepper in four different partitions with $dstep=1$, $dstep=4$, $dstep=8$, and $dstep=16$, respectively. It can be seen that a positive correlation exists between the parameter $k$ and the values of PSNR of the reconstructed images.
Moreover, an interesting trend can be obtained from Fig. 6(a): when $k$ gets a small value, the reconstructed image quality with $dstep=16$ is not worse than the reconstructed image quality with $dstep=8$, and even better than the reconstructed image quality with $dstep=1$ and $dstep=4$. When $k$ gets a little larger, the reconstructed image quality is better in the curve with $dstep=8$. Further analysis indicates that the predominance of $dstep=1$ or $dstep=4$ just can be shown with a larger $k$. The reason can be found from Fig. 4: the blocks in the selected domain set are near the structure $C$, but the blocks we want just lie on or near $l$. If there are too many domain blocks with a smaller $dstep$, the blocks near $l$ may be hard to enter the domain set containing only $k$ blocks, thus the reconstructed image quality is likely to get worse. Certainly, for a large $k$, the scheme can also improve the image quality with increasing the number of the domain blocks.
The curves depicting the encoding time against the parameter $k$ with different image partitions for image pepper are shown in Fig. 6(b). They correspond to $dstep$ equal to 1, 4, 8, and 16, respectively. From Fig. 6(b) we can see that an approximate linear relation exists between the parameter $k$ and the encoding time. Moreover, the time difference becomes larger with larger parameter $k$ for different $dstep$.
According to Fig. 6, PSNRs of the reconstructed images of pepper with $dstep=16$ are very close to PSNRs of the reconstructed images with $dstep=8$ when $k<90$. Hence, if we use a small $k$ to speed up encoding, we can let $dstep=16$ to accelerate encoding further. Generally, it is a good idea to set $dstep$ as the side length or twice of the side length of the range block with a not large $k$ in the proposed scheme.
For image partitions with different block sizes, the conclusions drawn above are also kept. For example, if the range blocks are partitioned with size 4×4, we can encode a 512×512 image with a small $k$ and set $dstep$ as the side length or twice of the side length of the range block.
The reconstructed images for 512×512 images Lena, pepper, and man with range size 4×4 are shown in Fig. 7 for $k=20$ and $dstep=8$, and for $k=100$ and $dstep=8$, respectively. We can see in Fig. 7 that the reconstructed image quality is hard to distinguish in human visual system (HVS) for different encoding time 26.5 ms and 76.6 ms.
D. Analysis About Classification in the Domain Pool
In Section III the domain pool is classified by Fisher's 3 classes method. It seems that the purpose of classification is only to speed up encoding. In fact, it not only speeds up encoding but also improves the reconstructed image quality when $k$ is not large.
If the domain pool is not reduced and classified by the Fisher's 3 classes method, the number of blocks which meet the condition of ${\bf{D}}$ in (18) increases for the range block ${\bf R}$ and preset block ${\bf{B}}$. It may reduce the probability of the blocks meeting (19) to enter the domain set containing only $k$ blocks, which may lead to search a poor matching block for ${\bf R}$. By classifying the domain pool using the Fisher's 3 classes method, the domain blocks are greatly reduced. It retains only one eighth of the blocks in the BFIC domain pool. Moreover, almost all the domain blocks which have higher probability to match with the transformed range blocks are remained according to Fisher's 3 classes method.
A double cone and a conic section perpendicular to the central axis of the double cone are shown in Fig. 8. The edge of the conic section is a circle and each element of the double cone goes through the circle by one point. We project all the domain blocks meeting (18) in BFIC through the lines connecting these blocks and the origin on the super-plane braced by the conic section. As in Fig. 8, the domain blocks meeting (18) in BFIC now all lie on a neighborhood of the circle on the same super-plane. According to the order of the luminance sums in the four sub-blocks, one domain blocks in BFIC can be classified into 24 classes. However, the transformed range blocks are definitely classified into the three classes expressed in (4), which are only one eighth of all the 24 classes. Hence, the domain blocks meeting (19) only lie on a small part near the circle, which means the other 21 domain classes are useless. If we only retain the domain blocks in the three classes in (4) for encoding, little effect could appear in the visual quality of the reconstructed image. Furthermore, the remained domain blocks are classified into three classes to increase the matching probability in each class. By contraries, if the domain blocks in other 21 classes are also retained, the blocks meeting (19) are hard to enter the domain set including only $k$ blocks.
We name the scheme as the only-sorting scheme which directly sorts the domain pool according to Section IV without classifying the domain pool by the Fisher's 3 classes method. Compared with the only-sorting scheme, the block classification and sorting scheme is much more efficient and effective. The curve shown in Fig. 9(a) depicts the relationship of the parameter $k$ between the proposed block classification and sorting scheme and the proposed only-sorting scheme with range block size 8×8 and $dstep=16$ for image pepper. Each point $(k1, k2)$ on the curve means that the reconstructed images have the same PSNR value for $k=k1$ in the block classification and sorting scheme and $k=k2$ in the only-sorting scheme. It is easy to see that the only-sorting scheme needs more blocks in the selected domain set to reach the same PSNR value, the reason is that it must enlarge the domain set to eliminate the impact of too many domain blocks in the only-sorting scheme. The curve for the comparison of the encoding time in two schemes is shown in Fig. 9(b), we can see that the encoding process in the block classification and sorting scheme is much faster than in the only-sorting scheme to reach the same PSNR values.
E. Comparison of Different Schemes
In this subsection, the proposed APCC-based classification and sorting scheme is compared with some other fast FIC schemes: the Fisher's 72 classes scheme (Fisher72) [15], the HOG-based clustering scheme (HOG-FIC) [16], Kovacs's classification scheme (Kovacs) [17], the variance-based sorting scheme (VBFC) [20], the one norm-based kick-out scheme (kickout) [21], and the only-sorting scheme proposed in Section IV in this paper (only-sorting). All these schemes, except kickout scheme, are implemented with the optimized energy function introduced in Subsection A of Section VI, which provides a fair comparison of performance between them.
For a natural image with size 512×512, if the range block is partitioned with size 8×8 and the domain step-length is chosen as 16 pixels, the encoding time is about 2.1 s in BFIC. If the range size is 4×4 and the domain step-length is 8 pixels, the encoding time is about 8.7 s in BFIC. However, MSE energy function is used to speed up encoding in the kickout scheme. Although the encoding time in the kickout scheme is less than the encoding time in BFIC with MSE energy function, the efficiency of the kickout scheme can not catch up with BFIC using the optimized energy function. For example, it needs 2.4 s and 16.4 s, respectively, to encode 512×512 image Lena with range size 4×4 and 8×8.
The comparison of these schemes for 512×512 image Lena with range block size 4×4 and $dstep\,=8$ are shown in Fig. 10.
As shown in Fig. 10, VBFC has high efficiency in encoding, but the reconstructed image quality can not be well preserved.
The Fisher72 scheme just lies a point in Fig. 10, which means it can not keep a balance between the reconstructed image quality and efficiency when we need. In addition, PSNR in Fisher72 scheme is 37.27 dB and the encoding time is 62.2 ms in Fig. 10. In the proposed scheme, PSNR of the decoded image is 37.32 dB with $k=44$, and the encoding time is 41.8 ms. When the encoding time in the proposed scheme is about 61.8 ms which is near the encoding time in the Fisher72 scheme, the parameter $k$ equals to 76 and PSNR of the reconstructed image is up to 37.82 dB. More experiments show that each point in the Fisher72 scheme is below the curve related to the proposed scheme for all the test images.
In Kovacs's scheme, the range and domain blocks are classified according to the polar angle and the normalized root mean square error (NMRS). Because both of the features are not equivalent to the affine similarity in FIC, the reconstructed image quality can not catch up with the proposed scheme when the encoding time is kept the same.
The reconstructed images in the only-sorting scheme have lower values of PSNR when the encoding time is kept the same as in the proposed block classification and sorting scheme.
The HOG-FIC scheme can not encode the image within 600 ms because it needs more time to extract HOG feature and classify blocks by K-means clustering algorithm online.
As shown in Fig. 10, the proposed scheme has a fastest accelerating rate to get close to PSNR in BFIC, and PSNRs in the proposed scheme are always on the top of these schemes when the encoding time is less than 0.6 s. Moreover, it is clear that the reconstructed image quality is very close to the reconstructed image quality in BFIC when the encoding time is about 0.3 s, about 1/29 of the encoding time in BFIC with only 0.3 dB image quality loss compared with BFIC.
The average time/PSNR comparison for different schemes on 16 images with range block size 8×8 and $dstep=16$ is shown in Fig. 11. All the 16 images are with size 512×512, and they are aerial, baboon, barbara, Blonde, boat, bridge, couple, Elaine, F16, goldhill, house, Lena, man, milk drop, pepper, and sailboat, respectively. From Fig. 11 we can also see that the curve related to the proposed block classification and sorting scheme is always on the top of all the points or curves of these schemes.
The overall comparison with a larger time interval for image goldhill with range size 4×4 and $dstep=8$ in different schemes is shown in Fig. 12. Obviously, the proposed scheme is the fastest speed climber from the beginning point. From the starting point to the end point, the curve corresponding to the proposed scheme is always on the top of these fast FIC schemes. The PSNR value of the end point on the curve of the proposed scheme is 37.21 dB. In Kovacas's scheme, when the classification number of polar angle and NMRS are set to 3 and 4, respectively, the encoding time is 1.41 s and PSNR value of the reconstructed image is 37.25 dB, which is the closest value to 37.21 dB. For a 512×512 image, the encoding time of 1.41 s is too long.
Furthermore, for these schemes, such as Fisher72, Kovacs, and VBFC, the encoding time increases greatly for images with larger size. Take the 512×512 and 1024×1024 images Elaine with range size 8×8 and $dstep=16$ for example, the encoding time is 198.4 ms in the Fisher72 scheme to encode 1024×1024 image Elaine, which is about 10.8 times for it to encode 512×512 image Elaine (only 18.3 ms). It spends 4.07 s to encode the 1024×1024 image Elaine when both the classification numbers are set to 10 in Kovacs's scheme, which is as 25.6 times as the encoding time for it to encode the same image with the same classification numbers (only 159 ms). It spends 3.77 s when $k=200$ in VBFC to encode 1024×1024 image Elaine, which is 62.5 times more than the encoding time when $k=200$ for 512×512 image Elaine (only 60.4 ms). In the proposed scheme, the encoding time for $k\,=10$ is 43.3 ms to encode 1024×1024 image Elaine, which is as 4.4 times as 9.9 ms to encode 512×512 image Elaine, and when $k=44$ it is 117 ms to encode 1024×1024 image Elaine, which is as 4.3 times as 27.0 ms to encode 512×512 image Elaine.
It shows PSNRs and the encoding time for images Elaine and man with size 1024×1024 in different schemes in Table IV, from which we can see that PSNR values are still higher in the proposed scheme than other schemes when they have the same encoding time for the two images with size 1024×1024.
The curves with PSNR values of the reconstructed images against the encoding time for the 256×256 image cameraman with range block size 4×4 and $dstep=8$ in different schemes are shown in Fig. 13.
The over-long encoding time in fractal image compression is one of the major difficulties for its application. Although many researches were published to speed up encoding of FIC, the encoding time is still long, and in some schemes the reconstructed image quality becomes unacceptable. Based on the conclusion that the affine similarity between two image blocks in FIC is equivalent to the absolute value of Pearson's correlation coefficient between them, a new FIC scheme is proposed in this paper. Firstly, the Fisher's 3 classes method, which has been proved to be an APCC-based block classification method in this paper, is used to greatly reduce the number of blocks in the domain pool and classify the remaining domain blocks into 3 classes. Secondly, the domain blocks are sorted by APCCs between each domain block and the preset block in each class, and then the matching block for a range block is searched in a domain set selected by APCC with the preset block. Experiment results show that the proposed scheme can greatly reduce the encoding time with preserving the reconstructed image quality well.