Introduction
The ease of manipulating images with increasingly powerful image editing tools has led to a growing number of digitally altered forgeries, which raise serious concerns about the credibility of digital images. Developing image forgery detection and localization techniques therefore becomes essential in circumstances where digital images serve as critical evidence. A variety of approaches have been proposed to detect and localize image forgeries. Active techniques such as digital watermarking [1], [2] are effective in verifying the authenticity of an image, but the requirement of embedding additional information at the creation of the image limits their widespread use. In contrast, passive or blind techniques rely on the image data itself without requiring any pre-embedded information and have thus attracted great interest over the past two decades. The passive image forgery detection and localization techniques in the literature mainly fall into five categories:
The first category is based on specific statistical properties of natural images such as higher-order wavelet statistics [3], [4], image texture features [5] and residual-based features [6], [7].
The second category includes the techniques that seek the traces left by specific image manipulations such as resampling [8], [9], contrast enhancement [10], [11], median filtering [12], copy-move manipulations [13] and JPEG compression [14]–[16].
The third category relies on the regularities or constraints that make images physically plausible. For instance, anomalies in lighting [17], shadows [18], [19], reflections [20] and perspective constraints [21] have been exploited for exposing image manipulations.
The fourth category exploits the artifacts introduced in the image acquisition pipeline of digital camera. These artifacts can be either caused by specific processing components such as demosaicking artifacts [22], camera response function abnormalities [23], [24], optical aberrations [25] and imaging sensor noise [26]–[28], or caused by the complex interplay of multiple components, such as local noise levels [29].
Inspired by the success of convolutional neural networks (CNN) in various multimedia tasks, the fifth category comprises the techniques that automatically learn the features for image forgery detection or localization using CNN [30]–[32].
Besides the above-mentioned techniques, recent years have seen some works on fusing the outputs of multiple forensic detectors, e.g. under the framework of fuzzy logic [33], the Dempster-Shafer theory of evidence (DSTE) [34], simple logical rules [35], [36] or Bayesian inference [37], but the performance of the overall system depends on that of each individual forensic detector. To localize the forgeries in an image, a prevailing paradigm is to move a regular, typically square, detection window across the image and analyze the forensic clues in the detection window at different locations. Such a sliding-window paradigm allows for the convenient and efficient processing of the image but also has two inherent limitations: 1) The regular shape of the detection window hinders the detection of forgeries in various possible shapes. For instance, a square detection window is not suitable for detecting tree branches or human limbs. 2) Due to the object boundary unawareness of the sliding-window detection scheme, pixels of different natures, e.g. some pixels are forged and others are pristine, are often contained in the same window. Jointly processing those ‘heterogeneous’ pixels without any distinction makes the detection highly error-prone.
The above limitations can be alleviated by image segmentation. Local structure of visual content is typically exploited by segmentation algorithms to divide an image into a number of ‘segments’ or ‘superpixels’, each of which consists of a group of connected pixels that are perceptually or statistically similar. The resultant superpixels provide informative clues for inferring the boundaries of the forged regions. However, given the variability of possible forgeries, the limitations of image segmentation are also apparent: 1) There might be no distinct local structure information in homogeneous regions to guide the segmentation. 2) The boundaries of the forged regions do not always align with the object boundaries. For instance, it is not uncommon to copy some collateral background pixels along with the forged object to make it blend into the background more naturally. In such cases, pixels of different natures, i.e. the collateral forged pixels and their neighboring authentic pixels, might be grouped into the same superpixel and cause further uncertainties.
In this article, we take forgery localization based on photo-response non-uniformity (PRNU) noise as an example and show that it is possible to mitigate the limitations of segmentation merely based on visual content by incorporating the local homogeneity of visually imperceptible clues. The main contributions of our work can be summarized as follows.
We identify the essential criteria that a superpixel segmentation algorithm should satisfy for the task of forgery localization and propose a multi-scale segmentation-based localization scheme that works in combination with any segmentation algorithm fulfilling the identified criteria.
We propose a novel superpixel segmentation algorithm that encodes both visual content and visually imperceptible clues for content forgery localization. Different from all previous works that only use visual boundary information for image segmentation, the proposed algorithm incorporates the information from visually imperceptible forensic clues to guide the segmentation for forgery localization.
We propose a multi-orientation fusion strategy that aggregates the complementary strengths of image segmentation and multi-oriented detection in localizing various types of image forgeries.
We conduct comprehensive experiments on the effects of various segmentation algorithms on different forgery types, i.e. hard-boundary and soft-boundary forgeries, in terms of both region and boundary $F_{1}$ scores.
The rest of this article is organized as follows. In Section II, we will revisit the background of PRNU-based forgery localization and some related work. In Section III, the details of the proposed segmentation-based and multi-orientation forgery localization schemes will be given. Section IV presents the experimental results and analysis on realistic forgeries. Finally, Section V concludes the work.
Related Work
In this section, we will first revisit the background of PRNU-based forgery localization. PRNU noise mainly arises from the inhomogeneity of silicon wafers introduced during imaging sensor fabrication and manifests itself as the pixel-to-pixel variation in light sensitivity. It is present in every image and practically unique to the sensor that captures the image. Therefore, its absence can be used to signify the forgery in an image under investigation, provided that the reference PRNU of the source camera is available. For each pixel $i$, the presence of the PRNU term $z_{i}$ in the noise residual $\omega _{i}$ can be formulated as a binary hypothesis test: \begin{equation*} \begin{cases} h_{0}: \omega _{i} = \upsilon _{i}\\ h_{1}: \omega _{i} = z_{i} + \upsilon _{i} \end{cases}\tag{1}\end{equation*}
Due to the weak nature of PRNU, the decision is made based on the normalized correlation between the noise residual and the expected PRNU term over a local window $\Omega _{i}$ centered at pixel $i$: \begin{equation*} \rho _{i} = \frac {\sum _{j \in \Omega _{i}}(\omega _{j}-\bar {\omega })(z_{j}-\bar {z})}{\sqrt {\sum _{j \in \Omega _{i}}(\omega _{j}-\bar {\omega })^{2}}\sqrt {\sum _{j \in \Omega _{i}}(z_{j}-\bar {z})^{2}}},\tag{2}\end{equation*}
The correlation $\rho _{i}$ is then converted into a forgery probability by comparing the likelihoods of $\rho _{i}$ under the two hypotheses: \begin{align*} P_{i}=&\frac {p(\rho _{i}|h_{0})}{p(\rho _{i}|h_{0})+p(\rho _{i}|h_{1})} \\=&\left ({1+\exp \left ({\frac {\rho _{i}^{2}}{2\hat {\sigma }_{0}^{2}}-\frac {\left ({\rho _{i}-\hat {\rho }_{i}}\right)^{2}}{2\hat {\sigma }_{1}^{2}}-\ln \frac {\hat {\sigma }_{1}}{\hat {\sigma }_{0}}}\right)}\right)^{-1}.\qquad \tag{3}\end{align*} where $\hat {\rho }_{i}$ is the correlation predicted by a correlation predictor and $\hat {\sigma }_{0}$, $\hat {\sigma }_{1}$ are the estimated standard deviations under $h_{0}$ and $h_{1}$, respectively.
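As a concrete illustration of Eqs. (2) and (3), the sketch below computes the windowed correlation and converts it into a forgery probability. The function and variable names, as well as the Gaussian parameters used in the usage example, are illustrative and not taken from the authors' implementation:

```python
import numpy as np

def window_correlation(w, z):
    """Normalized correlation rho_i (Eq. 2) between the noise residual w and
    the expected PRNU term z over one analysis window Omega_i."""
    wc = w - w.mean()
    zc = z - z.mean()
    return float((wc * zc).sum() /
                 (np.sqrt((wc ** 2).sum()) * np.sqrt((zc ** 2).sum()) + 1e-12))

def forgery_probability(rho, rho_hat, sigma0, sigma1):
    """Eq. (3): p(rho|h0) / (p(rho|h0) + p(rho|h1)), with a Gaussian model
    N(0, sigma0^2) under h0 (PRNU absent, i.e. forged) and
    N(rho_hat, sigma1^2) under h1 (PRNU present, i.e. pristine)."""
    e = (rho ** 2 / (2 * sigma0 ** 2)
         - (rho - rho_hat) ** 2 / (2 * sigma1 ** 2)
         - np.log(sigma1 / sigma0))
    return float(1.0 / (1.0 + np.exp(e)))
```

When the measured correlation is close to the predicted one, the forgery probability approaches zero; when the correlation collapses towards zero while a high correlation was predicted, the probability approaches one.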
In the above sliding window-based framework, the test statistic is calculated within a regular window centered at each pixel, so the reliability and resolution of the localization are inherently tied to the size and shape of the detection window.
To allow for localizing small-sized forgeries, Korus and Huang [28] proposed a segmentation-guided strategy that calculates the test statistic only over the pixels within the detection window that are visually similar to the central pixel, together with multi-scale fusion strategies that combine the probability maps obtained with detection windows of different sizes.
Proposed Methods
A. Segmentation-Based Forgery Localization Scheme
For PRNU-based forgery localization, a straightforward way to exploit image content would be segmenting the image into a number of superpixels according to the visual content and calculating the forgery probability for each superpixel. However, compared to the methods based on regular detection windows that are able to generate pixel-wise probability maps, this will result in a low-resolution forgery probability map because the pixels belonging to the same superpixel are assigned with the same probability. Consequently, the chance of mis-detection for object removal forgeries will be considerably increased if the size of the superpixel is not appropriately specified. For this reason, we apply image segmentation at different scales and fuse the resultant forgery probability maps to form a single informative probability map. The framework of the proposed segmentation-based multi-scale localization scheme is illustrated in Fig. 1.
The segmentation-based forgery localization scheme. For more details about the Adaptive Window (AW) and Multi-Scale (MS) fusion algorithms, please refer to Section IV-C or [28].
1) Multi-Scale Image Segmentation
Given the variety of superpixel segmentation algorithms, not all of them suffice for our purpose. We identify a few important criteria that a superpixel segmentation algorithm should satisfy for the task of forgery localization:
Boundary adherence. This criterion measures the agreement between the boundaries of the objects and the resultant superpixels. A superpixel algorithm with good boundary adherence effectively avoids segmenting different objects or different parts of an object into the same superpixel, thus reducing the risk of generating heterogeneous superpixels containing both pristine and forged pixels for object insert forgeries.
Controllability over superpixel number. Some algorithms do not provide direct control over the number of generated superpixels. As the segmentation needs to be carried out at different scales, easy control over the number of generated superpixels is preferable.
Balanced segmentation granularity. Some superpixel algorithms generate superpixels of heavily unbalanced size. The over-sized superpixels substantially increase the chance of generating heterogeneous superpixels while the under-sized superpixels reduce the reliability of the detection. Thus, an algorithm capable of generating superpixels of balanced size is desirable.
Simple Linear Iterative Clustering (SLIC) [41]: SLIC algorithm iteratively updates superpixel centers and assigns pixels to their closest centers in the 5-dimensional pixel color and coordinate space until the algorithm converges. Its simplicity, computational efficiency and high-quality overall segmentation results make it one of the most widely used superpixel algorithms in various applications.
Entropy Rate Superpixel segmentation (ERS) [42]: ERS algorithm considers each pixel as a vertex in a graph and formulates the segmentation as an optimization problem of graph topology. It incrementally optimizes an objective function consisting of the entropy rate of random walks on the graph and a balancing term to achieve homogeneous and similar-sized superpixels. ERS exhibits remarkable boundary adherence, which is desirable for localizing object insert forgeries.
Extended Topology Preserving Segmentation (ETPS) [43], [44]: ETPS algorithm initially partitions the image into the desired number of regular superpixels and continuously modifies the boundary pixels in a coarse-to-fine manner by optimizing an objective function that encodes information about colors, positions, boundaries and topologies. The diversity of information encoded in the objective function and the efficient coarse-to-fine optimization make ETPS one of the state-of-the-art superpixel algorithms in terms of both segmentation quality and efficiency.
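For intuition about how SLIC-style algorithms operate, a minimal clustering sketch is given below. This is a didactic simplification (full-image nearest-center search and square-grid initialization instead of SLIC's localized $2S{\times}2S$ search), not the reference implementation, and all parameter values are illustrative:

```python
import numpy as np

def slic_like(image, n_segments=16, compactness=10.0, n_iter=10):
    """SLIC-style k-means in the 5-D (color, position) space: positions are
    scaled by compactness/S so that 'compactness' trades color fidelity
    against spatial regularity, as described for SLIC [41]."""
    H, W, _ = image.shape
    S = np.sqrt(H * W / n_segments)          # expected superpixel spacing
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.concatenate(
        [image.reshape(-1, 3),
         (compactness / S) * np.stack([ys, xs], -1).reshape(-1, 2)], axis=1)
    # Initialize cluster centers on a regular grid.
    k = int(np.sqrt(n_segments))
    cy = np.linspace(S / 2, H - S / 2, k).astype(int)
    cx = np.linspace(S / 2, W - S / 2, k).astype(int)
    idx = np.repeat(cy, k) * W + np.tile(cx, k)
    centers = feats[idx].copy()
    labels = np.zeros(H * W, dtype=int)
    for _ in range(n_iter):
        # Assign each pixel to its nearest center, then update the centers.
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(len(centers)):
            m = labels == c
            if m.any():
                centers[c] = feats[m].mean(0)
    return labels.reshape(H, W)
```

The production implementations referenced above additionally enforce connectivity and restrict the assignment search to a local neighborhood for efficiency.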
2) Exploiting Homogeneity of PRNU
As mentioned in Section II, only relying on the visual content for image segmentation might become problematic for object removal forgeries where no distinct visual information is available. Although this problem can be mitigated by the multi-scale strategy, it would be beneficial if the additional information from PRNU can be incorporated to guide the segmentation at each scale. The perceptually invisible PRNU not only provides useful clues when salient visual information is unavailable but also serves as a supplement to the visual information to eliminate the ambiguity in regions containing complex patterns or structures. In what follows, we will describe how the homogeneity of PRNU can be integrated to guide the segmentation.
Let $s_{i}$ denote the superpixel label of pixel $i$, $\boldsymbol {\mu }$ and $\boldsymbol {c}$ the position and color centers of the superpixels, and $\boldsymbol {P}$ the forgery probability map obtained at the corresponding scale. The proposed entropy-guided segmentation minimizes the objective function \begin{align*} \mathcal {E}(\boldsymbol {s}, \boldsymbol {\mu }, \boldsymbol {c}, \boldsymbol {P})=&\sum _{i}{\mathcal {E}_{col}(s_{i}, c_{s_{i}})}+\lambda _{pos}\sum _{i}{\mathcal {E}_{pos}(s_{i}, \mu _{s_{i}})} \\&+\,\lambda _{b}\sum _{i}\sum _{j\in \mathcal {N}_{8}(i)}{{\mathcal {E}_{b}(s_{i}, s_{j})}}\!+\!\mathcal {E}_{topo}(\boldsymbol {s})+\mathcal {E}_{size}(\boldsymbol {s}) \\&+\,\lambda _{etp}\sum _{i}{\mathcal {E}_{etp}(s_{i}, P_{s_{i}})},\tag{4}\end{align*} where $\lambda _{pos}$, $\lambda _{b}$ and $\lambda _{etp}$ are weighting factors and $\mathcal {N}_{8}(i)$ denotes the 8-connected neighborhood of pixel $i$. The individual terms are detailed below.
Appearance Coherence: This term encourages the pixels within a superpixel to have similar colors, where $\mathcal {I}(i)$ is the color of pixel $i$ and $c_{s_{i}}$ the mean color of superpixel $s_{i}$: \begin{equation*} \mathcal {E}_{col}(s_{i}, c_{s_{i}}) = ||\mathcal {I}(i)-c_{s_{i}}||_{2}^{2}.\tag{5}\end{equation*}
Shape Regularity: This term keeps superpixels compact by penalizing the deviation of the pixel coordinates $\mathcal {L}(i)$ from the position center $\mu _{s_{i}}$ of superpixel $s_{i}$: \begin{equation*} \mathcal {E}_{pos}(s_{i}, \mu _{s_{i}}) = ||\mathcal {L}(i)-\mu _{s_{i}}||_{2}^{2}.\tag{6}\end{equation*}
Boundary Length: This term penalizes the total boundary length by counting the pairs of 8-connected neighboring pixels carrying different labels: \begin{equation*} \mathcal {E}_{b}(s_{i},s_{j})= \begin{cases} 1,& \text {if } s_{i} \neq s_{j} \\ 0,& \text {otherwise} \end{cases}\tag{7}\end{equation*}
Topology Preservation: The term $\mathcal {E}_{topo}(\boldsymbol {s})$ enforces each superpixel to remain a single connected region.
Minimal Size: The term $\mathcal {E}_{size}(\boldsymbol {s})$ penalizes superpixels whose size falls below a minimal threshold, preventing degenerate, under-sized superpixels.
PRNU Homogeneity: This term favors superpixels that are homogeneous with respect to the forensic clues, where $H(P_{s_{i}})$ denotes the entropy of the forgery probabilities within superpixel $s_{i}$; a superpixel whose pixels are consistently forged or consistently pristine yields low entropy and thus low energy: \begin{equation*} \mathcal {E}_{etp}(s_{i}, P_{s_{i}})=\left ({255\cdot H(P_{s_{i}})}\right)^{2},\tag{8}\end{equation*}
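Under our reading of Eq. (8) — taking $H(P_{s_{i}})$ as the binary entropy of the mean forgery probability inside the superpixel, which may differ in detail from the paper's exact definition — the term can be sketched as:

```python
import numpy as np

def prnu_homogeneity_energy(probs):
    """Sketch of the PRNU homogeneity term (Eq. 8), assuming H(P_{s_i}) is the
    binary entropy of the mean forgery probability inside superpixel s_i.
    A superpixel that is uniformly forged (p near 1) or uniformly pristine
    (p near 0) has low entropy and hence low energy."""
    p = float(np.clip(np.mean(probs), 1e-6, 1.0 - 1e-6))
    H = -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)
    return (255.0 * H) ** 2
```

A superpixel mixing forged and pristine pixels (mean probability near 0.5) is maximally expensive, so the optimizer prefers boundaries that separate the two populations.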
We adopt the coarse-to-fine optimization framework in [43] to minimize the objective function in Eq. (4). As shown in Algorithm 1, the algorithm starts by equally partitioning the image into the desired number of regular superpixels, each of which is further divided into regular macropixels that serve as the basic units for the subsequent boundary refinement.
Demonstration of macropixel division for the coarse-to-fine optimization. The regions enclosed by red lines are superpixels and the regular grids bounded by blue dotted lines are macropixels.
Algorithm 1 Entropy-Guided ETPS Algorithm
Input: image, pixel-wise forgery probability map $\boldsymbol {P}$, desired superpixel number
Output: superpixel labels $\boldsymbol {s}$
Partition the image into the desired number of regular superpixels;
Partition each superpixel into regular macropixels of a given initial size;
Calculate the initial energy, Eq. (4), for each superpixel;
do
if the macropixel size is too small for a reliable evaluation of the PRNU homogeneity term then
set $\lambda _{etp}=0$;
end if
Initialize a FIFO list with all boundary macropixels;
while list is not empty do
Pop out boundary macropixel $m$;
if invalid_connectivity($m$) then
continue
end if
if relabeling $m$ to a neighboring superpixel decreases the energy in Eq. (4) then
Incrementally update the labels and energies of the two superpixels involved;
Append any boundary macropixel in the 4-connected neighborhood of $m$ to the list;
end if
end while
if any macropixel is still divisible then
Bisect or quadrisect any divisible macropixel;
Update the maximal macropixel size;
else
break
end if
while true
First, all the boundary macropixels, i.e. those with at least one adjacent macropixel belonging to a different superpixel, are put into a FIFO priority list and popped out one by one to check whether changing the label of the popped macropixel to that of a neighboring superpixel decreases the overall energy in Eq. (4). If the energy decreases, the label change is accepted, the energies of the two superpixels involved are updated incrementally, and the boundary macropixels in the neighborhood of the relabeled macropixel are appended to the list.
Unlike other terms in Eq. (4), the PRNU homogeneity term can only be reliably evaluated when the size of the macropixel is sufficiently large due to the weak nature of PRNU. The above coarse-to-fine framework makes it possible to reliably integrate the PRNU homogeneity information at coarse levels. We therefore set $\lambda _{etp}$ to zero once the macropixel size becomes too small for a reliable evaluation of this term.
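The boundary-move loop of Algorithm 1 can be sketched as below. For brevity this sketch re-evaluates a user-supplied energy function globally and omits the connectivity check and the incremental per-superpixel updates used in the actual algorithm; the function names are our own:

```python
import numpy as np
from collections import deque

def refine_labels(labels, energy):
    """Boundary-move refinement: pop boundary cells from a FIFO list, try
    relabeling each to a neighboring label, and accept moves that strictly
    decrease the (caller-supplied) energy of the full label map."""
    H, W = labels.shape
    def neighbors(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                yield ny, nx
    def is_boundary(y, x):
        return any(labels[ny, nx] != labels[y, x] for ny, nx in neighbors(y, x))
    fifo = deque((y, x) for y in range(H) for x in range(W) if is_boundary(y, x))
    best = energy(labels)
    while fifo:
        y, x = fifo.popleft()
        for ny, nx in neighbors(y, x):
            new_lab = labels[ny, nx]
            if new_lab == labels[y, x]:
                continue
            old_lab = labels[y, x]
            labels[y, x] = new_lab            # tentative move
            e = energy(labels)
            if e < best:                      # accept and re-queue neighbors
                best = e
                fifo.extend(n for n in neighbors(y, x) if is_boundary(*n))
                break
            labels[y, x] = old_lab            # reject: undo
    return labels
```

Because accepted moves strictly decrease a bounded energy, the loop terminates; the real algorithm gains its speed by updating only the two affected superpixels per move.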
To see how the PRNU homogeneity term affects the segmentation, we show an example of object removal forgery in Fig. 3. We use the ratio of the average forgery probability inside the ground-truth forged region to that outside it to assess how well the segmentation separates the forged pixels from the pristine ones.
B. Multi-Orientation Forgery Localization Scheme
1) Multi-Orientation Forgery Detection
Most existing image forgery detectors apply square detection windows. Such an isotropic detection scheme inherently limits the capability to detect arbitrary-shaped and arbitrary-oriented forgeries. For instance, subtle forgeries such as human body limbs or tree branches might be undetectable with a square detection window. Although this issue can be mitigated by the use of detection windows of smaller size, the reliability will also be compromised at smaller scales. To allow for more accurate forgery localization, the various shapes and orientations of the forged regions need to be taken into consideration in the design of the localization framework. Inspired by the great success of faster R-CNN [47] in detecting objects based on anchor boxes, which are a set of predefined bounding boxes of certain scales and aspect ratios, we extend the multi-scale forgery localization scheme in [28] by adopting detection windows of various aspect ratios and orientations at each scale.
Based on the multi-scale framework in [28], we replace the square detection window at each scale with detection windows of multiple aspect ratios and orientations, as illustrated in Fig. 4. To reduce the computation, we only use 5 scales, with 11 detection windows of different aspect ratios and orientations at each scale.
2) Multi-Orientation Fusion
Having obtained 11 candidate forgery probability maps at each scale (corresponding to each of the 11 multi-oriented detection windows), we need a fusion scheme to form a single informative probability map. At first thought, a simple pixel-wise maximum fusion rule, i.e. selecting the largest value of the candidate probabilities for each pixel, would suffice for the task since we aim to detect any possible forgery, but this would also introduce substantial false positives, i.e. mis-labeling pristine pixels as forged. Ideally, the best detection result can be obtained if the detection window is perfectly aligned with the forged region. Thus, it is reasonable to accept the forgery probability calculated with the detection window that agrees best with the segmentation result. Suppose pixel $i$ belongs to superpixel $s_{i}$, $P_{s_{i}}$ is the forgery probability calculated over $s_{i}$, and $P_{b^{\star }}$ is the forgery probability obtained with the best-matching detection window $W_{b^{\star }}$. The fused forgery probability of pixel $i$ is \begin{equation*} P_{i} = rP_{b^{\star }}+(1-r)P_{s_{i}},\tag{9}\end{equation*} where $r \in [0,1]$ is a weighting factor.
The best-matching window index $b^{\star }$ is the one whose support overlaps most with superpixel $s_{i}$: \begin{equation*} b^{\star }= \mathop {\mathrm {arg\,max}} _{b\in \{1,\ldots,11\}}{W_{b}\cap s_{i}}.\tag{10}\end{equation*}
The desired superpixel number $K$ at each scale is determined by the number of image pixels $N$ and the detection window size $w$: \begin{equation*} K= \lfloor \frac {N}{\xi w^{2}}\rceil,\tag{11}\end{equation*} where $\xi$ is a scaling factor and $\lfloor \cdot \rceil$ denotes rounding to the nearest integer.
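The fusion rule of Eqs. (9) and (10) can be sketched for a single pixel as follows; the window masks, superpixel mask, and the value of $r$ in the test below are illustrative placeholders, not the settings used in the experiments:

```python
import numpy as np

def fuse_pixel_probability(window_masks, window_probs, sp_mask, p_sp, r=0.5):
    """Eqs. (9)-(10) for one pixel: among the candidate detection windows
    (binary masks W_b centered at the pixel), select b* with the largest
    overlap |W_b ∩ s_i| with the pixel's superpixel mask, then blend the
    window probability with the superpixel probability p_sp."""
    overlaps = [np.logical_and(m, sp_mask).sum() for m in window_masks]
    b_star = int(np.argmax(overlaps))                        # Eq. (10)
    return r * window_probs[b_star] + (1 - r) * p_sp, b_star  # Eq. (9)
```

For an elongated superpixel, the elongated window of matching orientation wins the overlap test, so its probability dominates the fusion.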
Scenario I: Two forged regions with different color appearances are segmented into two different superpixels $s_{i}$ and $s_{j}$, which often occurs inside a forged region. Suppose the forgery probability obtained with the detection window $W_{b^{\star }}$ is $P_{b^{\star }}$ and the forgery probabilities for $s_{i}$ and $s_{j}$ are $P_{s_{i}}$ and $P_{s_{j}}$, respectively. The pixels considered in this scenario are all forged pixels, so we can simply assume that $P_{s_{i}} \approx P_{s_{j}} \approx P_{b^{\star }}$ since the correlation predictor has been designed to account for different color appearances. Therefore, the final forgery probability $P_{i}\approx P_{b^{\star }}$, which means that the fusion is equivalent to selecting the probability corresponding to the detection window that agrees best with the segmentation results.
Scenario II: Two neighboring regions, with one forged and the other pristine, have different color appearances and are segmented into two superpixels $s_{i}$ and $s_{j}$. This case usually occurs for object insert forgeries, where the inserted object usually has a different color appearance from the background. In such a case, the pixels within the detection window falling outside the forged region lead to an attenuated $P_{b^{\star }}$. The above fusion strategy compensates the attenuated $P_{b^{\star }}$ by adding the term $(1-r)P_{s_{i}}$. As $P_{s_{i}}$ is calculated over forged pixels and is expected to be greater than $P_{b^{\star }}$, the fusion results in $P_{i}>P_{b^{\star }}$, compensating for the attenuation caused by the pixels falling outside the forged region.
Scenario III: In this scenario, the forged and pristine regions have the same or similar color appearance, which often occurs in the case of object removal forgery. Due to the lack of distinguishable color appearance, some parts of the two regions are quite likely to be segmented into the same superpixel. For instance, the segmentation may end up with two superpixels $s_{i}$ and $s_{j}$ separated by the red line. If we assume that $s_{i}$ and the window $W_{b^{\star }}$ are equally likely to contain heterogeneous pixels, this scenario is similar to the detection merely based on regular detection windows. In practice, because most superpixel algorithms will try to utilize as much local information as possible to perform the segmentation, the above fusion is expected to deliver comparable or even better performance than the methods merely based on regular detection windows.
Three simplified scenarios for multi-orientation forgery probability fusion. The pixel of interest is highlighted in yellow and different color appearances are highlighted in different colors.
The multi-orientation forgery localization scheme. For more details about the Adaptive Window (AW) and Multi-Scale (MS) fusion algorithms, please refer to Section IV-C or [28].
Experiments
A. Datasets
Our experiments were conducted on the realistic image tampering dataset (RTD) [28], [48], which contains 220 realistic forgeries captured by 4 different cameras: Canon 60D, Nikon D90, Nikon D7000 and Sony α57. According to the characteristics of the boundaries between the pristine and the forged regions, we divide the forged images into three subsets:
Hard-boundary Forgeries (HBF): This subset contains the forged images with visually distinguishable boundaries between the pristine and the forged regions. It mainly consists of the forgeries created by object insert, object replacement, color altering, etc.
Soft-boundary Forgeries (SBF): This subset contains the forged images with visually smooth and indistinguishable boundaries between the pristine and the forged regions. It mainly consists of the forgeries created by background texture synthesis or content-aware filling.
Mixed-boundary Forgeries (MBF): This subset contains the images with both hard and soft boundaries between the pristine and the forged regions.
Examples of forged images in hard-boundary forgeries (HBF), soft-boundary forgeries (SBF) and mixed-boundary forgeries (MBF).
B. Evaluation Protocols
The localization performance is commonly evaluated by the region $F_{1}$ score. Let $G_{i}$ denote the ground-truth forgery map of the $i$-th image and $L_{i}(\tau)$ the binary localization map obtained by thresholding the forgery probability map at $\tau \in (0,1)$. With region precision $\mathcal {P}_{r}$ and recall $\mathcal {R}_{r}$, the region $F_{1}$ score is defined as \begin{equation*} F_{r}(G_{i},L_{i}(\tau)) = \frac {2\cdot \mathcal {P}_{r}(G_{i},L_{i}(\tau)) \cdot \mathcal {R}_{r}(G_{i},L_{i}(\tau))}{\mathcal {P}_{r}(G_{i},L_{i}(\tau)) + \mathcal {R}_{r}(G_{i},L_{i}(\tau))},\tag{12}\end{equation*}
The average and peak region $F_{1}$ scores over the $\mathcal {S}$ test images are \begin{align*} \begin{cases} \bar {F}_{r}(\tau)=\dfrac {1}{\mathcal {S}}\displaystyle \sum \nolimits _{i=1}^{\mathcal {S}} F_{r}(G_{i},L_{i}(\tau))\\[5pt] \hat {F}_{r}=\dfrac {1}{\mathcal {S}}\displaystyle \sum \nolimits _{i=1}^{\mathcal {S}} \underset {\tau \in (0,1)}{\max }F_{r}(G_{i},L_{i}(\tau)). \end{cases}\tag{13}\end{align*}
Similarly, the boundary $F_{1}$ score compares the boundary $B_{i}$ of the ground-truth forged regions with the boundary $D_{i}(\tau)$ of the detected regions: \begin{equation*} F_{b}\left ({B_{i}, D_{i}(\tau)}\right)=\frac {2\cdot \mathcal {P}_{b}\left ({B_{i}, D_{i}(\tau)}\right) \cdot \mathcal {R}_{b}\left ({B_{i}, D_{i}(\tau)}\right)}{\mathcal {P}_{b}\left ({B_{i}, D_{i}(\tau)}\right) + \mathcal {R}_{b}\left ({B_{i}, D_{i}(\tau)}\right)},\tag{14}\end{equation*}
where the boundary precision and recall tolerate a distance deviation of up to $\theta$ px, with $[\![\cdot]\!]$ denoting the indicator function and $d(z, B)$ the distance from pixel $z$ to the nearest pixel of $B$: \begin{align*} \begin{cases} \mathcal {P}_{b}(B_{i}, D_{i}(\tau))=\dfrac {1}{|D_{i}(\tau)|}\displaystyle \sum \nolimits _{z\in D_{i}(\tau)}[\![d(z, B_{i}) < \theta]\!] \\[5pt] \mathcal {R}_{b}(B_{i}, D_{i}(\tau))=\dfrac {1}{|B_{i}|}\displaystyle \sum \nolimits _{z\in B_{i}}[\![d(z, D_{i}(\tau)) < \theta]\!]. \end{cases}\tag{15}\end{align*}
The average and peak boundary $F_{1}$ scores are defined analogously: \begin{align*} \begin{cases} \bar {F}_{b}(\tau)=\dfrac {1}{\mathcal {S}}\sum _{i=1}^{\mathcal {S}} F_{b}(B_{i}, D_{i}(\tau))\\[5pt] \hat {F}_{b}=\dfrac {1}{\mathcal {S}}\sum _{i=1}^{\mathcal {S}} \underset {\tau \in (0,1)}{\max }F_{b}(B_{i}, D_{i}(\tau)). \end{cases}\tag{16}\end{align*}
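As a sketch of the region score computation (Eqs. 12 and 13), the snippet below evaluates the per-image $F_{1}$ at a given threshold and its peak over a threshold grid; the grid itself is illustrative:

```python
import numpy as np

def region_f1(gt, prob, tau):
    """Region F1 (Eq. 12) for one image: binarize the probability map at tau
    and compare with the ground-truth forgery mask gt."""
    pred = prob >= tau
    tp = np.logical_and(pred, gt).sum()
    if pred.sum() == 0 or gt.sum() == 0 or tp == 0:
        return 0.0
    prec = tp / pred.sum()
    rec = tp / gt.sum()
    return 2 * prec * rec / (prec + rec)

def peak_f1(gt, prob, taus=np.linspace(0.05, 0.95, 19)):
    """Per-image peak F1 (the inner max of Eq. 13)."""
    return max(region_f1(gt, prob, t) for t in taus)
```

Averaging `region_f1` over all test images at a fixed threshold gives the curve score, while averaging `peak_f1` gives the threshold-free upper bound reported alongside it.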
C. Evaluated Algorithms and Parameter Settings
We considered the following single-orientation forgery localization algorithms for the comparison with our proposed schemes:
Simple Thresholding (ST) based algorithm [26]: For ST, we first generated a binarized forgery map by comparing the forgery probability map obtained by a detection window of $128{\times}128$ px with a threshold $\tau \in [0,1]$. Then we removed the connected forged regions with fewer than $64{\times}64$ px and applied image dilation with a disk kernel with a radius of 16 px to generate the final decision map. Note that the ST algorithm is a single-scale detector with no image content involved in the final decision-making process.
Single-Orientation Multi-Scale (SO+MS) fusion based algorithm [28]: SO+MS formulates the forgery localization as an image labeling problem and solves it with a conditional random field (CRF) model. The data term of the CRF model is the average of the threshold-drifted [16], [28] forgery probability maps obtained with single-oriented detection windows at 7 different scales, and the neighborhood interaction of image content is encoded in the regularization term. The final binary decision map is obtained by optimizing the CRF model.
Single-Orientation Adaptive Window (SO+AW) fusion based algorithm [28]: SO+AW aims to fuse the forgery probability maps obtained with single-oriented detection windows of 7 scales. Starting from the smallest scale, it looks for a sufficiently confident decision (i.e. a forgery probability far from 0.5) for each location in the image. If the decision at a smaller scale is not confident enough, it proceeds to the next larger scale until a sufficiently confident decision and an agreement between two consecutive scales are reached. Finally, the binary decision map is obtained by applying the threshold drift strategy and optimizing the CRF model.
Segmentation-Guided (SG) based algorithm [28]: The SG algorithm calculates the forgery probability by only considering the pixels with an intensity value close to the central pixel (average $L_{1}$ distance in RGB space less than 15) within a detection window of $128\times 128$ px. It implements the idea of image segmentation but only exploits the intensity difference between individual pixels. Similarly to SO+AW and SO+MS, the threshold drift strategy and CRF are applied to obtain the final binary decision map.
Note that the notations ‘SO+MS’, ‘SO+AW’, ‘SG’ used in this article correspond to the ‘MSF’, ‘AW+’ and ‘SG+’ algorithms in [28]. We use the notations ‘AW’ and ‘MS’ to denote the adaptive window and multi-scale strategies proposed in [28] for multi-scale fusion. For our proposed segmentation-based schemes, we will use the notation ‘SEG+FUSION’, where SEG $\in$ {SLIC, ERS, ETPS, EG-ETPS} indicates the segmentation algorithm and FUSION $\in$ {AW, MS} the fusion strategy applied to the resulting probability maps.
For SO+MS, SO+AW and SG, we used exactly the same parameters as summarized in Table 2 of [28]. For the segmentation algorithms used in this work, their parameters are given as follows:
SLIC [41]: We set both the compactness parameter and the iteration number of pixel assignment and centroid updating to 10.
ERS [42]: As suggested in [42], we set the weighting factor of the balancing term $\lambda ^\prime {=}0.5$ and the Gaussian kernel bandwidth $\sigma {=}5.0$ for calculating the pixel similarities.
ETPS [43]: We set the weighting factors of both the shape regularity term and the boundary length term to 0.2, i.e. $\lambda _{pos}=0.2$ and $\lambda _{b}=0.2$.
EG-ETPS: We used exactly the same parameters as ETPS except for the weighting factor $\lambda _{etp}$ of the PRNU homogeneity term. As can be expected, the setting of $\lambda _{etp}$ will depend on the quality of PRNU. Thus we empirically set $\lambda _{etp}=2.5\cdot \exp (-(R^{2}-1)^{2}/0.3)$, where $R^{2}$ is the adjusted R-squared coefficient for the correlation predictor trained at the scale of $64\times 64$ px, which is a good indicator of the quality of PRNU. This results in $\lambda _{etp}\in [1.8,2.2]$ for the four cameras in the RTD dataset.
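The quality-adaptive weight described above can be computed directly from the adjusted $R^{2}$ of the correlation predictor:

```python
import numpy as np

def lambda_etp(r_squared):
    """Quality-adaptive weight of the PRNU homogeneity term:
    lambda_etp = 2.5 * exp(-(R^2 - 1)^2 / 0.3). A well-fitting correlation
    predictor (R^2 close to 1) indicates reliable PRNU and yields a larger
    weight for the forensic clue."""
    return 2.5 * np.exp(-(r_squared - 1.0) ** 2 / 0.3)
```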
1) Results for Segmentation-Based Localization Schemes
In this experiment, we applied the adaptive window and multi-scale fusion strategies directly on the probability maps obtained by calculating the forgery probability on each generated superpixel. To make the average superpixel size consistent with the detection window size at each scale, we specified the desired superpixel number $K$ at each scale according to Eq. (11).
We show the average score curves, obtained by varying the decision threshold $\tau$, together with the corresponding average peak scores for the compared schemes.
Segmentation-based vs. single-orientation forgery localization schemes in terms of average score curves.
Segmentation-based vs. single-orientation forgery localization schemes in terms of average peak scores.
As for the comparison between different segmentation algorithms, we can observe that EG-ETPS benefits from the integration of PRNU homogeneity information and slightly outperforms other segmentation algorithms. It is worth mentioning that the benefits of EG-ETPS are mainly reflected in detecting the boundaries of the forgeries, so its overall performance gain over other segmentation algorithms depends on the amount of soft boundaries and is more evident when the performance is measured in terms of the boundary $F_{1}$ scores.
2) Results for Multi-Orientation Localization Schemes
In this experiment, we aim to compare the performance of our proposed multi-orientation localization schemes and the methods based on single-oriented detection windows. The comparison results are shown in Fig. 10 and Fig. 11. Similarly, for easy comparison, we also summarized the highest average measures achieved by each scheme.
Multi-orientation vs. single-orientation forgery localization schemes.
Multi-orientation vs. single-orientation forgery localization schemes in terms of average peak scores.
We can see that the MS fusion strategy achieves considerably better performance than the AW fusion for our proposed multi-orientation schemes. A closer inspection revealed that AW is more likely to introduce false positives, especially in the regions where the PRNU is substantially attenuated. In addition, compared to the single-orientation schemes, a significant performance improvement can be observed for the multi-orientation localization schemes when the MS fusion strategy is applied.
Another important observation for the proposed multi-orientation schemes is that the performance gap between the four segmentation algorithms becomes very small. For the segmentation-based localization schemes, the difference between the segmentation algorithms is much more noticeable when localizing the forgeries in the SBF+MBF subset (see the fourth row of Fig. 8). However, for the multi-orientation localization schemes, the capability of localizing soft-boundary forgeries mainly stems from the detection based on multi-oriented detection windows rather than the segmentation algorithms, which narrows the performance gaps between different segmentation algorithms. Some examples of forgery localization can be found in Fig. 12, where we only show the results of EG-ETPS and MS for the proposed multi-orientation forgery localization schemes. Note that for MS, we show the average of the probability maps across different scales to approximate the fused probability map in Fig. 12.
Example forgery localization results. Color coding: green: detected forged regions (true positives); red: detected pristine regions (false positives); blue: mis-detected forged regions (false negatives). More forgery localization results can be found in supplementary materials.
D. Robustness Against JPEG Compression
One of the main threats to PRNU-based forgery localization is JPEG compression. Thus, another experiment was conducted to evaluate the robustness against JPEG compression. We generated 6 new versions of each image by re-saving the corresponding TIFF images with JPEG quality factors of 100, 95, 90, 85, 80 and 75. We then ran all the localization algorithms on the image set of each version and calculated the performance statistics. To calculate the corresponding forgery probabilities, we used the reference PRNU and correlation predictors trained with TIFF images for JPEG images of different quality levels.
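The test-set generation described above can be sketched with Pillow; the use of in-memory buffers and the array shapes are illustrative (the paper re-saves the original TIFF images):

```python
from io import BytesIO

import numpy as np
from PIL import Image

def jpeg_versions(rgb_array, qualities=(100, 95, 90, 85, 80, 75)):
    """Re-save an image at the JPEG quality factors used in the robustness
    experiment and return the decoded versions, keyed by quality factor."""
    out = {}
    img = Image.fromarray(rgb_array)
    for q in qualities:
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        out[q] = np.asarray(Image.open(buf).convert("RGB"))
    return out
```

Each localization algorithm is then run on every quality level while keeping the reference PRNU and correlation predictors trained on the uncompressed images, as stated above.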
The results on the entire RTD dataset are reported in terms of the average peak region and boundary $F_{1}$ scores.
Impact of JPEG compression on localization performance.
Conclusion
In this work, we investigated the potential of explicit image segmentation for content forgery localization. We have shown that image segmentation by exploiting the visual content is beneficial for improving the performance of forgery localization based on imperceptible forensic clues, especially for hard-boundary forgeries. While the effectiveness of segmentation merely based on visual content can be compromised for soft-boundary forgeries, such limitation can be mitigated by further integrating the local homogeneity of imperceptible forensic clues to guide the segmentation. To better resolve the issue of detecting soft-boundary forgeries, we also proposed a localization scheme based on the multi-orientation fusion of the forgery probability maps obtained by multi-orientation detection and image segmentation. With the aid of the multi-scale fusion, the multi-orientation detection is effective in detecting soft-boundary forgeries and the segmentation is particularly good at identifying hard-boundary forgeries. Integrating them in a complementary way leads to the superior localization performance for our proposed multi-orientation schemes at the expense of extra computation complexity. Although we used PRNU-based forgery localization as an example in this article, we believe that similar ideas can also apply to forgery detectors based on other forensic clues. Further investigations on the potential of image segmentation in other forensics detectors as well as the combination of them will be conducted in our future work.
Appendix: Incremental Update of Forgery Probability
We calculate the forgery probability $P_{s_{i}}$ of superpixel $s_{i}$ analogously to Eq. (3): \begin{equation*} P_{s_{i}}=\left ({1\!+\!\exp \left ({\frac {\rho _{s_{i}}^{2}}{2\hat {\sigma }_{0}^{2}}-\frac {(\rho _{s_{i}}-\hat {\rho }_{s_{i}})^{2}}{2\hat {\sigma }_{1}^{2}}-\ln \frac {\hat {\sigma }_{1}}{\hat {\sigma }_{0}}}\right)}\right)^{-1},\tag{17}\end{equation*}
where the correlation $\rho _{s_{i}}$ is calculated between the noise residual $\boldsymbol {r}_{s_{i}}$ and the expected PRNU term $\boldsymbol {n}_{s_{i}}$ of the pixels in $s_{i}$: \begin{equation*} \rho _{s_{i}}=\frac {||\boldsymbol {r}_{s_{i}}\cdot \boldsymbol {n}_{s_{i}}||_{1}}{||\boldsymbol {r}_{s_{i}}||_{2} \cdot ||\boldsymbol {n}_{s_{i}}||_{2}}\tag{18}\end{equation*}
and the predicted correlation $\hat {\rho }_{s_{i}}$ is given by the correlation predictor with feature vector $\boldsymbol {f}_{s_{i}}$ and trained coefficients $\hat {\boldsymbol {\theta }}_{s_{i}}$: \begin{equation*} \hat {\rho }_{s_{i}}=\boldsymbol {f}_{s_{i}}\hat {\boldsymbol {\theta }}_{s_{i}},\tag{19}\end{equation*}
When a macropixel $j$ with $N_{j}$ pixels is added to or removed from superpixel $s_{i}$, the feature vector is updated incrementally as \begin{equation*} \boldsymbol {f}_{s_{i}}^{(t)} = \boldsymbol {f}_{s_{i}}^{(t-1)} - \frac {N_{j}}{N_{s_{i}}}\left ({\boldsymbol {f}_{s_{i}}^{(t-1)}\pm \boldsymbol {f}_{j} }\right), \tag{20}\end{equation*} where $N_{s_{i}}$ denotes the number of pixels in the updated superpixel and the sign depends on whether the macropixel is added or removed.
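For the addition case, Eq. (20) is the standard incremental (running-mean) update; a sketch with names of our own choosing:

```python
import numpy as np

def add_macropixel(f_sp, n_sp, f_mp, n_mp):
    """Addition case of the incremental update in Eq. (20): merge macropixel j
    (mean feature f_mp over n_mp pixels) into superpixel s_i (mean feature
    f_sp over n_sp pixels) without revisiting the raw pixel data."""
    n_new = n_sp + n_mp
    f_new = f_sp + (n_mp / n_new) * (np.asarray(f_mp) - np.asarray(f_sp))
    return f_new, n_new
```

The result matches recomputing the mean from scratch, which is precisely what makes the per-move energy updates in Algorithm 1 cheap.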