Introduction
Face Recognition Systems (FRS) have provided ubiquitous ways of verifying an identity claim in many applications. FRS are used in everyday applications ranging from low-security ones such as smartphone unlocking to high-security ones such as identity verification in border control processes. Each application mandates a chosen way of enrolment to the FRS: either supervised enrolment (for instance, on-boarding at bank premises) or unsupervised enrolment (for instance, on-boarding for banking applications from home). While initiating an enrolment process in an unsupervised manner provides a high degree of flexibility and convenience to users, it also introduces a security risk: without supervision, a data subject enrolling into the FRS can submit a face image that is manipulated, a printed face image, an image displayed on an electronic screen (e.g., an iPad), or a silicone/latex face mask [2]. To mitigate such attacks at the enrolment level, a robust attack detection mechanism is therefore essential. While a number of works in recent years have addressed both conducting such attacks and robustly detecting printed, display and mask attacks, in this work we focus on a new kind of attack popularly referred to as the Morphing Attack.
Face morphing is the process of combining two or more face images to generate a single face image that visually resembles all contributing face images to a great degree [3]. A good-quality morphed face image is also effective in verifying against all contributing subjects by obtaining a comparison score that exceeds the pre-determined threshold (i.e., it passes the FRS) [3], [4], [5], [6]. While morphing can be conducted using face images of arbitrary subjects, morphed images are reported to be most effective when the face images stem from subjects of similar ethnicity, gender and age group [6], [7], [8]. This is primarily because a morphed image should not only defeat the FRS but should also provide high visual similarity, in order to convince a human expert in a visual comparison process.
Face morphing attacks threaten FRS due to the current practices in the ID-document application process, where the biometric enrolment is carried out in an unsupervised manner in many countries. Countries like the U.K. and New Zealand allow citizens to upload a digital face image for various applications such as passport renewal [9] and visa application [10]; the capture process for such images is unsupervised. In a similar manner, many Asian and European countries (e.g., The Netherlands [11]) request the applicant to submit a scanned face image for passport/visa/identity-card applications. Given that the images are captured and submitted in an unsupervised setting, the applicant has ample opportunity to upload a morphed image with malicious intent, underlining the need for robust Morphing Attack Detection (MAD) mechanisms.
A. Related Works on Face Morph Generation
While morphing attacks have been studied in recent years, most attacks are conducted using morphed images created with facial-landmark-based approaches, which need a high degree of supervision: the facial landmarks must first be determined, then aligned, and finally blended to generate the morph. Common procedures for warping/blending include Free-Form Deformation (FFD) [12], [13], deformation by moving least squares [14], deformation based on mass-spring models [15], Bayesian-framework-based morphing [16] and Delaunay-triangulation-based morphing [17], [18], [19], [20], [21]. Due to inadvertent artefacts caused by pixel/region-based morphing, the images need additional refinement to become highly realistic morphs. A set of post-processing steps is therefore usually included, as illustrated in a number of works [20], [22], [23]. Typically, techniques such as image smoothing, image sharpening, edge correction, histogram equalization, manual retouching, and brightness/contrast enhancement are used to eliminate the artefacts generated during the morphing process. In a parallel direction, morphed face images can also be generated using landmark-based methods available in open-source resources like GIMP/GAP and OpenCV. Morphs generated using the GIMP/GAP technique achieve a better quality of the resulting image (i.e., fewer noticeable artefacts), as pixels are aligned manually. While creating morphs with such approaches requires only minimal effort, a significant amount of effort must still be dedicated to correcting artefacts. Additionally, commercial solutions like Face Fusion [24] and FantaMorph [25] can also generate good-quality morphed images with limited manual intervention.
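The landmark-based warp-and-blend pipeline described above can be illustrated by its final cross-dissolve step. The following is a minimal sketch (not any of the cited implementations), assuming both faces have already been warped to the shared, averaged landmark geometry (e.g., via Delaunay triangles):

```python
import numpy as np

def cross_dissolve(face1, face2, alpha=0.5):
    """Final blending step of landmark-based morphing: after both faces
    are warped to the averaged landmark geometry, corresponding pixels
    are linearly blended. The warping itself is omitted here; inputs
    are assumed to be pre-aligned uint8 image arrays."""
    blended = alpha * face1.astype(np.float64) \
        + (1.0 - alpha) * face2.astype(np.float64)
    return np.clip(blended, 0, 255).astype(np.uint8)
```

With `alpha=0.5` both subjects contribute equally, which is the setting typically used for morphing attacks.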
Although some steps can be excluded when creating the morphs, it is critical that the result meets the face-image quality standards laid out by the International Civil Aviation Organization (ICAO) [26], [27] for electronic Machine Readable Travel Documents (eMRTD) and the deployment of biometric identification applications.
B. GAN Based Face Morph Generation
In an attempt to overcome the cumbersome effort of manually creating (semi-automated) morphed images, a fully automated approach using a Generative Adversarial Network (GAN) was proposed by Damer et al. [28]. Unlike the supervision required to mark up landmarks and align the face images in a (partially) manual process, GAN-based techniques synthesise morphed images directly by merging two facial images in the latent space. In the work by Damer et al. [28], the proposed MorGAN architecture for morph generation employed a generator consisting of encoders and decoders, together with a discriminator. The generator was trained to generate images with the dimension
C. Limitations of GAN Based Face Morph Generation and Our Contributions
While our earlier work [1] indicated that better GAN architectures could result in superior-quality morphs and could attack an FRS in general, we also acknowledged the limited threat to Commercial-Off-The-Shelf (COTS) FRS, as merely a subset of the morphed images was accepted: only approximately 50% of the generated morph images were verified successfully against probe images of a contributing subject. Thus, the empirical evaluation in our earlier work [1] showed that the attack was not yet very effective against a COTS FRS [30] and an open-source FRS based on ArcFace [31]. We must state that, up to now, FRS are not very vulnerable to GAN-based morphing attacks, unlike landmark-based morphing attacks. With a clear introspection into this aspect, we notice that the morphed images resulting from our earlier work [1] do not retain a high degree of facial similarity to both contributing subjects. With lower similarity to the contributing subjects in terms of facial structure, the FRS does not attribute a high comparison score, as anticipated. In other words, the missing enforcement of identity information of the contributing subjects leads to a facial image of high visual quality but with low facial similarity to the contributing face characteristics.
In an effort to make the attacks stronger, such that both subjects can be verified with a good success rate, in this work we extend our previous architecture to generate morphs by including identity priors before the generation of morphed faces. We refer to this approach as MIPGAN (Morphing through Identity Prior driven GAN). We propose two variants, named MIPGAN-I and MIPGAN-II, based on whether the employed GAN is StyleGAN [29] or StyleGAN2 [32], respectively. With the inclusion of a new loss function in our proposed architecture, we increase the attack success rate against commercial-off-the-shelf (COTS) FRS and deep-learning-based FRS. Figure 1 shows examples of morphed face images generated using the proposed MIPGAN, along with outputs of both variants. To achieve morphs of further superior quality, we also customize the newly designed loss function to account for ghosting and blurring artefacts in an end-to-end manner with no human or manual intervention, eliminating the need for a high degree of interaction. As noted in Figure 2, the results from MIPGAN-I and MIPGAN-II are more coherent in retaining structural similarity compared to our earlier architecture [1]. With the updated architecture for generating high-quality morphs that preserve both identity information and structural correspondence, we evaluate the applicability to creating stronger attacks by building a large-scale dataset of morphed images from face images derived from the FRGC-V2 face database [33]. The created dataset of 1270 bona fide images and 2500 morphed images is first evaluated to measure the attack success rate by verifying the morphed images against the contributing subjects using a commercial FRS from Cognitec [30]. In addition to measuring the attack success rate for digital images, we also extend our work by printing and scanning (re-digitizing) the dataset.
We check the consistency of the attack success rate, unlike our earlier work, which was limited to an investigation of digital images alone [1]. We also include experiments assessing the impact of compression (down to 15 kB, following ICAO guidelines) of printed-and-scanned face images, simulating the real-life e-passport application scenario. The key motivation to extend our work in this direction is to mimic the passport application process operated in many European and Asian countries, which accept printed-and-scanned facial images in the application process for an identity document (e.g., a passport).
With the extensive experimental results indicating a highly satisfactory attack success rate, we also evaluate a set of MAD algorithms to benchmark the detection capabilities. To this end, we evaluate two state-of-the-art MAD approaches on digital morphed images, re-digitized morphed images, and compressed morphed images after re-digitizing. Thus, we comprehensively cover the potential morphing attacks in the digital domain and the re-digitized domain. While we note earlier works [1] arguing that attacks in the digital domain can be detected by studying cues such as residual noise from morphing [34], noise patterns of morphed images, histogram features of textures, or deep features [4], we also investigate the MAD capabilities for re-digitized images, which do not exhibit similar features (residual noise), as the print-scan process eliminates the digital cues and introduces another set of variations. Specifically, given the nature of the dataset, in which only a single suspected morphed image is available and must be assigned to either the morph or the bona fide class, we resort to Single-image-based MAD (S-MAD) using two recent but robust approaches based on hybrid and ensemble features [34], [35], [36], [37].
The contributions of this work are therefore summarized as follows:
We present a novel approach for generating morphed face images through a GAN architecture with enforced identity priors and a customized novel loss function to generate highly realistic images, which we refer to as MIPGAN (Morphing through Identity Prior driven GAN). We present two variants of the proposed approach for generating attacks with a high success rate.
The proposed approach (both variants) is benchmarked to measure the attack success rate against COTS and deep-learning-based FRS by studying their vulnerability using a newly generated dataset from our proposed architecture, which we refer to as the MIPGAN Face Morph Dataset.
Human observer analysis for detecting morphs generated by the proposed and existing morphing attack methods is presented.
Analysis of the perceptual quality metrics to illustrate the visual quality of the generated morph images is presented.
Extensive experiments on three different data types, (a) digital morphed images, (b) print-scanned morphed images and (c) print-scanned morphed images with compression, are presented to cover the full spectrum of the passport application process under morphing attacks.
The generated images are also benchmarked against existing MAD approaches, both in digital form and in re-digitized form, to provide insights into the detection challenges for SOTA approaches. We also present a generalizability study on MAD schemes by training on one kind of morph generation approach and testing on a different kind, to indicate directions for future work.
In the rest of the paper, Section II describes the new architecture along with the newly designed loss function to generate high-quality morphs. Section III provides details of the quantitative experiments indicating the vulnerability of FRS and the detection challenge. Finally, we draw the conclusion, with a set of remarks on future work in this direction, in Section V.
Proposed Morphed Face Generation
Figure 3 presents the block diagram of the proposed morphed face image generation using MIPGAN. The proposed method is based on end-to-end optimization using a new loss function that can preserve the identity of the generated morphed face image through enforced identity priors. The proposed MIPGAN framework is instantiated independently on two different GAN models, StyleGAN [29] and StyleGAN2 [32]. We refer to the proposed scheme with StyleGAN as MIPGAN-I and with StyleGAN2 as MIPGAN-II, respectively. Given the face images from the accomplice and the malicious actor, the morphed latent representation is obtained as the weighted combination \begin{equation*} L^{\prime }_{M} = \frac {w_{1}*L_{1}^{\prime } + w_{2}*L_{2}^{\prime }}{2}, \tag{1}\end{equation*}
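Assuming Eqn. (1) combines the projected latent codes of the two contributing images (with equal weights $w_1 = w_2 = 1$ it reduces to the plain average), the combination step can be sketched as follows; the helper name is hypothetical, not the authors' implementation:

```python
import numpy as np

def combine_latents(l1, l2, w1=1.0, w2=1.0):
    """Sketch of Eqn. (1): weighted combination of the latent codes of
    the two contributing subjects, divided by 2 as in the equation.
    With w1 = w2 = 1 this is the plain average of the two codes."""
    return (w1 * l1 + w2 * l2) / 2.0
```

The combined code then serves as the starting point that the loss function of Section II-A optimizes.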
Block diagram of the proposed MIPGAN for generating high quality morphed face images.
A. Proposed Loss Function
The proposed loss function is based on perceptual fidelity, quality and identity factors that together facilitate high-quality face morph generation. A common issue with GAN-based morph generation is the presence of ghost artefacts and blurring. We employ a perceptual loss over multiple layers to eliminate such effects, as given by Eqn. (2):\begin{align*} Loss_{Perceptual}=&\frac {1}{2}\sum _{i} \frac {1}{N_{i}}\left \|{F_{i}\left ({I_{1}}\right)-F_{i}\left ({I^{\prime }_{M}}\right)}\right \|^{2}_{2} \\&+\,\,\frac {1}{2}\sum _{i} \frac {1}{N_{i}}\left \|{F_{i}\left ({I_{2}}\right)-F_{i}\left ({I^{\prime }_{M}}\right)}\right \|^{2}_{2}, \tag{2}\end{align*} where $I_{1}$ and $I_{2}$ are the contributing face images, $I^{\prime }_{M}$ is the generated morph, $F_{i}(\cdot)$ denotes the activations of the $i$-th selected feature layer and $N_{i}$ is the number of elements in that layer.
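A minimal sketch of Eqn. (2), assuming the per-layer activations $F_i$ have already been extracted by a fixed feature network (the helper and its inputs are hypothetical, not the authors' implementation):

```python
import numpy as np

def perceptual_loss(feats_1, feats_2, feats_morph):
    """Sketch of Eqn. (2): for each selected layer i, the squared L2
    distance between the morph's activations and each contributing
    subject's activations, normalised by the layer size N_i and
    weighted by 1/2 per subject. feats_* are lists of ndarrays,
    one entry per chosen feature layer."""
    loss = 0.0
    for f1, f2, fm in zip(feats_1, feats_2, feats_morph):
        n_i = fm.size  # N_i: number of elements in layer i
        loss += 0.5 / n_i * np.sum((f1 - fm) ** 2)
        loss += 0.5 / n_i * np.sum((f2 - fm) ** 2)
    return loss
```

The loss is zero only when the morph's activations match those of both subjects at every selected layer.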
The main goal of this paper is to generate morphed face images that can effectively attack FRS. To achieve this, we introduce an identity loss function based on feedback from an FRS. We employ ArcFace [31], a deep-learning-based FRS, because of its robust and accurate performance, to obtain feedback on the generated morphed face images. Specifically, we employ a pre-trained embedding extractor and define \begin{equation*} Loss_{Identity}=\frac {\left ({1-\frac {\vec {v}_{1} \cdot \vec {v}_{M}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right)+\left ({1-\frac {\vec {v}_{2} \cdot \vec {v}_{M}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}{2}, \tag{3}\end{equation*} where $\vec {v}_{1}$, $\vec {v}_{2}$ and $\vec {v}_{M}$ are the embeddings of the two contributing images and of the generated morph, respectively, i.e., the average of the cosine distances between the morph and each contributing subject.
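Eqn. (3) can be sketched directly as the mean cosine distance between the morph embedding and the two subject embeddings (embeddings as produced by, e.g., an ArcFace extractor; the helper names are ours):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity of two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identity_loss(v1, v2, vm):
    """Eqn. (3): average cosine distance between the morph embedding vm
    and the embeddings v1, v2 of the two contributing subjects."""
    return 0.5 * (cosine_distance(v1, vm) + cosine_distance(v2, vm))
```

The loss vanishes when the morph embedding coincides in direction with both subject embeddings, which is exactly the condition under which both subjects verify successfully.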
To further show that the loss function is differentiable with respect to the morphed embedding vector, let $x_{d}$, $y_{d}$ and $z_{d}$ denote the $d$-th components of $\vec {v}_{1}$, $\vec {v}_{2}$ and $\vec {v}_{M}$, respectively, so that Eqn. (3) can be rewritten and differentiated as \begin{align*}&Loss_{Identity}=\frac {\left ({1-\frac {\sum _{d} x_{d} z_{d}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right)+\left ({1-\frac {\sum _{d} y_{d} z_{d}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}{2},\tag{4}\\&\frac {\partial Loss_{Identity}}{\partial z_{d}} = 1-\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right) \\&\qquad -\,\,\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right),\tag{5}\\&\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right) = \frac {1}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}} \\&\qquad +\,\,\frac {2z_{d}^{2}}{-2\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}=\frac {\sum _{d'\neq d}z_{d'}^{2}}{\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}} \\&\frac {\partial Loss_{Identity}}{\partial z_{d}}=1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}.\tag{6}\end{align*}
\begin{align*}&\lim _{\Delta z_{d} \to 0}\frac {\partial Loss_{Identity}\left ({z'_{d}+\Delta z_{d}}\right)}{\partial z_{d}}\\&\quad =\lim _{\Delta z_{d} \to 0}\left ({1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({\left ({z'_{d}+\Delta z_{d}}\right)^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}}\right)\\&\quad =1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({z^{'2}_{d}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}\\&\quad =\frac {\partial Loss_{Identity}\left ({z'_{d}}\right)}{\partial z_{d}}.\end{align*}
It is interesting to note that the ArcFace feature extractor underlying the identity loss is trained to maximize face class separability and is therefore more sensitive to face attributes. Hence, optimising the identity loss alone cannot achieve the same reconstruction performance as the perceptual loss, but applying it to the face region effectively steers the generated attributes to be recognized as both subjects.
To solve the imbalance between the two contributing subjects, we introduce an identity difference loss, which penalises a morph whose embedding is closer to one subject than to the other, as given by Eqn. (7).\begin{equation*} Loss_{ID-Diff}=\left |{\left ({1-\frac {\vec {v}_{1}\cdot \vec {v}_{M}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right) -\left ({1-\frac {\vec {v}_{2}\cdot \vec {v}_{M}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}\right |.\qquad \tag{7}\end{equation*}
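Eqn. (7) is simply the absolute difference of the two cosine distances from Eqn. (3); a minimal sketch (helper name ours):

```python
import numpy as np

def id_diff_loss(v1, v2, vm):
    """Eqn. (7): absolute difference between the morph's cosine
    distances to the two contributing subjects' embeddings. The loss is
    zero when the morph is equidistant from both subjects."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return abs(cos_dist(v1, vm) - cos_dist(v2, vm))
```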
Finally, in order to improve the structural visibility of the generated morphed face image, we also apply the Multi-Scale Structural Similarity (MS-SSIM) loss, whose luminance, contrast and structure components are defined as \begin{align*} l(x,y)=&\frac {2\mu _{x}\mu _{y}+(K_{1}L)^{2}}{\mu _{x}^{2}+\mu _{y}^{2}+\left ({K_{1}L}\right)^{2}}, \\ c(x,y)=&\frac {2\sigma _{x}\sigma _{y}+(K_{2}L)^{2}}{\sigma _{x}^{2}+\sigma _{y}^{2}+ \left ({K_{2}L}\right)^{2}}, \\ s(x,y)=&\frac {\sigma _{xy}+\frac {\left ({K_{2}L}\right)^{2}}{2}} {\sigma _{x}\sigma _{y}+\frac {\left ({K_{2}L}\right)^{2}}{2}}, \tag{8}\end{align*}
\begin{align*} MSSSIM(x,y)=&\left [{l_{J}(x,y)}\right]^{\alpha _{J}} \cdot \prod _{j=1}^{J} \left [{c_{j}(x,y)}\right]^{\beta _{j}}\left [{s_{j}(x,y)}\right]^{\gamma _{j}}, \\ L_{MS-SSIM}=&\frac {1}{2}\left ({1-MSSSIM\left ({I_{1},I'_{M}}\right)}\right) \\&+\,\,\frac {1}{2}\left ({1-MSSSIM\left ({I_{2},I'_{M}}\right)}\right), \tag{9}\end{align*}
Thus, the proposed loss function can be formulated as:\begin{align*} Loss=&\lambda _{1} Loss_{Perceptual}+\lambda _{2} Loss_{Identity} \\&+\,\,\lambda _{3} Loss_{MS-SSIM} + \lambda _{4} Loss_{ID-Diff}, \tag{10}\end{align*} where $\lambda _{1},\ldots,\lambda _{4}$ are the weights of the respective loss terms.
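The combined objective of Eqn. (10) is a weighted sum of the four terms; in the sketch below, the weight values are placeholders, not the values tuned in the paper:

```python
def total_loss(l_perceptual, l_identity, l_msssim, l_id_diff,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Eqn. (10): weighted sum of the four loss components. The weights
    correspond to lambda_1..lambda_4; the defaults here are arbitrary
    placeholders."""
    lam1, lam2, lam3, lam4 = weights
    return (lam1 * l_perceptual + lam2 * l_identity
            + lam3 * l_msssim + lam4 * l_id_diff)
```

Because every component is differentiable with respect to the generated image (Section II-A), the sum can be minimised end-to-end with a gradient-based optimizer such as Adam.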
B. Training and Optimization
The training and optimization of the proposed method are carried out in TensorFlow versions 1.13 and 1.14 for StyleGAN and StyleGAN2, respectively. The optimization is carried out on an NVIDIA GTX 1070 8 GB GPU with CUDA 10.0 and cuDNN 7.5, and on an NVIDIA Tesla P100 PCIe 16 GB GPU. The Adam optimizer with hyper-parameters
Figure 4 illustrates the qualitative results of the proposed MIPGAN framework based on StyleGAN and StyleGAN2. Further, qualitative results of the existing methods based on StyleGAN [1] and MorGAN [28] are provided alongside in the same figure for the convenience of the reader. It is interesting to note that the face morph images generated by the proposed MIPGAN exhibit both perceptual and geometric feature correspondence to both contributing subjects (i.e., malicious actor and accomplice).
Experiments and Results
This section presents and discusses the experimental protocols, datasets, and quantitative results of the proposed face morphing technique. The images generated by the proposed MIPGAN-I and MIPGAN-II architectures are compared with state-of-the-art techniques based on facial landmarks [7] and on StyleGAN-based morph generation [1]. The effectiveness of the face morph generation is quantitatively evaluated by benchmarking the vulnerability of COTS and deep-learning-based FRS to the generated morphed face images. Further, we also evaluate the morphing attack detection potential by evaluating the generated morphed face images using the most recent and robust MAD techniques.
A. MIPGAN Face Morph Dataset
We employ face images from the FRGC-V2 face database [33] to generate the MIPGAN Face Morph Dataset, consisting of morphed face images created using both state-of-the-art techniques and the proposed MIPGAN technique. We selected 140 unique data subjects from the FRGC dataset by considering high-quality face images captured under constrained conditions that resemble passport image quality. Among the 140 data subjects, 47 are female and 93 are male. Each data subject has between 7 and 21 additionally captured samples, resulting in 1270 samples for the whole dataset corresponding to the 140 data subjects. We employ three different existing face morph generation techniques: facial landmarks constrained by Delaunay triangulation with blending [7], which we term Landmarks-I; a landmark-based technique with automatic post-processing and colour equalisation [44], which we term Landmarks-II; and StyleGAN [1]. We do not consider MorGAN-based [28], [47] face morph generation, as it was earlier demonstrated that MorGAN does not generate ICAO-compliant images and thus does not make COTS FRS vulnerable [1]. All samples are pre-processed to meet the ICAO standards [27], and morphing is carried out by following the guidelines outlined earlier [7], [8], i.e., careful selection of subjects based on gender and on the similarity score from an FRS, in order to obtain realistic attacks.
To effectively evaluate the quantitative performance of the proposed method and the existing techniques, we create three different types of attacks from the morphed images:
Digital morphed images: Morphed face images obtained from the morph generation process in the digital domain.
Print-scanned morphed images: The digital morphed and bona fide images are printed and then scanned (re-digitized) to simulate the passport application process. We employed a DNP DS820 [48] dye-sublimation photo printer to produce the prints of the digital morphed and bona fide face images in this work. The use of a dye-sublimation photo printer guarantees high-quality photo printing (generally used for passport applications) and ensures that the printed photos are free from the dotted patterns (or individual droplets of ink) that result from the printing process of conventional printers. Each printed photo is then scanned (re-digitized) at 300 dpi, as suggested in the ICAO standards [27], using a Canon office scanner.
Print-scanned compressed morphed images: The printed-and-scanned images (both morphed and bona fide) are compressed to a size of 15 kB, making them suitable for storage in the e-passport. This process reflects the real-life scenario of face image storage in passport systems.
Thus, the overall dataset has
B. Vulnerability Analysis
This section presents the vulnerability analysis of the proposed morphed face generation techniques to quantify the impact of the attacks on FRS. We quantify the attack success for five different FRS, including two Commercial-Off-The-Shelf (COTS) FRS and three deep-learning-based open-source FRS. The COTS FRS are the Cognitec FRS (Version 9.4.2) [30] and Neurotechnology (Version 10) [50]; the open-source FRS are ArcFace [31], VGGFace [49] and LCNN-29 [51]. The operational threshold for all five FRS is set at a False Match Rate (FMR) of 0.1%, following the Frontex guidelines [52].
The vulnerability is assessed using two metrics, the Mated Morphed Presentation Match Rate (MMPMR) [8] and the Fully Mated Morphed Presentation Match Rate (FMMPMR) [1], based on the threshold provided by the Cognitec FRS. For a given morphed image, FMMPMR counts an attack as successful only if the comparison scores against probes of all contributing subjects exceed the threshold simultaneously:\begin{align*}&FMMPMR \\&\quad = \frac {1}{P} \sum _{M,P}^{} {\left ({S1_{M}^{P} > \tau }\right) \&\& \left ({S2_{M}^{P} > \tau }\right) \ldots \&\& \left ({Sk_{M}^{P} > \tau }\right)}, \\ {}\tag{11}\end{align*} where $P$ denotes the probe attempts, $S1_{M}^{P},\ldots,Sk_{M}^{P}$ are the comparison scores of morph $M$ against the $k$ contributing subjects, and $\tau$ is the operational verification threshold.
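The counting rule of Eqn. (11) can be sketched as follows (a hypothetical helper operating on a score matrix, not the evaluation harness used in the paper):

```python
import numpy as np

def fmmpmr(scores, tau):
    """Eqn. (11): Fully Mated Morphed Presentation Match Rate.

    scores has shape (P, K): for each of the P probe attempts, the
    comparison scores of the morph against the K contributing subjects.
    An attempt counts as a success only if ALL K scores exceed tau
    simultaneously (the chained && conditions in Eqn. (11))."""
    scores = np.asarray(scores, dtype=float)
    successes = np.all(scores > tau, axis=1)
    return float(successes.mean())
```

By contrast, MMPMR only requires at least one verified probe per contributing subject, which is why FMMPMR is the stricter of the two metrics.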
Further, to effectively analyse the vulnerability, we also present the results using the Relative Morph Match Rate (RMMR), defined as follows [8]:\begin{align*} RMMR\left ({\tau }\right)_{MMPMR}=&1+\left ({MMPMR\left ({\tau }\right)}\right) -\left [{1-FNMR\left ({\tau }\right)}\right] \\ \tag{12}\\ RMMR \left ({\tau }\right)_{FMMPMR}=&1 + \left ({FMMPMR\left ({\tau }\right)}\right) -\left [{1-FNMR\left ({\tau }\right)}\right] \\ {}\tag{13}\end{align*} where $FNMR(\tau)$ is the False Non-Match Rate of the FRS at threshold $\tau$.
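Eqns. (12) and (13) share the same form and differ only in which match rate is plugged in; a minimal sketch:

```python
def rmmr(match_rate, fnmr):
    """Eqns. (12)-(13): RMMR(tau) = 1 + match_rate(tau) - [1 - FNMR(tau)],
    where match_rate is either MMPMR or FMMPMR at threshold tau. With
    FNMR = 0, RMMR reduces to the match rate itself."""
    return 1.0 + match_rate - (1.0 - fnmr)
```

This reduction for FNMR = 0 is exactly the situation reported in the observations below Tables I-V.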
The obtained attack success rate, or alternatively the vulnerability of the FRS, is provided in Tables I, II, III, IV and V, corresponding to Cognitec [30], VGGFace [49], ArcFace [31], Neurotechnology (Version 10) [50] and LCNN-29 [51], respectively. The vulnerability analysis covers five different morph generation methods: facial landmarks with image smoothing as the post-processing operation (Landmarks-I) [7], facial landmarks with automatic image retouching and colour equalisation (Landmarks-II) [44], the existing GAN-based face morphing method based on StyleGAN [1], and the proposed MIPGAN variants (MIPGAN-I and MIPGAN-II). Based on the obtained results, the following concrete observations can be made:
The FNMR corresponding to the five different FRS is equal to 0. Therefore, the value of the RMMR equals MMPMR or FMMPMR. This indicates that the FRS are accurate on the face datasets employed in this work.
Among the five FRS, the highest vulnerability is noted for Arcface [31], which is vulnerable to all five kinds of face morphing attack methods.
Among COTS FRS, the Cognitec FRS indicates a higher vulnerability on all five types of face morphing attack methods compared to Neurotechnology FRS.
Among the five different morph generation methods, Landmarks-I results in the highest vulnerability across all five FRS.
The proposed face morphing methods MIPGAN-I and MIPGAN-II consistently result in a higher vulnerability than the existing StyleGAN-based method [1]. This indicates the high quality of the morphs generated using the proposed MIPGAN-I and MIPGAN-II methods.
The proposed MIPGAN-I and MIPGAN-II methods also result in a higher vulnerability than the Landmarks-II morph generation technique for four of the five FRS.
Between the two metrics (MMPMR and FMMPMR), FMMPMR consistently indicates a lower vulnerability than MMPMR, as FMMPMR imposes a stricter criterion for counting a successful attack.
MIPGAN-I based morphed images show a marginally better performance in attacking FRS than images generated by MIPGAN-II.
C. Perceptual Image Quality Analysis
This section presents quantitative results for the proposed morphed image generation techniques using the perceptual image quality metrics PSNR and SSIM. Both metrics are computed with respect to a reference image. Since morphed face images are generated from the parent face images of two contributing data subjects, we use each parent face image in turn as the reference against which the given morphed image is assessed, and we average the obtained quality scores over both parent images. Table VI lists the quantitative PSNR and SSIM results for four different types of face morph generation mechanisms in the digital domain. Based on the obtained results, it can be observed that:
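The two-reference averaging scheme described above can be sketched for PSNR as follows (helper names ours; the same averaging applies to SSIM):

```python
import numpy as np

def psnr(reference, image, peak=255.0):
    """Peak Signal-to-Noise Ratio of `image` with respect to `reference`."""
    mse = np.mean((reference.astype(np.float64)
                   - image.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def morph_psnr(parent1, parent2, morph):
    """Quality score of a morph: PSNR is computed against each parent
    (contributing) image, and the two scores are averaged."""
    return 0.5 * (psnr(parent1, morph) + psnr(parent2, morph))
```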
There is little deviation in the perceptual image quality metrics computed on all four different types of face morph generation mechanisms.
The proposed MIPGAN-I and MIPGAN-II methods indicate a slightly better image quality when compared to the StyleGAN [1] based face morphing method.
The proposed MIPGAN-I and facial landmarks-based methods [44] indicate a similar image quality.
Figures 6 and 7 indicate the box plots of the PSNR and SSIM quality scores. These results further indicate that the perceptual quality of the proposed MIPGAN-I and MIPGAN-II is superior to the existing state-of-the-art method based on StyleGAN [1].
Box plots of PSNR values computed from different face morph generation methods (digital version).
Box plots of SSIM values computed from different face morph generation methods (digital version).
D. Human Observer Analysis
In this section, we discuss the quantitative detection performance of human observers on morphed face images generated using MIPGAN-I and MIPGAN-II. To this end, we designed and developed a Web portal to evaluate human morph detection performance, reflecting both the single-image-based morphing attack detection scenario (S-MAD) and the differential morphing attack detection scenario (D-MAD). We used only digital samples of both bona fide and morphed face images, as the proposed MIPGAN generates images in the digital domain. Figure 8 (a) shows a screenshot of the Web portal for S-MAD, in which the human observer must decide whether the displayed image is a morphed face image or a bona fide image by looking at a single image at a time. Correspondingly, Figure 9 (a) presents a screenshot of the D-MAD experiment, where the observer must decide whether the unknown image is morphed, given a trusted bona fide image as reference. We selected a total of 90 images, 15 from each group: bona fide, two types of facial-landmark-based morphing (Landmarks-I [7] and Landmarks-II [44]), StyleGAN-based [1] face morphing, MIPGAN-I and MIPGAN-II. To make the testing robust, all 90 chosen images correspond to unique data subjects, with no repetition of data subjects. To avoid gender bias by participants, we selected a near-equal distribution of male and female data subjects in each group. We chose 90 images considering the time required for human observers to assess them; it was important that observers did not lose focus while conducting the detection experiments.
(a) Example of screen shot used for differential human observer study (b) Quantitative results.
Figure 8 (b) shows the quantitative S-MAD results obtained from 56 human observers, comprising 14 experienced and 42 inexperienced observers. The experienced group consists of researchers working on face morphing attack detection and of ID experts in border control, while the inexperienced group consists of students and other computer science professionals. As seen in Figure 8 (b), the main observations are:
Bona fide images are detected more reliably than morphed face images by both the experienced and the inexperienced group. The experienced group achieves a detection accuracy of 97.14% on bona fide images, while the inexperienced group achieves 79.21%.
Human observers with experience in face morphing demonstrate higher detection accuracy on four different face morph generation mechanisms than the inexperienced group.
Among the four different morphing types, the experienced group finds the detection of landmark-based morphs more challenging than that of the other (deep-learning-based) morphing mechanisms.
Human observers with no experience in face morphing are marginally better at detecting landmark-based face morph images than other types of face morphs. MIPGAN-I produces the most challenging morph images to detect compared to the other morph generation methods.
Based on the obtained results, it can be noted that human observers with good experience in face morphing detect morphed images with an accuracy of 88.25%, while human observers with no knowledge of face morphing struggle, achieving a detection accuracy of only 64.31%.
The overall results from the 56 human observers indicate that detecting morphed face images is challenging, and that the difficulty varies across the different face morphing types.
For the quantitative results of D-MAD, 5 experienced and 10 inexperienced observers participated. As shown in Figure 9 (b), the following observations can be made:
In the D-MAD scenario, the experienced group achieved an overall accuracy of 86%, better than the 81% of the inexperienced group. However, this gap is much smaller than in S-MAD, which suggests that the trusted reference image helps inexperienced observers identify morphs.
Morphs generated by Landmarks-II present a significant challenge compared to the other morph generation mechanisms in D-MAD. This may be attributed to a more natural skin texture (compared with the GAN-based mechanisms), fewer artefacts (compared with Landmarks-I), and observers focusing less on minor artefacts in the pairwise comparison.
It is also interesting to see that the performance of experienced observers on Landmarks-II (S-MAD 80.95% vs. D-MAD 72.00%), StyleGAN (90.48% vs. 88.00%), MIPGAN-II (90.95% vs. 86.67%), and bona fide images (90.00% vs. 88.00%) is lower in D-MAD than in S-MAD. We believe this is because experienced observers do not pay critical attention to tolerable differences between the trusted reference image and the unknown comparison image.
E. Ablation Study
In order to measure the impact of the loss functions in the proposed approach, we conduct an extensive ablation study. The proposed loss function (see Eq. (10)) combines four different terms, including the perceptual loss ($Loss_{Perceptual}$), the identity loss ($Loss_{Identity}$), and the MS-SSIM loss ($Loss_{MS-SSIM}$).
Table VII indicates the quantitative performance of the ablation study using a vulnerability analysis for both the COTS-FRS from Cognitec and for the open-source Arcface FRS with the proposed MIPGAN-I and MIPGAN-II methods. The ablation study is carried out on the digital morphed images generated using both MIPGAN-I and MIPGAN-II Methods. Figures 10 and 11 shows the qualitative performance of the ablation study on both MIPGAN-I and MIPGAN-II, respectively. Based on the obtained results, the following are the main observations:
Each term in our proposed loss function (see Eq. (10)) contributes to posing a greater challenge to a FRS for both proposed MIPGAN-I and MIPGAN-II morph generation frameworks.
Among the loss functions we have used, the perceptual loss ($Loss_{Perceptual}$) is critical in improving the proposed method's performance: discarding it degrades both the qualitative (see Figures 10 (d) and 11 (d)) and quantitative results.
The identity loss ($Loss_{Identity}$) is likewise important for the quantitative performance of the proposed method.
The $Loss_{MS-SSIM}$ term also contributes to both qualitative and quantitative improvements of the morphs generated by the proposed method.
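The combined objective discussed above can be sketched as a weighted sum of its terms; the weights $\lambda_{i}$ and the fourth term are placeholders here (the actual formulation is given by Eq. (10) and is not reproduced in this section):

$$Loss_{Total} = \lambda_{1}\,Loss_{Perceptual} + \lambda_{2}\,Loss_{Identity} + \lambda_{3}\,Loss_{MS\text{-}SSIM} + \lambda_{4}\,Loss_{(\cdot)}$$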
Qualitative results of the ablation study using the proposed MIPGAN-I.
Qualitative results of the ablation study using the proposed MIPGAN-II.
F. Hyper-Parameters Study
This section presents both qualitative and quantitative results on the selection of the hyper-parameters of the proposed method.
Table VIII shows the quantitative performance and Figure 12 shows the qualitative performance of the hyper-parameter study. Based on the obtained results, it can be noted that the behaviour of the proposed method is sensitive to the chosen hyper-parameter values.
Qualitative results of the hyper-parameter study on both MIPGAN-I and MIPGAN-II.
G. Morphing Attack Detection Potential
Considering the success rate of the newly generated dataset, we naturally choose to evaluate morphing attack detection performance, thereby also validating the robustness of existing MAD mechanisms. Additionally, we investigate recent works on general face manipulation detection [53], [54], [55]; some results are shown in the supplementary material. In this work, we focus on single image based morphing attack detection (S-MAD), as it perfectly suits our dataset. MAD has been widely addressed in the literature through techniques based on both deep learning [56], [57], [58], [59], [60] and non-deep learning [19], [61], [62] approaches. Readers can refer to [63] for an extensive survey on face MAD. Owing to the recent works detailing the applicability of Hybrid features [35] and Ensemble features [36] in detecting morphing attacks, we choose to benchmark both approaches. While the Hybrid features [35] resort to extracting features from both scale space and color space combined with multiple classifiers, the Ensemble features [36] employ a variety of textural features in conjunction with a set of classifiers. In common, both approaches evaluate a wide variety of MAD mechanisms holistically, supported by empirical results [35], [36]. In addition, the Hybrid features [35] mechanism has also been validated in the ongoing NIST FRVT MORPH challenge [37], with the best performance in detecting printed and scanned morph images, justifying our selection of algorithms to benchmark the newly composed database.
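The Hybrid [35] and Ensemble [36] pipelines themselves are not reproduced here; as a minimal illustrative sketch of the underlying idea (a textural descriptor followed by a simple classifier), a basic 8-neighbour LBP histogram and a nearest-centroid score can be written as follows. Function names and the L1-distance scoring rule are our own illustrative choices, not the benchmarked algorithms:

```python
import numpy as np

def lbp_histogram(img):
    # 8-neighbour LBP code per interior pixel: bit i is set when the
    # i-th neighbour is >= the centre pixel; return a normalised histogram.
    c = img[1:-1, 1:-1]
    neighbors = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:],
                 img[2:, 2:], img[2:, 1:-1], img[2:, :-2], img[1:-1, :-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for i, n in enumerate(neighbors):
        code |= ((n >= c).astype(np.uint8) << i)
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()

def nearest_centroid_score(hist, morph_centroid, bonafide_centroid):
    # L1 distance to class centroids; positive score means the image
    # looks more bona fide than morphed.
    d_morph = np.abs(hist - morph_centroid).sum()
    d_bona = np.abs(hist - bonafide_centroid).sum()
    return d_morph - d_bona
```

In practice, the benchmarked approaches combine several such descriptors over multiple scale and color spaces and fuse stronger classifiers; this sketch only illustrates the feature-then-classifier structure.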
The reporting of MAD performance follows the ISO/IEC metrics [64], namely the Attack Presentation Classification Error Rate (APCER (%)), defined as the proportion of attack images (morph images) incorrectly classified as bona fide images, and the Bona fide Presentation Classification Error Rate (BPCER (%)), which counts bona fide images incorrectly classified as attack images [64], along with the Detection Equal Error Rate (D-EER (%)). To evaluate the attack potential of the generated morphed face images, we have sub-divided the newly generated database into training and testing sets consisting of independent data subjects with no overlap between the splits. The training set includes 690 bona fide images and 1190 morphed images. The testing set consists of 580 bona fide and 1310 morphed images. To evaluate MAD performance in a manner reflecting a real-life scenario, we report results for both intra (training and testing data from the same morph generation approach) and inter (training on one type of morphing technique and testing on another) evaluation of MAD mechanisms. Extensive experiments are performed on digital, print-scan, and print-scan with compression data types to provide an in-depth analysis of S-MAD performance. Tables IX, X, XI, XII, and XIII present the quantitative results of the MAD mechanisms on the proposed morph generation methods together with the SOTA morph generation techniques. Based on the results obtained from the intra-dataset experiments, we make some concrete observations as listed below:
The intra-dataset evaluation indicates that the morphing attacks are detected with a good success rate irrespective of the type of generation.
In general, the attack detection success rate is high with digital data when compared to print-scan and print-scan compression.
Among the different types of morph generation techniques, the Landmarks-II based morph generation shows the highest error rates. The attack images created using StyleGAN and the proposed MIPGAN can be detected with high accuracy by both employed approaches. This can be attributed to the noise patterns synthesized by GANs, resulting from the computational modifications performed in the latent space of GAN-based morph generation methods.
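The ISO/IEC metrics used in the tables above can be sketched as follows. This is a minimal sketch, assuming the convention that a higher detector score means "more likely bona fide"; the actual evaluations follow the ISO/IEC definitions in [64]:

```python
import numpy as np

def apcer_bpcer(attack_scores, bonafide_scores, threshold):
    # APCER: proportion of attack (morph) samples accepted as bona fide.
    # BPCER: proportion of bona fide samples rejected as attacks.
    attack_scores = np.asarray(attack_scores)
    bonafide_scores = np.asarray(bonafide_scores)
    apcer = float(np.mean(attack_scores >= threshold))
    bpcer = float(np.mean(bonafide_scores < threshold))
    return apcer, bpcer

def detection_eer(attack_scores, bonafide_scores):
    # D-EER: error rate at the threshold where APCER and BPCER are
    # (approximately) equal, found by sweeping the observed scores.
    thresholds = np.unique(np.concatenate([np.asarray(attack_scores),
                                           np.asarray(bonafide_scores)]))
    best = min(thresholds,
               key=lambda t: abs(np.subtract(
                   *apcer_bpcer(attack_scores, bonafide_scores, t))))
    a, b = apcer_bpcer(attack_scores, bonafide_scores, best)
    return (a + b) / 2.0
```

Reporting BPCER at a fixed APCER (or vice versa), as in the tables, amounts to choosing the threshold that achieves the fixed rate and reading off the other.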
In the following, we discuss the important observations based on the results obtained from inter-dataset MAD analysis:
The performance of the MAD techniques is degraded in all five case studies, as indicated in Tables IX, X, XI, XII, and XIII.
Training MAD algorithms with one landmarks-based method does not improve the detection of the other landmarks-based morph generation method.
When MAD mechanisms are trained using the Landmarks-I [7] method, degraded performance is noted for all other morph generation methods except the StyleGAN [1] based approach. The same holds when we train the MAD techniques using StyleGAN [1] generated samples and test them on Landmarks-I [7] samples. Thus, StyleGAN [1] based morphs are easy to detect even when the MAD mechanisms are not trained on images from the same morph generation scheme.
When MAD algorithms are trained using Landmarks-II [44] samples, they indicate degraded performance on all other morph generation techniques.
When MAD mechanisms are trained using samples generated by the proposed MIPGAN-I, they indicate excellent detection performance on MIPGAN-II samples. However, their detection performance is deceived by the other morph generation techniques.
It is interesting to note that when MAD mechanisms are trained using MIPGAN-I/MIPGAN-II, higher detection accuracy is observed for print-scan and print-scan with compression data than for digital morph data. A possible reason is that the noise generated in the morphed images by the proposed MIPGAN-I/MIPGAN-II approximates the noise introduced by the print-scan and print-scan compression processes.
Based on the results of the inter-database MAD analysis, the detection of Landmarks-II [44] samples is challenging.
Limitations of Current Work and Potential Future Works
While this work presents a new approach to generating strong morphing attacks, empirically evaluated using COTS FRS, it has a few noted limitations. In the current scope of work, we evaluate the impact of print and scan (re-digitizing) using one printer, reflecting a realistic scenario. The MAD mechanisms employed in this work have not been investigated with a wide range of printers and scanners, which may impact MAD performance. While we assert that MAD performance may not vary extremely when tested with a wider combination of printers and scanners, that empirical evaluation is yet to be conducted in future work.
A second aspect is that the proposed approach needs pre-selection of ethnicity to generate stronger attacks. Figure 13 shows example morphed face images generated using the proposed MIPGAN-I and MIPGAN-II that fail to be verified against the contributing subjects when ethnicity pre-selection is not performed [7]. We note that the selection of contributing subjects plays an important role in generating stronger attacks with MIPGAN. It is our assertion that selecting contributing subjects with similar geometric structures (particularly ethnicity and age) can improve the performance of the proposed system, but this aspect needs further investigation.
Examples of morphed images that failed to attack FRS (a) morphed face images generated using proposed MIPGAN-I (b) morphed face images generated using proposed MIPGAN-II.
Conclusion
Addressing the limitations of generating strong and severe morphing attacks using GANs, we have proposed a new architecture for generating face morphed images. The proposed approach (MIPGAN, with two variants) devises strong morphing attacks using an identity-prior-driven GAN with a customized loss exploiting perceptual quality and identity factors to generate realistic images that can strongly threaten FRS. To validate the attack potential of the proposed morph generation method, we have created a new dataset consisting of 30,000 morphed images and 15,240 bona fide images. Both COTS and deep learning based FRS were evaluated empirically to measure the success rate of the new approach, and the reported vulnerability indicates the applicability of the new approach and the newly generated database. In a similar direction, the dataset is also validated for detection performance by studying two state-of-the-art MAD mechanisms. Despite the high attack detection success rate of the employed MAD, we note that the morphed images generated by MIPGAN can severely threaten FRS in their present state, without MAD integrated into the FRS.
ACKNOWLEDGMENT
This text reflects only the author’s views and the Commission is not liable for any use that may be made of the information contained therein.