Journals & Magazines >IEEE Transactions on Biometri... >Volume: 3 Issue: 3

MIPGAN—Generating Strong and High Quality Morphing Attacks Using Identity Prior Driven GAN

Abstract:

Face morphing attacks target to circumvent Face Recognition Systems (FRS) by employing face images derived from multiple data subjects (e.g., accomplices and malicious ac...Show More

Metadata

Abstract:

Face morphing attacks target to circumvent Face Recognition Systems (FRS) by employing face images derived from multiple data subjects (e.g., accomplices and malicious actors). Morphed images can be verified against contributing data subjects with a reasonable success rate, given they have a high degree of facial resemblance. The success of morphing attacks is directly dependent on the quality of the generated morph images. We present a new approach for generating strong attacks extending our earlier framework for generating face morphs. We present a new approach using an Identity Prior Driven Generative Adversarial Network, which we refer to as MIPGAN (Morphing through Identity Prior driven GAN). The proposed MIPGAN is derived from the StyleGAN with a newly formulated loss function exploiting perceptual quality and identity factor to generate a high quality morphed facial image with minimal artefacts and with high resolution. We demonstrate the proposed approach’s applicability to generate strong morphing attacks by evaluating its vulnerability against both commercial and deep learning based Face Recognition System (FRS) and demonstrate the success rate of attacks. Extensive experiments are carried out to assess the FRS’s vulnerability against the proposed morphed face generation technique on three types of data such as digital images, re-digitized (printed and scanned) images, and compressed images after re-digitization from newly generated MIPGAN Face Morph Dataset. The obtained results demonstrate that the proposed approach of morph generation poses a high threat to FRS.

Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science ( Volume: 3, Issue: 3, July 2021)

Page(s): 365 - 383

Date of Publication: 14 April 2021

Electronic ISSN: 2637-6407

DOI: 10.1109/TBIOM.2021.3072349

Funding Agency:

Contents

SECTION I.

Introduction

Face Recognition Systems (FRS) have provided ubiquitous ways of verifying an identity claim in many applications. FRS have been used in everyday applications from low-security applications such as smartphone unlocking to high-security applications such as identity verification in border control processes. Each of the applications mandate a chosen way of enrolment to FRS where either a supervised enrolment is carried out (for instance in on-boarding at bank premises) or unsupervised enrolment is requested (on-boarding for banking applications from home). While it provides a high degree of flexibility and convenience to users to initiate an enrolment process in an unsupervised manner, this potentially leads to a security risk: Without supervision, a data subject enrolling into the FRS can submit a face image which is manipulated, a printed face image, an image displayed from an electronic screen (e.g., iPad) or a silicone latex face mask [2]. In order to mitigate such attacks at the enrolment level, it is therefore essential to have a robust attack detection mechanism. While a number of works in recent years have been proposed on both conducting such attacks and detecting the attacks in a robust manner for printed attacks, display attacks and mask attacks, in this work we focus on a new kind of attack referred popularly as Morphing Attack.

Face morphing is the process of combining two or more face images to generate a single face image that can resemble visually to all the contributing face images to a greater degree [3]. A good quality morphed face image is also effective in verifying against all contributing subjects by obtaining a comparison score that exceeds the pre-determined threshold (i.e., passes through FRS) [3], [4], [5], [6]. While morphing can be conducted using multiple face images of different subjects, the effectiveness of morphed images is reported when the face images of similar ethnicity, gender and age group are considered [6], [7], [8]. This is primarily due to the fact that a morphed image should not only defeat the FRS but should also provide high visual similarity, in order to convince a human expert in a visual comparison process.

Face morphing attacks threaten FRS due to the current practices in the ID-document application process, where the biometric enrolment is carried out in an unsupervised manner in many countries. Countries like the U.K. and New Zealand allow citizens to upload a digital face image for various applications such as passport renewal [9] and visa application [10]. The capture process for such images is unsupervised. In a similar manner, many Asian countries and European countries (e.g., in The Netherlands [11]) request the applicant to submit a scanned face image for passport/visa/identity-card applications. Given that the images are captured and submitted in an unsupervised setting, the applicant has vast opportunities to upload a morphed image with malicious intent underlining the need for robust Morphing Attack Detection (MAD) mechanisms.

A. Related Works on Face Morph Generation

While morphing attacks have been studied in recent years, most of the attacks are conducted using the morphed images created using facial landmarks-based approaches needing high a degree of supervision to first determine the facial landmarks, thereupon align them and then finally blend them to generate morphs. The common set of procedures for warping/blending includes Free Form Deformation (FFD) [12], [13], Deformation by moving least squares [14], deformation based on mass spring [15], Bayesian framework based morphing [16] and Delaunay triangulation based morphing [17], [18], [19], [20], [21]. Due to inadvertent artefacts caused by pixel/region-based morphing, the images need additional work in refining the signal to create highly realistic morph images. A set of post processing steps are usually included as illustrated in number of works [20], [22], [23]. Generally, some set of post processing techniques such as image smoothing, image sharpening, edge correction, histogram equalization, manual retouching, image enhancement to improve the brightness and contrast are used to eliminate the artefacts generated during the morphing process. In a parallel direction, morphed face images can also be generated using landmarks-based methods available in open-source resources like GIMP/GAP and OpenCV. Morphs generated using GIMP/GAP technique are more efficient with respect to a good quality of the resulting image (i.e., less noticeable artefacts) as pixels are aligned manually. Despite the minimal amount of effort needed for creating morphs using such approaches, a significant amount of effort needs to be dedicated to correcting artefacts. Additionally, commercial solutions like Face Fusion [24] and FantaMorph [25] can also generate good quality morphed images with limited manual intervention. Although some steps can be excluded in creating the morphs, it is very critical to meet the face image quality standards laid out by the International Civil Aviation Organization (ICAO) [26], [27] for electronic Machine Readable Travel Document (eMRTD) and deployment of biometric identification applications.

B. GAN Based Face Morph Generation

In an attempt to overcome the cumbersome efforts of manually creating (semi-automated) morphed images, a fully automated approach using a Generative Adversarial Network (GAN) was proposed by Damer et al. [28]. Unlike the supervision required in the mark-up of landmarks and aligning the face images in a (partially) manual process, GAN-based techniques synthesise morphed images directly by merging two facial images in the latent space. In the work by Damer et al. [28], the proposed MorGAN architecture for morph generation basically employed a generator constituting encoders, decoders and a discriminator. The generator was trained to generate images with the dimension $64\times 64$ pixels which is a key limiting factor of the attack, as most commercial FRS will reject images that do not meet the ICAO standard that requires a minimum Inter-Eye Distance (IED) of 90 pixels. The empirical evaluation of generated morph images using MorGAN in a vulnerability analysis against two commercial FRS indicated that those MorGAN morphs fail to meet both quality standards and the verification threshold of the FRS [1]. Motivated to address the deficiency of the MorGAN architecture, in our recent work [1]¹ we proposed an approach based on the StyleGAN architecture [29] to increase the spatial dimension to $1024\times 1024$ and thus to improve the face image quality. Unlike the previous approach of MorGAN [28], StyleGAN [1] achieves better spatial resolution by embedding the images in the intermediate latent space. With the increased spatial dimension of resulting morphed images from our recently proposed architecture, we not only demonstrated that the images meet quality standards but also have a reasonable success rate when attacking commercial FRS [1].

C. Limitations of GAN Based Face Morph Generation and Our Contributions

While our earlier work [1] indicated that better GAN architectures could result in superior quality morphs and could attack an FRS in general, we also acknowledge the limited threats that exist for Commercial-Off-The-Shelf (COTS) FRS, as merely a subset of morphed images was accepted. Only approximately 50% of the generated morph images were verified successfully against probe images from a contributing subject. Thus the empirical evaluation in our earlier work has shown that the attack was yet not very effective [1] for a COTS FRS [30] and an open-source FRS based on ArcFace [31]. We must state that up to now FRS are not very vulnerable to GAN-based morphing attacks unlike to landmarks-based morphing attacks. With a clear introspection into this aspect, we notice that the resulting morphed images from our earlier work [1] does not retain a high degree of facial similarity to both contributing subjects. With lower similarity to contributing subjects in terms of facial structures, the FRS do not attribute a high comparison score, as anticipated. In other words, the missing enforcement of identity information of contributing subjects will lead to a high visual quality facial image but with lower face similarity to contributing face characteristics.

In an effort to make the attacks stronger such that both subjects can be verified with a good success rate, in this work, we extend our previous architecture to generate morphs by including the identity priors before the generation of morphed faces. We now refer to this approach as MIPGAN (Morphing through Identity Prior driven GAN). We propose two variants of our approach named as MIPGAN-I and MIPGAN-II based on the employed GAN being StyleGAN or StyleGAN2 respectively [29], [32]. With the inclusion of a new loss function in our proposed architecture, we increase the attack success rate against commercial-off-the-shelf (COTS) FRS and deep learning based FRS. Figure 1 shows the example of morphed face images generated using proposed MIPGAN along with outputs of both the variants. To further achieve superior quality face morphs, we also customize the newly designed loss function to account for ghosting and blurring artefacts in an end-to-end manner with no human or manual intervention eliminating the need for a high degree of interaction. As noted in Figure 2, the results from MIPGAN-I and MIPGAN-II is more coherent in retaining structural similarity as compared to our earlier architecture [1]. With the updated architecture to generate high-quality morphs which preserve both identity information and structural correspondence, we evaluate the applicability in creating stronger attacks by creating a large-scale dataset of morphed images by employing the face images derived from the FRGC-V2 face database [33]. The created dataset of 1270 bona fide images and 2500 morphed images is first evaluated to measure the attack success rate by verifying the morphed images against the contributing subjects using a commercial FRS from Cognitec [30]. In addition to measuring the attack success rate for digital images, we also extend our work by printing and scanning (re-digitizing) the dataset. We check the consistency of the attack success rate, unlike our earlier work which was limited to an investigation on digital images alone [1]. We also include the experiments on assessing the impact of compression (down to 15kb following ICAO guidelines) of printed and scanned face images that simulate the real-life e-passport application scenario. The key motivation to extend our work in this direction is, to mimic the passport application process that is operated in many European countries and Asian countries, which all accept printed-and-scanned facial images in the application process for an identity document (e.g., passports).

Fig. 1.

Results from StyleGAN based face morphing [1] and the proposed MIPGAN (a) Contributing subject 1 (b) StyleGAN [1] (c) Proposed method (d) Contributing subject 2.

Show All

Fig. 2.

Details of segmented components in morphs generated by earlier method based on StyleGAN [1] and proposed MIPGAN (a) StyleGAN [1] (b) MIPGAN-I (c) MIPGAN-II.

Show All

With the extensive experimental results indicating a highly satisfactory attack success rate, we also evaluate a set of MAD algorithms to benchmark the detection capabilities. To this extent, we evaluate two state-of-the-art MAD approaches on digital morphed images, re-digitized and compressed morphed images after re-digitizing. Thus, we comprehensively cover the potential morphing attacks in the digital domain and the re-digitized domain. While we note the earlier works [1] arguing that attacks in the digital domain can be detected by studying the cues such as residual noise in morphing [34], patterns of noise from morphed images, histogram features of textures or the deep features [4], we also investigate the MAD capabilities for re-digitized images which do not exhibit the similar features (residual noise) as the print-scan process eliminates the digital cues and presents another set of variations. Specifically, given the nature of the dataset in which we have only a single suspected morphed image, for which we must determine either the morph or the bona fide class, we resort to Single Image based MAD (S-MAD) approaches using two recent but robust approaches using hybrid and ensemble features [34], [35], [36], [37].

We therefore present a summary of contributions of this work as listed below:

We present a novel approach of generating morphed face images through GAN architecture with enforced identity priors and a customized novel loss function to generate highly realistic images which we refer as MIPGAN (Morphing through Identity Prior driven GAN). We present two variants of the proposed approach for generating attacks with a high success rate.
The proposed approach (both variants) is benchmarked to measure the attack success rate by verifying COTS and deep learning based FRS through studying the vulnerability using a newly generated dataset from our proposed architecture which is referred as MIPGAN Face Morph Dataset.
Human observer analysis for detecting morphs generated by the proposed and existing morphing attack methods is presented.
Analysis of the perceptual quality metrics to illustrate the visual quality of the generated morph images is presented.
Extensive experiments on three different data types such as (a) digital morphed images (b) print-scan morphed image (c) print-scan morphed images with compression are presented to cover the full spectrum of passport application process under morphing attacks.
The generated images are also benchmarked against the existing MAD approaches both in digital form and the re-digitized form to provide the insights on detection challenges of SOTA approaches. We also present a generalizability study on MAD schemes by training one kind of morph generation and testing on a different kind of morph generation approach to indicate directions to future works.

In the rest of the paper, Section II describes the new architecture along with the newly designed loss function to generate high-quality morphs. Section III provides the details on the quantitative experiments indicating the vulnerability of FRS and the detection challenge. With the set of remarks and future works in this direction, we draw the conclusion in Section V.

SECTION II.

Proposed Morphed Face Generation

Figure 3 presents the block diagram of the proposed morphed face image generation using MIPGAN. The proposed method is based on the end-to-end optimization using a new loss function that can preserve the identity of the generated morphed face image through enforced identity priors. The proposed MIPGAN framework is designed independently on two different GAN models based on StyleGAN [29] and StyleGAN2 [32] model. We refer to the proposed scheme with StyleGAN as MIPGAN-I and with StyleGAN2 as MIPGAN-II respectively. Given the face images from the accomplice ($I_{1}$ ) (contributing subject 1) and the malicious ($I_{2}$ ) (contributing subject 2) data subjects, we predict the corresponding latent vectors $L_{1}^{\prime }$ and $L_{2}^{\prime }$ in the first step. In this work, we have employed the open-source pre-trained prediction models trained to predict the corresponding latent vector given an input image. Hence, $L_{1}^{\prime }$ and $L_{2}^{\prime }$ are predictions from the final output layer of the model, which is further reshaped. Since MIPGAN-I and MIPGAN-II are based on pre-trained StyleGAN [29] and StyleGAN2 [38] model respectively, we used two different open-source pre-trained models for prediction. Both of the prediction models employ $ResNet50$ [39] as backbone. The model for MIPGAN-I (StyleGAN) uses one convolution layer and two tree-connected layers [40] to map the output of $ResNet50$ into the final latent vector with the size of (18, 512). In comparison, the model for MIPGAN-II (StyleGAN2) just uses one fully-connected layer to achieve the mapping. The predicted latent vectors thus provide the initialization for the morphed face generation that is obtained using a weighted linear average of $L_{1}^{\prime }$ and $L_{2}^{\prime }$ as follows:\begin{equation*} L^{\prime }_{M} = \frac {w_{1}*L_{1}^{\prime } + w_{2}*L_{2}^{\prime }}{2}, \tag{1}\end{equation*} View Source where $w_{1}$ and $w_{2}$ indicate the weights, which we have chosen to be $w_{1} = w_{2} = 1$ . Equal weights are selected as shown in earlier work [41] where the morphing images generated with equal weights pose higher vulnerability to COTS FRS. Finally, $L^{\prime }_{M}$ is passed through the synthesis network (independently from StyleGAN [29] and StyleGAN2 [32] model) to generate the corresponding morphed image $I^{\prime }_{M}$ that has a resolution of $1024 \times 1024$ pixels. The generated morphed face image $I^{\prime }_{M}$ is then optimized using the proposed loss function to generate the high quality morphed face image. In the following section, we discuss the loss function to optimism the latent vector obtained using Equation (1).

Fig. 3.

Block diagram of the proposed MIPGAN for generating high quality morphed face images.

Show All

A. Proposed Loss Function

The proposed loss function is based on both perceptual fidelity, quality and identity factors that can facilitate high-quality face morph generation. The common issue with the GAN-based morph generation is the presence of ghost artefacts and blurring issues. We employ the perceptual loss with multiple layers to eliminate such effects as given by Eqn. (2).\begin{align*} Loss_{Perceptual}=&\frac {1}{2}\sum _{i} \frac {1}{N_{i}}\left \|{F_{i}\left ({I_{1}}\right)-F_{i}\left ({I^{\prime }_{M}}\right)}\right \|^{2}_{2} \\&+\,\,\frac {1}{2}\sum _{i} \frac {1}{N_{i}}\left \|{F_{i}\left ({I_{2}}\right)-F_{i}\left ({I^{\prime }_{M}}\right)}\right \|^{2}_{2}, \tag{2}\end{align*} View Source where $N_{i}$ denotes the number of features in layer $i$ and $F_{i}$ denotes features in layer $i$ of the perceptual network (VGG-16 in our case). For the combination of perceptual layers, we choose $conv1_{1}$ , $conv1_{2}$ , $conv2_{2}$ , $conv3_{3}$ inspired by [42]. Compared with the original combination of layers $conv1_{2}$ , $conv2_{2}$ , $conv3_{3}$ , $conv4_{3}$ [43], our design measures low-level features instead of high-level features like style of an image and is closer to our goal of morphing faces with high quality.

The main goal of this paper is to generate the morphed face images that can significantly attack FRS. In order to achieve this, we have introduced the identity loss function based on the feedback from FRS. We employ Arcface [31] - a deep learning based FRS because of its robust and accurate performance to obtain feedback on generated morphed face images. Specifically, we employ a pre-trained embedding extractor with $ResNet50$ as the backbone to extract the unit embedding vectors and define the identity loss by their cosine distance to improve the morph generation process as given by Eqn. (3).\begin{equation*} Loss_{Identity}=\frac {\left ({1-\frac {\vec {v}_{1} \cdot \vec {v}_{M}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right)+\left ({1-\frac {\vec {v}_{2} \cdot \vec {v}_{M}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}{2}, \tag{3}\end{equation*} View Source where $\vec {v}_{1}$ , $\vec {v}_{2}$ , $\vec {v}_{M}$ respectively denotes the embedding vectors which are extracted from image $I_{1}, I_{2}, I^{\prime }_{M}$ respectively.

To further prove the loss function is differential for the morphed embedding vector $\vec {v}_{M}$ , we define $x_{d}, y_{d}, z_{d}$ to be the value of vector $\vec {v}_{1}$ , $\vec {v}_{2}$ , $\vec {v}_{M}$ in dimension $d$ respectively and $d' \neq d$ to be other dimensions except $d$ . The expanded identity loss function and its partial derivative are:\begin{align*}&Loss_{Identity}=\frac {\left ({1-\frac {\sum _{d} x_{d} z_{d}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right)+\left ({1-\frac {\sum _{d} y_{d} z_{d}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}{2},\tag{4}\\&\frac {\partial Loss_{Identity}}{\partial z_{d}} = 1-\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right) \\&\qquad -\,\,\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right),\tag{5}\\&\frac {\partial }{\partial z_{d}}\left ({\frac {z_{d}}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}}}\right) = \frac {1}{\sqrt {z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}} \\&\qquad +\,\,\frac {2z_{d}^{2}}{-2\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}=\frac {\sum _{d'\neq d}z_{d'}^{2}}{\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}} \\&\frac {\partial Loss_{Identity}}{\partial z_{d}}=1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({z_{d}^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}.\tag{6}\end{align*} View Source For any value $z_{d}=z'_{d}$ , it is obvious that:\begin{align*}&\lim _{\Delta z_{d} \to 0}\frac {\partial Loss_{Identity}\left ({z'_{d}+\Delta z_{d}}\right)}{\partial z_{d}}\\&\quad =\lim _{\Delta z_{d} \to 0}\left ({1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({\left ({z'_{d}+\Delta z_{d}}\right)^{2}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}}\right)\\&\quad =1-\frac {\left ({\frac {x_{d}}{2\Vert \vec {v}_{1}\Vert }+\frac {y_{d}}{2\Vert \vec {v}_{2}\Vert }}\right) \sum _{d'\neq d}z_{d'}^{2}}{\left ({z^{'2}_{d}+\sum _{d'\neq d}z_{d'}^{2}}\right)^{\frac {3}{2}}}\\&\quad =\frac {\partial Loss_{Identity}\left ({z'_{d}}\right)}{\partial z_{d}}.\end{align*} View Source Hence, for any dimension of $d$ , the partial derivative of the identity loss function is continuous.

It is interesting to note that the identity loss based on the Arcface feature extractor model is trained to maximize the face class separability and thus is more sensitive to face attributes. Hence, only optimising the identity loss cannot achieve the same reconstruction performance as the perceptual loss but applying it on the face region can effectively control the generated attributes to be recognized as both subjects.

To solve the imbalance between different subjects, we introduce an identity difference loss as given by Eqn. (7).\begin{equation*} Loss_{ID-Diff}=\left |{\left ({1-\frac {\vec {v}_{1}\cdot \vec {v}_{M}}{\Vert \vec {v}_{1}\Vert \Vert \vec {v}_{M}\Vert }}\right) -\left ({1-\frac {\vec {v}_{2}\cdot \vec {v}_{M}}{\Vert \vec {v}_{2}\Vert \Vert \vec {v}_{M}\Vert }}\right)}\right |.\qquad \tag{7}\end{equation*} View Source With the idea of the Lagrange multiplier, it adds a constraint to the optimization process to force the cosine distance between morph embedding and each of the two reference embeddings to be the same. Since $Loss_{ID-Diff}$ is usually small with a value less than 1, we apply $L1$ loss on the difference of two cosine distance terms to avoid the vanishing gradient problem.

Finally, in order to improve the structural visibility of the generated morphed face image, we also apply the Multi-Scale Structural Similarity (MS-SSIM) loss $L_{MS-SSIM}$ to measure the similarity in structure [45]. Given two discrete non-negative signals (images in our case) $x$ and $y$ , luminance, contrast and structure comparison measures were given by $l, c, s$ as computed using Eqn. (8).\begin{align*} l(x,y)=&\frac {\left ({2\mu _{x}2\mu _{y}+(K_{1}L)^{2}}\right)}{\mu _{x}^{2}+\mu _{y}^{2}+\left ({K_{1}L}\right)^{2}}, \\ c(x,y)=&\frac {\left ({2\sigma _{x}2\sigma _{y}+(K_{2}L)^{2}}\right)}{\sigma _{x}^{2}+\sigma _{y}^{2}+ \left ({K_{2}L}\right)^{2}}, \\ s(x,y)=&\frac {\left ({\sigma _{xy}+\frac {\left ({K_{2}L}\right)^{2}}{2}}\right)} {\sigma _{x}\sigma _{y}+\frac {\left ({K_{2}L}\right)^{2}}{2}}, \tag{8}\end{align*} View Source where $\mu _{x}, \sigma _{x}$ and $\sigma _{xy}$ denotes the mean of $x$ , the variance of $x$ and the covariance of $x$ and $y$ respectively. $L$ is the dynamic range of the signal and $K_{1} \ll 1, K_{2} \ll 1$ are two constant scalars. The MSSSIM loss $L_{MS-SSIM}$ is further defined by Eqn. (9).\begin{align*} MSSSIM(x,y)=&\left [{l_{J}(x,y)}\right]^{\alpha _{J}} \cdot \prod _{j=1}^{J} \left [{c_{j}(x,y)}\right]^{\beta _{j}}\left [{s_{j}(x,y)}\right]^{\gamma _{j}}, \\ L_{MS-SSIM}=&\frac {1}{2}\left ({1-MSSSIM\left ({I_{1},I'_{M}}\right)}\right) \\&+\,\,\frac {1}{2}\left ({1-MSSSIM\left ({I_{2},I'_{M}}\right)}\right), \tag{9}\end{align*} View Source where $j=1,2,\ldots,J$ represents the $j^{th}$ scale and $\alpha _{j},\beta _{j}$ and $\gamma _{j}$ are the factors of relative importance. As suggested in [45], we also set $\alpha _{j} = \beta _{j} = \gamma _{j}$ , $\sum _{j=1}^{J}\gamma _{j}=1$ and use the resulting parameters $\beta _{1}=\gamma _{1}=0.0448, \beta _{2}=\gamma _{2}=0.2856, \beta _{3}=\gamma _{3}=0.3001, \beta _{4}=\gamma _{4}=0.2363, \alpha _{5}=\beta _{5}=\gamma _{5}=0.1333$ .

Thus, the proposed loss function can be formulated as:\begin{align*} Loss=&\lambda _{1} Loss_{Perceptual}+\lambda _{2} Loss_{Identity} \\&+\,\,\lambda _{3} Loss_{MS-SSIM} + \lambda _{4} Loss_{ID-Diff}, \tag{10}\end{align*} View Source where $\lambda _{1}$ , $\lambda _{2}$ , $\lambda _{3}$ and $\lambda _{4}$ are the hyper-parameters that are set to achieve both stable and generalized convergence. In this work, we empirically set $\lambda _{1} = 0.0002$ , $\lambda _{2} = 10$ , $\lambda _{3} = 1$ and $\lambda _{4} = 1$ .

B. Training and Optimization

The training and optimization of the proposed method are carried out on Tensorflow version 1.13 and version 1.14 for StyleGAN and StyleGAN2, respectively. The optimization is carried out using NVIDIA GTX 1070 8 GB GPU with CUDA version 10.0 and CUDNN version 7.5 and NVIDIA Tesla P100 PCIE 16 GB GPU. The Adam optimizer with hyper-parameters $\beta _{1}=0.9$ , $\beta _{2}=0.999$ and $\epsilon =1\times 10^{-8}$ as recommended in the original paper [46] is employed on this work. The list of morphing pairs is generated in advance with careful considerations to gender. During each optimization process of 150 iterations, the learning rate is initially set to $\eta = 0.03$ with an exponential decay per 6 iterations of $\eta _{new}=\eta *0.95$ .

Figure 4 illustrates the qualitative results of the proposed MIPGAN framework based on StyleGAN and StyleGAN2. Further, the qualitative results of the existing methods based on StyleGAN [1] and MorGAN [28] are provided alongside for the convenience of the reader in the same figure. It is interesting to note that the proposed MIPGAN generated face morph images indicate both perceptual and geometric features correspondence to both contributing subjects (for instance, malicious actor and accomplice).

Fig. 4.

Qualitative results of proposed MIPGAN together with existing GAN based face morph generation methods (a) Landmark-I [7] (b) Landmark-II [44] (c) StyleGAN [1] (d) MorGAN [28] (e) Proposed method.

Show All

SECTION III.

Experiments and Results

This section presents and discusses the experimental protocols, datasets, and quantitative results of the proposed face morphing technique. The images generated from the proposed MIPGAN-I and MIPGAN-II architectures are compared with the state-of-the-art techniques based on both facial landmarks [7] and StyleGAN based morph generation [1]. The effectiveness of the face morphing generation is quantitatively evaluated by benchmarking the vulnerability of the COTS FRS and deep learning based FRS for generated morphed face images. Further, we also evaluate the morphing attack detection potential by evaluating the generated morphed face images using the most recent and robust MAD techniques.

A. MIPGAN Face Morph Dataset

We employ the face images from FRGC-V2 face database [33] to generate the MIPGAN Face Morph Dataset consisting of morphed face images using both state-of-the-art and the proposed MIPGAN technique. We have selected 140 unique data subjects from the FRGC dataset by considering the high-quality face images captured in constrained conditions that resemble the passport image quality. Among 140 data subjects, 47 data subjects are female and 93 data subjects are male. Each data subject has a variable size of 7–21 additional captured samples, resulting for the whole dataset to have 1270 samples corresponding to 140 data subjects. We employ three different face morph generation techniques based on facial landmarks constrained by Delaunay triangulation with blending [7] we term this as Landmarks-I, landmarks-based techniques with automatic post processing and color equalisation [44], we term this as Landmarks-II and StyleGAN [1]. We do not consider MorGAN [28], [47] based face morph generation as it was earlier demonstrated that MorGAN does not generate ICAO compliant images and thus makes COTS FRS not vulnerable [1]. All the samples are pre-processed to meet the ICAO standards [27] and morphing is carried out by following the guidelines outlined earlier [7], [8], i.e., careful selection of subjects based on gender and similarity score using a FRS, in order to have realistic attacks.

To effectively evaluate the proposed method’s quantitative performance and the existing techniques, we create three different types of attacks from morphed images, such as Digital morphed images: Morphed face images that are obtained from the morph generation process in the digital domain. Print-scanned morphed images: The digital morphed and bona fide images are printed and then scanned (or re-digitized) to simulate the passport application process. We have employed a DNP-DS820 [48] dye-sublimation photo printer to generate the prints of the digital morphed and bona fide face images in this work. The use of a dye-sublimation photo printer guarantees high-quality photo printing (generally used for a passport application) and makes sure that printed photos are free from dotted patterns (or individual droplets of ink) that are resulting from the printing process of conventional printers. Each of these printed photos is then scanned (or re-digitized) using the Canon office scanner to have 300 dpi as suggested in ICAO standards [27]. Print-scanned compressed morphed images: The printed and scanned images (both morphed and bona fide) are compressed to have a size of 15kb that makes it suitable to store in the e-passport. This process reflects the real-life scenario of face image storage in passport systems. Thus, the overall dataset has $2500 \times 3$ (types of morph data) $\times 4$ types of morph generation technique = 30, 000 morph samples and $1270 \times 3$ (types of morph data) $\times 4$ types of morph generation technique = 15, 240 bona fide samples. Figure 5 illustrates the three data types of attacks that are used to evaluate the effectiveness of the proposed method and the existing methods of face morph generation. It is evident that the visual quality of the images vary largely for different attack types (for instance, the digital data attack indicates the best quality and print-scan with compression indicates the lowest quality).

Fig. 5.

Illustration of morphing in digital, print-scan and print-scan compression data (a) Contributing subject 1 (b) Landmark-I [7] (c) Landmark-II [44] (d) StyleGAN [1] (e) MIPGAN-I (f) MIPGAN-II (g) Contributing subject 2.

Show All

B. Vulnerability Analysis

This section presents the vulnerability analysis of the proposed morphed face generation techniques to quantify the impact of our efficient attacks on FRS. We quantify the attack success for five different FRS including two Commercial-off-the-Shelf (COTS) FRS and three deep-learning-based open-source FRS. The COTS FRS include the Cognitec FRS (Version 9.4.2) [30] ² and Neurotechnology (Version 10) [50] and the set of open-source FRS includes Arcface [31], VGGFace [49] and LCNN-29 [51]. The operational threshold for all 5 FRS is set at False Match Rate (FMR) of 0.1% following the guidelines of Frontex [52].

The vulnerability is assessed using two metrics Mated Morphed Presentation Match Rate (MMPMR) [8] and Fully Mated Morphed Presentation Match Rate (FMMPMR) [1] based on the threshold provided by Cognitec FRS. For a given morph image $M_{I_{1,2}}$ obtained using two subjects, we compute the vulnerability by enrolling $M_{I_{1,2}}$ and verifying it against probe images from the corresponding contributing subjects $I_{1}$ and $I_{2}$ . The obtained comparison scores $S_{1}$ and $S_{2}$ for both probe images $I_{1}$ and $I_{2}$ against the morphed image $M_{I_{1,2}}$ indicates the threat to FRS, if and only if both $S_{1}$ and $S_{2}$ cross the actual verification threshold at FMR = 0.1%. The corresponding metric FMMPMR [1], [41] is therefore computed as:\begin{align*}&FMMPMR \\&\quad = \frac {1}{P} \sum _{M,P}^{} {\left ({S1_{M}^{P} > \tau }\right) \&\& \left ({S2_{M}^{P} > \tau }\right) \ldots \&\& \left ({Sk_{M}^{P} > \tau }\right)}, \\ {}\tag{11}\end{align*} View Source where $P = {1, 2, \ldots, p}$ represent the number of attempts made by presenting all probe images of the contributing subjects against the $M^{th}$ morphed image, $K = {1, 2, \ldots, k}$ represents the number of composite image constitute to generate the morphed image (in our case $K=2$ ), $Sk_{M}^{P}$ represents the comparison score of the $K^{th}$ contributing subject obtained with $P^{th}$ attempt corresponding to $M^{th}$ morphed image and $\tau $ represents the threshold value corresponding to FMR = 0.1%. When compared to MMPMR, the FMMPMR will consider both pair-wise comparison of contributory subjects and the number of attempts. In order to also establish the relationship with respect to earlier metrics, we also report the vulnerability using MMPMR [8].

Further, to effectively analyse the vulnerability, we also present the results using Relative Morph Match Rate (RMMR) defined as follows [8]:\begin{align*} RMMR\left ({\tau }\right)_{MMPMR}=&1+\left ({MMPMR\left ({\tau }\right)}\right) -\left [{1-FNMR\left ({\tau }\right)}\right] \\ \tag{12}\\ RMMR \left ({\tau }\right)_{FMMPMR}=&1 + \left ({FMMPMR\left ({\tau }\right)}\right) -\left [{1-FNMR\left ({\tau }\right)}\right] \\ {}\tag{13}\end{align*} View Source where, FNMR indicates the False Reject Rate (FNMR) of the FRS under consideration obtained at the threshold $\tau $ . In this work, $\tau $ represents the value corresponding to FMR = 0.1%. Since we have evaluated 5 different FRS systems, we have computed FNMR corresponding to these FRS to calculate the RMMR. Note that, in Equation (12) and (13) if FNMR = 0 then RMMR corresponds to MMPMR/FMMPMR.

The obtained success rate, or alternatively the vulnerability of FRS is provided in Tables I, II, III, IV, and V corresponding to Cognitec [30], VGGFace [49], Arcface [31], Neurotechnology (Version 10) [50] and LCNN-29 [51] respectively. The vulnerability analysis is carried out on 5 different morph generation methods that include facial landmarks (Landmarks-I) with image smoothing as the post-processing operation [7], Facial landmarks (Landmarks-II) with automatic image retouching and color equalisation [44], existing GAN based face morphing method based on StyleGAN [1] and proposed MIPGAN variants (MIPGAN-I and MIPGAN-II). Based on the obtained results, the following are the concrete observations:

The FNMR corresponding to five different FRS is equal to 0. Therefore, the value of the RMMR is equal to MMPMR or FMMPMR. This indicates that the FRS systems are accurate on our face datasets employed in this work.
Among the five FRS, the highest vulnerability is noted for Arcface [31], which is vulnerable to all five kinds of face morphing attack methods.
Among COTS FRS, the Cognitec FRS indicates a higher vulnerability on all five types of face morphing attack methods compared to Neurotechnology FRS.
Among five different morph generation methods, Landmark-I indicates the highest vulnerability on all five other FRS.
The proposed face morphing methods MIPGAN-I and MIPGAN-II consistently indicate the highest vulnerability, when compared to the existing method based on StyleGAN [1]. This indicates the high quality of morphs generated using the proposed MIPGAN-I and MIPGAN-II methods.
The proposed MIPGAN-I and MIPGAN-II methods also indicate a higher vulnerability than the Landmark-II technique for morph generation with four different FRS.
Among the two different metrics (MMPMR and FMMPMR), the proposed FMMPMR indicates a lower vulnerability than MMPMR consistently as FMMPMR imposes a strict selection of attack images, unlike MMPMR.
MIPGAN-I based morphed images show a marginally better performance in attacking FRS than images generated by MIPGAN-II.

TABLE I Quantitative Evaluation of Vulnerability of COTS Cognitec-FRS [30] From Various Morph Generation Approaches. Note That, Since FNMR = 0 @ FMR = 0.1% for Cognitec-FRS [30] Following Eq. (12) and (13), the Value of RMMR is Equal to MMPMR/FMMPMR. Therefore, We Have Not Entered RMMR Separately in the Table Above

TABLE II Quantitative Evaluation of Vulnerability of VGGFace2 [49] FRS From Various Morph Generation Approaches. Note That, Since FNMR = 0 @ FMR = 0.1% for VGGFace2 [49] Following Eq. (12) and (13), the Value of RMMR is Equal to MMPMR/FMMPMR. Therefore, We Have Not Entered RMMR Separately in the Table Above

TABLE III Quantitative Evaluation of Vulnerability of Arcface [31] FRS From Various Morph Generation Approaches. Note That, Since FNMR = 0 @ FMR = 0.1% for Arcface [31] Following Eq. (12) and (13), the Value of RMMR is Equal to MMPMR/FMMPMR. Therefore, We Have Not Entered RMMR Separately in the Table Above

TABLE IV Quantitative Evaluation of Vulnerability of COTS Neurotec [50] FRS From Various Morph Generation Approaches. Note That, Since FNMR = 0 @ FMR = 0.1% for COTS Neurotec [50] Following Eq. (12) and (13), the Value of RMMR is Equal to MMPMR/FMMPMR. Therefore, We Have Not Entered RMMR Separately in the Table Above

TABLE V Quantitative Evaluation of Vulnerability of LCNN-29 [51] FRS From Various Morph Generation Approaches. Note That, Since FNMR = 0 @ FMR = 0.1% for LCNN-29 [51] Following Eq. (12) and (13), the Value of RMMR is Equal to MMPMR/FMMPMR. Therefore, We Have Not Entered RMMR Separately in the Table Above

C. Perceptual Image Quality Analysis

This section presents quantitative results of the proposed morphed image generation techniques using the perceptual image quality metrics PSNR and SSIM. Both of these metrics are computed based on the reference image. Morphed face images are generated based on parent face images from two contributory data subjects. Therefore, we used the parent face images from both contributory data subjects as the reference image against which the given morphed image is assessed and we average the obtained image quality scores for both parent images. Table VI indicates the quantitative results of both PSNR and SSIM on four different types of face morph generation mechanism in the digital format. Based on the obtained results, it can be observed that:

There is little deviation in the perceptual image quality metrics computed on all four different types of face morph generation mechanisms.
The proposed MIPGAN-I and MIPGAN-II methods indicate a slightly better image quality when compared to the StyleGAN [1] based face morphing method.
The proposed MIPGAN-I and facial landmarks-based methods [44] indicate a similar image quality.
Figures 6 and 7 indicate the box plots of the PSNR and SSIM quality scores. These results further indicate that the perceptual quality of the proposed MIPGAN-I and MIPGAN-II is superior to the existing state-of-the-art method based on StyleGAN [1].

TABLE VI Morph Image Quality Analysis Using PSNR and SSIM With 95% Confidence Interval

Fig. 6.

Box plots of PSNR values computed from different face morph generation methods (digital version).

Show All

Fig. 7.

Box plots of SSIM values computed from different face morph generation methods (digital version).

Show All

D. Human Observer Analysis

In this section, we discuss the quantitative detection performance of human observations regarding morphed face images, which are generated using MIPGAN-I and MIPGAN-II. To this extent, we have designed and developed a Web-portal to evaluate the human morph detection performance reflecting both single image-based morphing attack detection scenario (S-MAD) and differential morphing attack detection scenario (D-MAD). We have used only digital samples of both bona fide and morphed face images as the proposed MIPGAN is used to generate the images in the digital domain. Figure 8 (a) shows the screenshot of the Web-portal for S-MAD in which the human observer needs to decide whether the displayed image is a morphed face image or a bona fide image by looking at one single image at a time. Correspondingly, Figure 9 (a) presents the screenshot for D-MAD experiment where the observer needs to detect whether the unknown image is morphed given a trusted bona fide image as a reference. We have selected a total of 90 images where 15 images are from each group corresponding to bona fide, two different types of facial landmarks based morphing such as Landmarks-I [7] and Landmarks-II [44], StyleGAN [1] based face morphing, MIPGAN-I and MIPGAN-II based face morphing. To make the testing robust, all 90 chosen images correspond to unique data subjects and there is no repetition of data subjects. To avoid gender bias by participants, we have selected a near equal distribution of male and female data subjects in each group. We have chosen 90 images considering the time constraints required to assess these images for human observers. It was important that observers do not loose focus while conducting the detection experiments.

Fig. 8.

(a)Example of screen shot used for human observer study (b) Quantitative results.

Show All

Fig. 9.

(a) Example of screen shot used for differential human observer study (b) Quantitative results.

Show All

Figure 8 (b) shows the quantitative results of S-MAD obtained from 56 human observers, including 14 experienced and 42 inexperienced observers. The experienced observers’ group consists of researchers working in face morphing attack detection and as ID expert’s in border control, while the non-experienced group consists of students and other computer science professionals. As noticed from the Figure 8 (b) following are the main observations:

Detection performance of the bona fide images indicates better detection performance by both experienced and non-experienced group when compared to the morphed face image. The experienced group indicates the detection performance with an accuracy of 97.14%, while the non-experienced group indicates the detection performance with an accuracy of 79.21%.
Human observers with experience in face morphing demonstrate higher detection accuracy on four different face morph generation mechanisms than the inexperienced group.
Among the four different morphing types, the experienced group indicates that the detection of the landmarks-based morphing is challenging compared to other morphing mechanisms (deep learning-based).
Human observers with no experience in face morphing are marginally good in detecting the landmarks-based face morph images compared to other types of face morphing techniques. MIPGAN-I exhibits more challenging morph images to detect as compared to other morph generation methods.
Based on the obtained results, it can be noted that the human observers with good experience in face morphing can detect morphed images with an accuracy of 88.25% while the human observer with no knowledge of face morphing shows the challenge to detect the morphed face images with a detection accuracy of 64.31%.
The overall results from 56 human observers indicate that detecting morphed face images is challenging. Further, it is also interesting to note that detecting different face morphing types is also challenging.

For the quantitative results of D-MAD, 5 experienced observers and 10 inexperienced observers have participated. As shown in Figure 9 (b), the following observations are illustrated:

In the scenario of D-MAD, the group with relevant experiences achieved an overall 86% accuracy, which is better than 81% for the inexperienced group. However, this difference is much less than the difference in S-MAD, which means that the reference image can help inexperienced observers to identify the morphs.
Morphs generated by Landmark-II present a significant challenge as compared to other morph generation mechanisms in D-MAD. This may be attributed to a more natural skin texture appearance (comparing with GAN-based mechanisms) and fewer artefacts (comparing with Landmark-I) and observers focusing less on its minor artefacts in the pairwise comparison.
It is also interesting to see that the performances of experienced observers on detecting Landmark-II (80.95% and 72.00%), StyleGAN (90.48% and 88.00%), MIPGAN-II (90.95% and 86.67%), and bona fide images (90% and 88.00%) are lower than their performance in S-MAD. We believe this is because experienced observers do not pay critical attention to tolerable difference between the trusted reference image and the unknown comparison image.

E. Ablation Study

In order to measure the impact of the loss functions in the proposed approach, we conduct an extensive ablation study. The proposed loss function combines four different entities such as: perceptual loss ($Loss_{Perceptual}$ ), identity loss ($Loss_{Identity}$ ), identity difference ($Loss_{ID-Diff}$ ) and Multi-Scale Structural Similarity (MS-SSIM) loss ($Loss_{MS-SSIM}$ ). The main contribution of our work is to use identity information, which can be considered as a specific high-level feature, to measure the loss. However, high-level features also mean that it is hard for the gradient descent algorithm to ensure a good convergence during the optimization process. Therefore, we have introduced the perceptual loss that can measure relatively low-level features in addition to MS-SSIM and identity difference loss to effectively control the optimization process to generate a high-quality morphed image. We perform the ablation study by discarding each term in the loss function iteratively. We benchmark the vulnerability using COTS FRS (Cognitec FRS (Version 9.4.2)) and the open-source ArcFace FRS, as the proposed approach is dedicated to generating high-quality morphed images.

Table VII indicates the quantitative performance of the ablation study using a vulnerability analysis for both the COTS-FRS from Cognitec and for the open-source Arcface FRS with the proposed MIPGAN-I and MIPGAN-II methods. The ablation study is carried out on the digital morphed images generated using both MIPGAN-I and MIPGAN-II Methods. Figures 10 and 11 shows the qualitative performance of the ablation study on both MIPGAN-I and MIPGAN-II, respectively. Based on the obtained results, the following are the main observations:

Each term in our proposed loss function (see Eq. (10)) contributes to posing a greater challenge to a FRS for both proposed MIPGAN-I and MIPGAN-II morph generation frameworks.
Among the four other loss functions that we have used, the $Loss_{Perceptual}$ is critical in improving the proposed method’s performance. Discarding the perceptual loss has resulted in a degrading performance in both qualitative (see Figures 10 (d) and 11 (d)) and quantitative results.
The use of identity loss ($Loss_{Identity}$ ) also indicates the importance of improving the quantitative performance of the proposed method.
The $Loss_{MS-SSIM}$ also contributes to both qualitative and quantitative improvements of the morphs generated by the proposed method.

TABLE VII Vulnerability - Ablation Study on the Proposed Loss Function. Here, ✓ Indicates the Selected and $\times$ Indicates the Not Selected Loss Function in the Ablation Study

$Table VII- Vulnerability - Ablation Study on the Proposed Loss Function. Here, ✓ Indicates the Selected and $\times$ Indicates the Not Selected Loss Function in the Ablation Study$

$Fig. 10. - Qualitative results of ablation study using proposed MIPGAN-I (a) $Loss_{ID-Diff}$ (b) $Loss_{Identity}$ (c) $Loss_{MS-SSIM}$ (d) $Loss_{Perceptual}$ .$

Fig. 10.

Qualitative results of ablation study using proposed MIPGAN-I (a) $Loss_{ID-Diff}$ (b) $Loss_{Identity}$ (c) $Loss_{MS-SSIM}$ (d) $Loss_{Perceptual}$ .

Show All

$Fig. 11. - Qualitative results of ablation study using proposed MIPGAN-II (a) $Loss_{ID-Diff}$ (b) $Loss_{Identity}$ (c) $Loss_{MS-SSIM}$ (d) $Loss_{Perceptual}$ .$

Fig. 11.

Qualitative results of ablation study using proposed MIPGAN-II (a)$Loss_{ID-Diff}$ (b) $Loss_{Identity}$ (c) $Loss_{MS-SSIM}$ (d) $Loss_{Perceptual}$ .

Show All

F. Hyper-Parameters Study

This section presents both qualitative and quantitative results on the selection of hyper-parameters ($\lambda _{1}$ , $\lambda _{2}$ , $\lambda _{3}$ , and $\lambda _{4}$ ) in the proposed loss function employed in both MIPGAN-I and MIPGAN-II. Based on the ablation study reported in Section III-E, we have noticed that the perceptual loss is the vital component of our loss function (see Eq. (10)) and the other three terms can be used as constraints during the optimization. Therefore, the first step is to study the generated morphed face images’ attack strength by increasing and decreasing the value of $\lambda _{1}$ . Among the remaining three terms, we have also noticed from the ablation study that the identity loss ($Loss_{Identity}$ ) is contributing more towards generating a high-quality morph compared to the other two-loss functions ($loss_{MS-SSIM}$ , $Loss_{ID-Diff} $ ). We analyze the importance of identity loss ($Loss_{Identity}$ ) with respect to the other two loss functions ($Loss_{MS-SSIM}$ , $Loss_{ID-Diff} $ ) by increasing the value of $\lambda _{3}$ and/or $\lambda _{3}$ and decreasing the value of $\lambda _{2}$ . Further, we have also noticed from the ablation study that the loss functions $loss_{MS-SSIM}$ and $Loss_{ID-Diff} $ are less important and numerically very small. Therefore, we did not conduct studies on decreasing the values of $\lambda _{3}$ and $\lambda _{4}$ . Altogether, we have tested four different cases of changing the hyper-parameter values to generate the morphed face images. These generated morphed face images are benchmarked against the proposed hyper-parameter values through the vulnerability analysis using both COTS FRS (Cognitec FRS (Version 9.4.2)) and open-source ArcFace FRS.

Table VIII shows the qualitative performance and Figure 12 shows the qualitative performance of the hyper-parameter study. Based on the obtained results, it can be noted that the increase in the value of $\lambda _{1}$ and $\lambda _{3}$ shows comparable results with the proposed weighting schemes. However, based on our empirical study on hyper-parameters, we noted that: if we set $\lambda _{1}$ and $\lambda _{2}$ with equal weights, then, during the optimization, the generated morph image will soon become roughly similar to both contributing subjects. This will quickly reduce identity loss ($Loss_{Identity}$ ) to a minimal value and loose its importance in the optimization. Hence, we set a larger factor to the identity loss compared with other loss terms measuring high-level features to ensure our most important constraint term is still effective in the later stage of optimization. Further, both $\lambda _{3}$ and $\lambda _{4}$ can make the optimization goal more comprehensive but setting a large factor will obstruct the convergence. Especially setting high values to $\lambda _{4}$ will end up with an image not similar to both subjects. Therefore, the selection of the proposed hyper-parameters confirms the generation of a high-quality morphed image but also aids for effective and comprehensive optimization.

TABLE VIII Quantitative Results of Hyper-Parameters Study

$Fig. 12. - Qualitative results of Hyper-parameters study on both MIPGAN-I and MIPGAN-II (a) $\lambda _{1}$ (b) $\lambda _{2}$ (c) $\lambda _{3}$ (d) $\lambda _{4}$ .$

Fig. 12.

Qualitative results of Hyper-parameters study on both MIPGAN-I and MIPGAN-II (a)$\lambda _{1}$ (b) $\lambda _{2}$ (c) $\lambda _{3}$ (d) $\lambda _{4}$ .

Show All

G. Morphing Attack Detection Potential

Considering the success rate of the newly generated dataset, we naturally choose to evaluate the morphing attack detection performance to also validate the robustness of existing MAD mechanisms. Additionally, we investigate recent works about general face manipulation detection [53], [54], [55] and some results are shown in the supplementary material. In this work, we focus on single image based morphing attack detection (S-MAD) as it perfectly suits our dataset. MAD has been widely addressed in the literature by developing the techniques based on both deep learning [56], [57], [58], [59], [60] and non-deep learning [19], [61], [62] approaches. Readers can refer to [63] for an exclusive survey on face MAD. Owing to the recent works detailing the applicability of Hybrid features [35] and Ensemble features [36] in detecting morphing attacks, we choose to benchmark both Hybrid features [35] and Ensemble features [36]. While the Hybrid features [35] resort to extracting features using both scale space and color space combined with multiple classifiers, Ensemble features [36] employ a variety of textural features in conjunction with a set of classifiers. In common both approaches evaluate a wide variety of MAD mechanisms in a holistic manner supported by empirical results [35], [36]. In addition, the Hybrid features [35] mechanisms are also validated against the ongoing NIST FRVT MORPH challenge [37] with the best performance in detecting printed and scanned morph images justifying our selection of algorithm to benchmark the newly composed database.

The reporting of MAD performance is following the ISO/IEC metrics [64] namely the Attack Presentation Classification Error Rate (APCER (%)) which defines the proportion of attack images (morph images) incorrectly classified as bona fide images and the Bona fide Presentation Classification Error Rate (BPCER (%)) in which bona fide images incorrectly classified as attack images are counted [64] along with the Detection Equal Error Rate (D-EER (%)). To evaluate the generated morphed face image’s attack potential, we have sub-divided the newly generated database into two sets for training and testing that consists of independent data subjects with no overlap between the splits. The training set includes 690 bona fide images and 1190 morphed images. The testing set consists of 580 bona fide and 1310 morphed images. To effectively evaluate the performance of the MAD reflecting a real-life scenario, we report the results on both intra (training and testing dataset from the same morph generation approach) and inter (training on one type of morphing techniques and testing on another type of morphing techniques) evaluation of MAD mechanisms. Extensive experiments are performed on digital, print-scan and print-scan with compression data types to provide an in-depth analysis of the S-MAD performance. Tables IX, X, XI, XII, and XIII presents the quantitative results of MAD mechanisms on morph generation methods together with the SOTA morph generation techniques. Based on the results obtained from the intra-dataset experiments, we make some concrete observations as listed below:

The intra-dataset evaluation indicates that the morphing attacks are detected with a good success rate irrespective of the type of generation.
In general, the attack detection success rate is high with digital data when compared to print-scan and print-scan compression.
Among the different types of morph generation techniques, the Landmark-II based morph generation shows the highest error rates. The attack images created using StyleGAN and proposed MIPGAN can be efficiently detected using both the employed approaches with high accuracy. This can be attributed to the noises that are synthesized using GANs due to the computational modifications performed on the latent space in GAN-based morph generation methods.

In the following, we discuss the important observations based on the results obtained from inter-dataset MAD analysis:

The performance of the MAD techniques are degraded on all five different case studies as indicated in the Tables IX, X, XI, XII, and XIII.
Training MAD algorithms with one type of landmarks-based method did not show the improvement in detection performance of another kind of landmarks-based morph generation method.
When MAD mechanisms are trained using the Landmarks-I [7] method, the degraded performance is noted for all other morph generation methods except for the StyleGAN [1] based approach. This fact is also noted when we train the MAD techniques using StyleGAN [1] generated samples and test it with Landmarks-I [7] samples. Thus, the StyleGAN [1] based morph generation is easy to detect even when MAD mechanisms are not trained using the images from same morph generation scheme.
When MAD algorithms are trained using Landmarks-II [44] samples, MAD algorithms indicate degraded performance on all other morph generation techniques.
When MAD mechanisms are trained using the proposed MIPGAN-I generated samples. The MAD mechanisms indicate an excellent detection performance on MIPGAN-II samples. However, the detection performance of MAD methods is deceived with other morph generation techniques.
It is interesting to note that when MAD mechanisms are trained using MIPGAN-I/MIPGAN-II, higher detection accuracy can be observed for print-scan and print-scan with compression data when compared to digital morph data. A possible reason is that the noise generated together with the morphed images using the proposed MIPGAN-I/MIPGAN-II can approximate the generated noise resulting from the print-scan and print-scan compression process.
Based on the results of the inter-database MAD analysis, the detection of Landmarks-II [44] samples are challenging.

TABLE IX Quantitative Performance of MAD - Training- Landmarks-I [7]

TABLE X Quantitative Performance of MAD - Training- Landmarks-II [44]

TABLE XI Quantitative Performance of MAD - Training- StyleGAN [1]

TABLE XII Quantitative Performance of MAD - Training- MIPGAN-I

TABLE XIII Quantitative Performance of MAD - Training- MIPGAN-II

SECTION IV.

Limitations of Current Work and Potential Future Works

Despite this work presenting a new approach to generate strong morphing attacks, which are empirically evaluated using COTS FRS, our work has a few noted limitations. In the current scope of work, we evaluate the impact of print and scan (re-digitizing) using one printer reflecting a realistic scenario. The MAD mechanism employed in this work has not been investigated with a wide range of printers and scanners that may impact the MAD performance. While we assert that the MAD performance may not vary extremely, when tested with a wider combination of printers and scanners, that empirical evaluation is yet to be conducted in future works.

A second aspect is that the proposed approach needs pre-selection of ethnicity for generating stronger attacks. Figure 13 shows example morphed face images generated using the proposed method using MIPGAN-I and MIPGAN-II that fail to get verified to contributing subjects when ethnicity pre-selection is not performed [7]. We notice that the selection of contributing subjects plays an important role with the proposed method to generate stronger attacks with MIPGAN. It is our assertion that the selection of contributing subjects with similar geometric structures (particularly ethnicity and age) can improve the performance of the proposed system, but that aspect needs further investigation.

Fig. 13.

Examples of morphed images that failed to attack FRS (a) morphed face images generated using proposed MIPGAN-I (b) morphed face images generated using proposed MIPGAN-II.

Show All

SECTION V.

Conclusion

Addressing the limitations of generating the strong and severe morphing attacks using GAN, we have proposed a new architecture for generating face morphed images in this work. The proposed approach (MIPGAN with two variants) for devising strong morphing attacks uses identity prior driven GAN with a customized loss exploiting perceptual quality and identity factors to generate realistic images that can strongly threaten FRS. In order to validate the attack potential of the proposed morph generation method, we have created a new dataset consisting of 30, 000 morphed images and 15, 240 bona fide images. Both COTS and deep learning based FRS were evaluated empirically to measure the success rate of the new approach and vulnerability was reported indicating the applicability of the new approach and newly generated database. In a similar direction, the dataset is also validated for detection performance by studying two state-of-art MAD mechanisms. Despite the high attack detection success rate by employed MAD, we note that the morphed images generated by MIPGAN can severely threaten FRS in a present state without MAD in FRS.

ACKNOWLEDGMENT

This text reflects only the author’s views and the Commission is not liable for any use that may be made of the information contained therein.

References is not available for this document.

MIPGAN—Generating Strong and High Quality Morphing Attacks Using Identity Prior Driven GAN

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction