Introduction
Fingerprint recognition and identification represent a critical frontier in the fields of biometric technology and forensic science. Traditional fingerprint acquisition typically requires controlled environments and ideal conditions to ensure the capture of high-quality fingerprints. However, in the real world, partial, distorted, or smudged fingerprints are often encountered, as shown in Figure 1, making the identification process inherently challenging. Recent advancements in the domains of digital imaging, machine learning, and computer vision algorithms have introduced innovative solutions to address these challenges. By algorithmically completing and verifying identity information in incomplete fingerprint images, it is possible to significantly enhance the efficiency of automated fingerprint recognition systems. As the demand for rapid and accurate identification continues to grow across various sectors, from law enforcement to secure access control, the importance of optimizing and improving incomplete fingerprint identification technologies becomes increasingly evident.
Figure 1. Different types of fingerprint data suffer from incompleteness and background-noise interference, which degrade the accuracy of fingerprint recognition. From left to right: rolled fingerprint, snapped fingerprint, latent fingerprint.
Currently, the identification of incomplete fingerprints depends on the discriminability of extracted fingerprint features. To enhance the discriminability of fingerprint features, fingerprint enhancement and fingerprint recovery are two crucial techniques. Fingerprint enhancement [1], [2] is an image processing technique that aims to eliminate noise, blur, and distortion in fingerprint images, thereby highlighting the unique characteristics of fingerprint textures and improving the quality and clarity of fingerprint images. Fingerprint recovery [3], [4], [5] is a data recovery technique that aims to fill in missing or incomplete parts of fingerprint images to recover lost detail features, thereby making the fingerprint image more comprehensive and suitable for accurate identification. Both techniques play important roles in improving the identification accuracy of incomplete fingerprints, but existing methods still face respective challenges when dealing with incomplete fingerprints in practical scenarios.
Existing fingerprint enhancement techniques [2], [6], [7] have limitations in providing new authentication information for incomplete fingerprints, as they are unable to generate enhanced images of missing parts or discover new detail points. These techniques primarily focus on improving the quality and clarity of existing fingerprint features rather than providing additional identifying information. Moreover, current fingerprint completion methods [5], [8] are largely based on interpolating or extrapolating from the original fingerprint image, which often leads to inaccurate and unreliable results. These methods struggle to distinguish between critical identity features and background noise in the fingerprint image, resulting in completed content that contains a significant amount of erroneous information. This limits their effectiveness in accurately identifying individuals based on incomplete fingerprints.
To address these challenges, we propose a network specifically designed for incomplete fingerprints: the Finger Recovery Transformer (FingerRT). This network aims to solve the issue of insufficient identity feature information in incomplete fingerprint recognition. Integrating fingerprint feature enhancement techniques and the completion capabilities based on a visual Transformer, FingerRT demonstrates exceptional completion abilities on rolled, snapped, and latent fingerprints, effectively improving the recognition accuracy of incomplete fingerprints. FingerRT is built on the visual Transformer architecture and includes two main stages: an autoencoder stage for fingerprint denoising and enhancement, and a recovery stage for fingerprint completion utilizing the generative capabilities of the Transformer.
Furthermore, to regulate the determinacy of fingerprint generation, we introduce a series of loss functions to control the generated results. These constraints operate on three levels: feature-level consistency before and after completion, image-level consistency before and after completion, and cyclic consistency between two successive completions. Together, these multi-loss constraints ensure the accuracy of fingerprint completion. More importantly, we introduce two key identity features from the fingerprint domain, minutiae and orientation fields, into the Transformer architecture for the first time. Based on minutiae information, we propose a minutiae attention mechanism that drives the Transformer network to focus on minutiae-dense areas during completion, thereby increasing the discriminability of the completed regions. Based on orientation-field information, we propose a loss constraint built on orientation consistency and continuity, ensuring the texture continuity and consistency of the completed regions and generating high-quality fingerprint textures.
In summary, our contributions are three-fold:
We introduce a novel network architecture based on Autoencoder and Transformer models for incomplete fingerprint restoration tasks. By removing environmental noise interference and completing the identity feature information of incomplete fingerprints, we effectively improve the recognition accuracy of incomplete fingerprints on multiple datasets.
FingerRT effectively utilizes key identity feature information of fingerprints, including orientation fields and minutiae points. The integration of orientation field information ensures the texture continuity and directional consistency of the completed fingerprint regions, while minutiae point information effectively distinguishes between key feature areas and non-key areas in fingerprints.
We complete incomplete-fingerprint features effectively on various types of fingerprints, including rolled, snapped, and latent fingerprints, and verify the accuracy and effectiveness of FingerRT through multiple evaluation metrics.
Related Work
A. Fingerprint Recovery and Enhancement
Over the past few decades, there has been significant attention given to the research in fingerprint recovery, aimed at repairing damaged fingerprint images to enhance fingerprint recognition performance. Presently, methods for fingerprint image recovery can be broadly categorized into two primary classes: filtering methods and neural networks.
Within the realm of filtering methods, techniques such as Gabor filtering [9], circular Gabor filtering [10], logarithmic Gabor filtering [11], and curvelet Gabor filtering [12] are employed. Their primary objective lies in the correction and enhancement of impaired fingerprint images, especially those exhibiting distinct curvilinear structures. Notable studies, such as the work by Feng and Jain [13], have utilized detail templates for fingerprint reconstruction, subsequently comparing them to the original fingerprints to evaluate their efficacy. An adaptive filtering method [14] dynamically selects the filter size based on the local ridge frequency. Yun and Cho [15] provide an adaptive preprocessing algorithm that classifies fingerprint images into oily, dry, and normal images. In addition, Sutthiwichaiporn and Areekul [16] developed an adaptive enhanced spectral filtering algorithm that iteratively evaluates the quality of image blocks and performs spectral filtering on unfiltered blocks.
Li et al. [17] proposed a multi-task learning strategy based on deep neural networks. Their method consists of an enhancement branch and an orientation branch, used to remove structural noise and to guide enhancement, respectively. Prabhu et al. [3] designed a multi-scale convolutional network to overcome the challenge of suppressing complex artifacts while preserving detailed textures; their model uses dilated convolutions and eliminates the need for padding in damaged fingerprints. In addition, Yadav and Tiwari [18] discussed different CNN-based methods and found that increasing the depth and width of a network enlarges its receptive field, but at a high computational cost. Recently, Joshi et al. [1] proposed a generative adversarial network model to improve the quality of the ridge structure of latent fingerprint images. In addition, Li et al. [8] proposed a robust and efficient fingerprint image restoration method based on a phase-field model, which has good mathematical properties but can only handle small missing regions. However, there is still no network capable of simultaneously performing noise removal and information completion for incomplete fingerprint recognition.
B. Image Recovery
Traditional image recovery methods, such as diffusion [19], [20], [21] and GANs [22], [23], [24], often rely on strong low-level assumptions, which may be challenged in the face of extensive masking. In order to generate semantically consistent content, many recent methods based on convolutional neural networks (CNNs) have emerged. These methods typically employ similar encoder-decoder architectures. Pathak et al. [25] introduced adversarial training to image inpainting, which enabled the filling of semantic holes. Iizuka et al. [26] effectively improved the performance of inpainting by introducing a local-global discriminator. Yu et al. [27] proposed a novel visual attention module aimed at capturing long-range correlations in images. In addition, Liu et al. [28] designed a novel operation called “partial convolution” to mitigate the negative effects of masking areas caused by convolution.
The advent of the Masked Autoencoder (MAE) [29] demonstrated the powerful large-scale completion ability of vision Transformers. Building on MAE, two-stage image recovery methods have shown excellent results: the second stage first predicts latent priors over visual tokens with a deep autoregressive model, and then uses the first-stage decoder to map these token sequences back into image pixels. For example, DALL-E [30] improved second-stage token prediction by using Transformers. VQGAN [31] introduced adversarial and perceptual losses in the first stage to improve image fidelity, while VIM [32] further improved the tokenization stage using a ViT [33] backbone. ICT [34] appends an adversarial network after the Transformer to complement texture details. MaskGIT [35] uses a bidirectional Transformer for token modeling and proposes parallel decoding.
However, methods that perform well on natural images cannot be directly transferred to the fingerprint domain: fingerprint patterns differ from natural images, and texture and identity information impose stricter requirements on the completion results. Lacking sufficient constraints, these methods cannot achieve effective fingerprint completion.
C. Vision Transformer
The Transformer architecture, originally proposed by Vaswani et al. [36], has revolutionized the field of natural language processing as a groundbreaking attention-based machine translation network. Its widespread adoption and transformative impact have led to the development of influential language models such as BERT [37] and GPT-3 [38], both built upon the foundation of the Transformer architecture. In the realm of computer vision, the pioneering work of ViT [33] demonstrated that pure Transformers can compete with convolutional neural networks (CNNs) in image classification tasks. By partitioning images into non-overlapping patches and projecting these smaller patches into vectors, similar to word embeddings [39] used in natural language processing, ViT paved the way for exploring network architectures and training methodologies that improve the efficiency of vision Transformers. Vision Transformer has shown strong capabilities in various fields of computer vision and pattern recognition.
In the realm of fingerprint recognition, the Transformer model [40], [41], [42] has progressively demonstrated its formidable capabilities, particularly in handling complex pattern recognition and feature extraction tasks. The core advantage of the Transformer lies in its ability to address long-range dependencies, which has led to its significant success across various fields, including image processing and natural language processing. Within the domain of fingerprint recognition, the introduction of the Transformer is primarily aimed at enhancing the extraction of fingerprint features and boosting the accuracy of fingerprint matching. Grosz and Jain [43] introduce an innovative Transformer-based approach that combines global and local embeddings to significantly enhance the accuracy of latent-to-rolled fingerprint matching. He et al. [44] introduce a novel framework that uses spatial Transformer networks (STN) with an AlignNet for alignment-parameter estimation, treating partial fingerprint verification as a binary classification task. Tandon and Namboodiri [45] build on the Convolutional Transformer with a built-in minutiae extractor, offering a time- and memory-efficient solution for extracting both global and local fingerprint representations. However, in the field of fingerprint recovery, how to combine fingerprint domain knowledge with the Vision Transformer remains an issue worth exploring.
Preliminary
A. FingerNet
FingerNet [2] combines traditional methods with deep convolutional networks, using them to guide the network-structure design and weight initialization for fingerprint minutiae extraction. It expands simple structures by adding convolutional layers to enhance representation ability, and releases the weights by learning complex backgrounds from the data. The method yields typical fingerprint representations during minutiae extraction, including orientation fields, segmentation, minutiae, and enhanced images. FingerNet can effectively remove background noise, but cannot complete missing fingerprint regions. In FingerRT, we use the outputs of FingerNet, including the orientation field, minutiae, and enhanced image, as the labels for learning denoising in the first stage, and as the supervision for minutiae and orientation fields in the second stage. In the masked-fingerprint recovery experiments, we also use FingerNet's output on the complete fingerprint as the ground truth against which completions are compared.
B. The Attention Mechanism
The attention mechanism proposed by Lin et al. [46] is a fundamental component of the widely adopted Transformer architecture [36]. Expressed in mathematical notation, the attention mechanism produces an output tensor A from three input tensors, the queries Q, keys K, and values V:
\begin{equation*} A := \mathrm{Attention}(Q, K, V)=\mathrm{Softmax}\left[Q{\bar{K}}^{\top}\right]V. \tag{1}\end{equation*}
Written elementwise, each output row is a convex combination of the value rows:
\begin{equation*} A_{i} = \sum_{j=1}^{n} \frac{\exp (Q_{i}{\bar{K}}_{j}^{\top })}{\sum_{k=1}^{n} \exp (Q_{i}{\bar{K}}_{k}^{\top })} V_{j}. \tag{2}\end{equation*}
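For concreteness, the following minimal PyTorch sketch implements Equations (1) and (2). Reading $\bar{K}$ as the keys scaled by $1/\sqrt{d}$ is our assumption, since the scaling is not defined in this excerpt, and all tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Q, K, V: (batch, n, d). Returns A: (batch, n, d)."""
    K_bar = K / K.size(-1) ** 0.5            # scaled keys, our reading of \bar{K}
    scores = Q @ K_bar.transpose(-2, -1)     # (batch, n, n) logits Q_i \bar{K}_j^T
    pi = F.softmax(scores, dim=-1)           # categorical weights of Eq. (2)
    return pi @ V                            # A_i = sum_j pi_ij V_j
```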
Method
A. Motivation
Compared to natural image completion, fingerprint images can benefit from domain-specific knowledge. Minutiae and orientation fields represent two critical features of fingerprints. Minutiae provide local information, while orientation fields offer global structural information. By combining these two features, it is possible to effectively repair missing or damaged portions of a fingerprint, thereby preserving its integrity and continuity.
Furthermore, in comparison to convolutional neural networks, Transformers exhibit greater generative capabilities and are more adaptable to domain-specific fingerprint characteristics. The generative capacity of Transformers is primarily exemplified by their self-attention mechanism, which allows the model to capture long-range dependencies and interaction information among sequence elements. This empowers Transformers to effectively model long-distance, multi-level dependencies across image regions, resulting in the generation of text or image data with rich semantic information and accurate grammatical structures.
Building upon these observations, we introduce a novel network, FingerRT. By harnessing the generative capabilities of Transformers and the control capabilities of orientation fields and minutiae, we can achieve extensive and controllable fingerprint completion.
B. AutoEncoder
When performing fingerprint completion, FingerRT first needs to encode the input incomplete image. As shown in Figure 2, the image is compressed into a lower-dimensional representation called a latent space representation. The key step is to capture essential features of the fingerprint, such as edges, texture, and brightness distribution, while ignoring noise and unrelated information. In the latent space, the fingerprint representation is more abstract but contains the key information necessary to reconstruct the fingerprint image. In FingerRT, the encoder part consists of a Vision Transformer (VIT) architecture with 6 multi-head attention modules to better extract global fingerprint features.
Figure 2. The AutoEncoder part of FingerRT. This is the first training stage, consisting of a Transformer encoder and a convolutional decoder. The decoder comprises two branches: an orientation-field decoder and an enhanced-fingerprint decoder.
Afterward, FingerRT begins to complete the fingerprint. This stage uses the knowledge learned during training to predict the content of the missing area, as explained in detail in later sections.
Finally, FingerRT enters the decoding stage, in which the decoder maps the latent-space vectors back to the high-dimensional image space. In this process, the model not only generates content visually consistent with the known region but also ensures that the generated content conforms to basic fingerprint rules, such as coherent fingerprint texture and smooth fingerprint edges. To meet this requirement, we designed asymmetric fingerprint encoder and decoder structures. To preserve the continuity of the generated fingerprint texture, we use convolutional networks rather than a Transformer as the decoder. As shown in Figure 2, we use linear layers to map the features from the Transformer-based completion network to shape (B, H, W, C), and then apply multiple convolutional layers to map these features to images. In addition, FingerRT uses another single-layer convolutional branch to map the features to the fingerprint orientation field; the purpose of this branch is to use the orientation field to supervise fingerprint completion. Note that in this stage, the enhanced-image labels and orientation-field labels of the fingerprint are provided by FingerNet [2].
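The following sketch illustrates one plausible form of this asymmetric decoder. The patch-grid size, channel widths, and the 90-bin orientation output are illustrative assumptions rather than the authors' exact configuration, and upsampling to full image resolution is omitted for brevity.

```python
import torch
import torch.nn as nn

class ConvDecoder(nn.Module):
    def __init__(self, embed_dim=256, grid=(32, 32), channels=64, ori_bins=90):
        super().__init__()
        self.grid = grid
        self.proj = nn.Linear(embed_dim, channels)        # token -> channel features
        self.image_head = nn.Sequential(                  # multi-layer enhanced-image branch
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )
        self.ori_head = nn.Conv2d(channels, ori_bins, 1)  # single-layer orientation branch

    def forward(self, tokens):                            # tokens: (B, H*W, D)
        H, W = self.grid
        B = tokens.size(0)
        # reshape tokens to a (B, C, H, W) feature map as described in the text
        feat = self.proj(tokens).view(B, H, W, -1).permute(0, 3, 1, 2).contiguous()
        return self.image_head(feat), self.ori_head(feat)  # enhanced image, orientation field
```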
The training of the AutoEncoder constitutes the first stage of FingerRT’s training process and is crucial for extracting fingerprint features. It’s important to note that during this initial stage, the AutoEncoder’s training excludes any mask and solely focuses on reconstructing fingerprint images.
C. Minutiae Attention
In fingerprint completion, we aim for the Transformer network to pay more attention to the densely populated regions of minutiae, resulting in more identity-informative completion results. Therefore, we incorporate the prior information on the number of minutiae into the self-attention mechanism. We refer to this approach as the minutiae attention mechanism. It should be noted that our minutiae are predicted by FingerNet.
Note that the coefficients in self-attention (Equations (1) and (2)) form a categorical distribution over the values: for each query $i$, the weights $\pi_{j}=\exp (Q_{i}{\bar{K}}_{j}^{\top })/\sum_{k=1}^{n}\exp (Q_{i}{\bar{K}}_{k}^{\top })$ are non-negative and sum to one.
We introduce the minutiae counts E into the self-attention mechanism as a posterior guide. As the prior distribution over the attention coefficients we take the Dirichlet distribution (the conjugate prior of the categorical distribution) parameterized by the minutiae counts:
\begin{equation*} p(\boldsymbol{\pi} \mid \boldsymbol{E}) \propto \prod_{j=1}^{n} \pi_{j}^{E_{j}}.\end{equation*}
Setting $x_{i}=\exp (Q_{i}{\bar{K}}_{i}^{\top })$, the unnormalized attention score from Equation (2), we obtain the posterior by multiplying the categorical likelihood by the prior:
\begin{align*} p(\boldsymbol {\pi } \mid x,E) & \propto p(x \mid \boldsymbol {\pi })\, p(\boldsymbol {\pi } \mid \boldsymbol {E}) \\ & =\prod _{i=1}^{n} \pi _{i}^{x_{i}} \prod _{j=1}^{n} \pi _{j}^{E_{j}}. \tag {3}\end{align*}
With the posterior probability distribution expression, we can compute the maximum a posteriori (MAP) estimate of the parameters. To streamline the computation, we first take the logarithm of the posterior, then use a Lagrange multiplier to enforce the constraint that $\sum_{i=1}^{n} \pi_{i}=1$:
\begin{align*} L(\boldsymbol {\pi }, \lambda)=\sum _{i=1}^{n} x_{i} \log \pi _{i}+\sum _{i=1}^{n} E_{i} \log \pi _{i}+\lambda \left ({1-\sum _{i=1}^{n} \pi _{i}}\right). \tag {4}\end{align*}
We differentiate the Lagrangian with respect to $\pi_{i}$ and set the derivative to zero:
\begin{equation*} \frac {\partial }{\partial \pi _{i}} L(\boldsymbol{\pi}, \lambda)=\frac {x_{i}}{\pi _{i}}+\frac {E_{i}}{\pi _{i}}-\lambda =\frac {x_{i}+E_{i}}{\pi _{i}}-\lambda = 0. \tag {5}\end{equation*}
Solving Equation (5) gives $\pi_{i}=(x_{i}+E_{i})/\lambda$; summing over $i$ and applying the constraint $\sum_{i}\pi_{i}=1$,
\begin{equation*} \sum _{i=1}^{n} \pi _{i} =\sum _{i=1}^{n} \frac {x_{i}+E_{i}}{\lambda } = 1, \tag {6}\end{equation*}
so $\lambda =\sum _{k=1}^{n}\left(x_{k}+E_{k}\right)$.
At last, we obtain the MAP estimate of $\pi_{i}$:
\begin{align*} \pi _{i}& =\frac {x_{i}+E_{i}}{\sum _{k=1}^{n}\left(x_{k}+ E_{k}\right)} \\ & =\frac {\exp \left ({Q_{i} \bar {K}_{i}^{\top }}\right)+E_{i}}{\sum _{k=1}^{n}\left[\exp \left ({Q_{i} \bar {K}_{k}^{\top }}\right)+E_{k}\right]}. \tag {7}\end{align*}
Finally, we obtain a very simple posterior-based attention mechanism for fingerprint minutiae. By adding the per-patch minutiae counts to the attention computation in each ViT block, FingerRT pays more attention to minutiae-dense areas, generating fingerprint images with stronger identity-discrimination capability.
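A minimal sketch of the resulting minutiae attention follows; it is a drop-in change to the softmax normalization of Equation (2), with the per-patch minutiae counts E entering as Dirichlet pseudo-counts added to the exponentiated logits, as in Equation (7). Shapes and the key scaling are assumptions.

```python
import torch

def minutiae_attention(Q, K, V, E):
    """Q, K, V: (B, n, d); E: (B, n) non-negative minutiae counts per patch."""
    K_bar = K / K.size(-1) ** 0.5
    scores = Q @ K_bar.transpose(-2, -1)           # (B, n, n): Q_i \bar{K}_j^T
    exp_scores = torch.exp(scores.clamp(max=30))   # literal exp; clamped for stability
    weights = exp_scores + E.unsqueeze(1)          # add pseudo-counts E_j per key j
    pi = weights / weights.sum(-1, keepdim=True)   # MAP estimate, Equation (7)
    return pi @ V
```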
D. Fingerprint Orientation Field Constraint
Fingerprint direction field constraints are a key technology in fingerprint identification and recovery, which can improve the accuracy of fingerprint completion. Fingerprints are known for their unique flowing arrangement of ridges, and each region of the fingerprint image indicates the local direction of these ridges. The directional characteristics of these ridges serve as a powerful clue when some parts of the fingerprint are missing or of poor quality due to wear, damage, or interference.
In the field of fingerprint completion, the inherent consistency of the direction of the ridges and valleys within the fingerprint becomes very important. FingerRT is committed to ensuring that the newly completed ridges are continuous and consistent in direction with the adjacent original ridges. This continuity provides a template for predicting the possible direction of the ridges in the missing area, ensuring that the structure of the reconstructed area is seamlessly aligned with the structure of the original fingerprint.
FingerRT uses directional field constraints as one of the optimization objectives in the completion algorithm, paving the way for iterative refinement. During the iterative process, the completion results are continuously adjusted to ensure the rationality and continuity of the directional field. An additional advantage of integrating these constraints is to reduce errors during the recovery process. By comparing the directional field of the completed fingerprint with the original directional field, FingerRT uses mean squared error (MSE) to reduce the difference.
It should be noted that FingerRT uses only one convolutional network to predict the direction field, which allows the constraints of the direction field to be mainly focused on the completion network rather than the prediction network, providing more effective constraints for completing features.
E. Fingerprint Recovery Network
The second-stage training and constraining process is illustrated in Figure 3. In the second stage of training, the parameters of the AutoEncoder are fixed. The recovery network of FingerRT, shown in Figure 4, comprises both an encoder and a decoder. The encoder processes the partially masked input data in the low-dimensional latent space. By implementing the masking strategy, the encoder learns the underlying representations of fingerprints, which encourages the network to extract more robust and discriminative features as it cannot simply memorize the input. In FingerRT, the encoder comprises six layers of multi-head attention Transformer blocks.
Figure 3. The recovery network in FingerRT. This is the second training stage: a Transformer network completes fingerprint features under different mask strategies, and the constraints of multiple loss functions make the completion results more accurate.
Figure 4. The recovery blocks in FingerRT consist of an encoder-decoder structure with a deeper decoder; this module predicts the vectors of the masked parts.
The masking process randomly removes a certain percentage of the input data. FingerRT randomly alternates between two masking strategies: one masks a fixed percentage of the input fingerprint at random (75 percent in this paper); the other masks the training fingerprint with the background region of a randomly segmented latent fingerprint.
After encoding, the latent representation is passed to the decoder to reconstruct the original input from this compressed knowledge. In FingerRT, the decoder has a deeper network structure, consisting of Transformer blocks with eight layers of multi-head attention, and is used to predict the features of the masked parts. Throughout the completion process, the masked positions are represented by zero vectors. During prediction, the completion network takes the unmasked vectors as input, updates the zero vectors through interaction in the attention mechanism, and applies an MSE loss so that the predicted vectors at masked positions match the corresponding vectors of the complete original input.
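The sketch below illustrates the random-masking strategy and the masked-position MSE described above. The 75% ratio is taken from the text; `recovery_net` stands in for the second-stage Transformer, and all shapes are illustrative.

```python
import torch

def random_mask(tokens, ratio=0.75):
    """Randomly mask `ratio` of the token vectors. tokens: (B, n, d)."""
    B, n, _ = tokens.shape
    mask = torch.rand(B, n, device=tokens.device) < ratio   # True = masked
    masked = tokens.masked_fill(mask.unsqueeze(-1), 0.0)    # zero vectors for masked parts
    return masked, mask

def recovery_loss(recovery_net, tokens):
    masked_tokens, mask = random_mask(tokens)
    pred = recovery_net(masked_tokens)        # attention updates the zero vectors
    # MSE only on masked positions, against the latent vectors of the full input
    return (pred - tokens)[mask].pow(2).mean()
```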
F. Loss Functions
1) Image Loss:
We use the mean squared error (MSE) between corresponding pixels of the restored fingerprint and the original fingerprint image to constrain the model's restoration at the image level.
2) Feature Loss:
We use feature-level MSE Loss to compute the error between the outputs in the feature space. In this case, it quantifies the squared differences between two feature vectors, helping adjust model parameters to minimize such discrepancies and better learn the data’s feature representation. We employ feature-level MSE Loss to constrain the features of the masked parts after fingerprint restoration to be the same as the features before the mask.
3) Cycle Loss:
We draw on the idea of CycleGAN and use a cycle-consistency loss to strengthen the constraints. The completed result is treated as the known part, and the regions that were originally unmasked are completed again; the features after this second completion must stay close to the original features. We again use the MSE loss function to constrain this similarity.
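A minimal sketch of this cycle constraint, under the same assumptions as the recovery-loss sketch above: the first pass completes the masked positions, the second pass re-completes the originally known positions from the completed result, and the MSE pulls the twice-completed features back to the originals.

```python
import torch

def cycle_loss(recovery_net, tokens, mask):
    """tokens: (B, n, d) latent vectors of the full input; mask: (B, n) bool, True = masked."""
    # first completion: fill the masked positions
    first = recovery_net(tokens.masked_fill(mask.unsqueeze(-1), 0.0))
    # second completion: keep the completed part, mask the originally known part
    known = ~mask
    second = recovery_net(first.masked_fill(known.unsqueeze(-1), 0.0))
    # the twice-completed features should return to the original features
    return (second - tokens)[known].pow(2).mean()
```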
4) Orientation Field Loss:
The fingerprint orientation-field constraint consists of two parts: consistency of the fingerprint orientation before and after completion, and consistency of the fingerprint texture. The former is constrained by an MSE loss on the orientation field, the latter by the orientation-coherence loss defined below.
Following FingerNet, and considering that texture orientation is strong fingerprint prior knowledge, we define the orientation-coherence loss by
\begin{align*} \text {Coh} & =\frac {\sqrt {\left ({(\sin 2\theta) * \mathbf {I}}\right)^{2}+\left ({(\cos 2\theta) * \mathbf {I}}\right)^{2}}}{\sqrt {\sin ^{2} 2\theta +\cos ^{2} 2\theta } * \mathbf {I}}, \\ L_{coh} & =\frac {|ROI|}{\sum _{ROI} \text{Coh}}-1, \tag {8}\end{align*}
where $*$ denotes convolution with the all-ones kernel $\mathbf{I}$ and $ROI$ is the fingerprint foreground region.
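The sketch below implements Equation (8) under the assumption that $*$ is a 2-D convolution with an all-ones kernel; the 7×7 window size is an illustrative choice, not taken from the text.

```python
import torch
import torch.nn.functional as F

def coherence_loss(theta, roi, ksize=7):
    """theta: (B, 1, H, W) orientation field in radians; roi: (B, 1, H, W) binary mask."""
    I = torch.ones(1, 1, ksize, ksize, device=theta.device)  # all-ones kernel
    pad = ksize // 2
    s, c = torch.sin(2 * theta), torch.cos(2 * theta)
    num = torch.sqrt(F.conv2d(s, I, padding=pad) ** 2 +
                     F.conv2d(c, I, padding=pad) ** 2)
    den = F.conv2d(torch.sqrt(s ** 2 + c ** 2), I, padding=pad)  # = window area
    coh = num / (den + 1e-6)            # in (0, 1]; 1 means locally parallel ridges
    return roi.sum() / ((coh * roi).sum() + 1e-6) - 1.0
```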
5) Final Loss:
The final loss comprises the four components described above; since the orientation-field constraint contributes two terms, there are five loss functions in total. In this paper, the weight of each loss is set to 1:
\begin{equation*} \mathcal {L}={\mathcal {L}}_{image}+ {\mathcal {L}}_{feature}+ {\mathcal {L}}_{orientation} + {\mathcal {L}}_{cycle} + {\mathcal {L}}_{coh}. \tag {9}\end{equation*}
Experiments
A. Setup
1) Datasets:
We conduct experiments utilizing three distinct types of fingerprint datasets. In particular, our experimentation involves the utilization of NIST SD4 [47] to evaluate rolled fingerprints, making use of the FVC dataset [48] to evaluate snapped fingerprints, and employing the NIST SD27 dataset [49] to evaluate latent fingerprints.
The NIST SD4 dataset [47] is an established benchmark within the field of fingerprint indexing. It comprises 2000 pairs of rolled fingerprints, each with a spatial size of 512×512 pixels.
The NIST SD27 dataset [49] encompasses a total of 258 latent fingerprints that have been meticulously annotated with expert-defined segmentation. Each fingerprint within this dataset has dimensions of 800×768 pixels.
The FVC (Fingerprint Verification Competition) dataset [48] functions as a standard reference for the assessment of fingerprint recognition algorithms and methodologies. This paper utilized the FVC2004 DB1 competition dataset to conduct testing.
Our training dataset includes three distinct categories of fingerprint datasets. We employ the NIST SD14, FVC2000, and FVC2002 datasets as our training resources. Additionally, to enhance the training process, we collect 100,000 latent fingerprints and 200,000 rolled fingerprints from crime scenes as supplementary training samples. It is important to note that our training procedure does not require the use of paired fingerprints. The NIST SD14 dataset comprises a total of 27,000 pairs of rolled fingerprints, with each image measuring 832×768 pixels.
Figure 5. The fingerprint data we collected from crime scenes to supplement the training set: (a) latent fingerprints collected from crime scenes; (b) rolled fingerprints from archives; (c) an additional 100,000 rolled fingerprints, used as gallery fingerprints in the experiments.
To simulate real-world scenarios more accurately and assess the practical impact of masking on fingerprint recognition, we prepared a new testing dataset, the Real Fingerprint Masking Dataset (RFMD). RFMD comprises 200 pairs of fingerprint images taken from real criminal scenarios, as shown in Figure 6. By evaluating different algorithms on RFMD, we aim to accurately assess the effectiveness of various masking strategies and further optimize our algorithms for the complex scenarios that arise in the real world.
Figure 6. The masked fingerprint data we collected from the real world, which we name the Real Fingerprint Masking Dataset (RFMD).
2) Metrics:
In all three incomplete-fingerprint completion tasks, our main evaluation metric remains fingerprint-matching accuracy. For rolled and snapped fingerprints, we use the fingerprint segmentation output by FingerNet and expand the fingerprint with the FingerRT network to obtain a more complete fingerprint. The corresponding archival fingerprints are not processed and are used directly as matching targets. We measure the effectiveness of completion using top-k matching accuracy. For latent fingerprints, which come with more accurate manual segmentation labels, we use the manual labels as the segmentation regions for expansion.
To demonstrate FingerRT's completion capability more clearly, we also use different mask strategies to simulate incomplete-fingerprint completion tasks. In these experiments, we adopt metrics commonly used in image-quality assessment, PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index), to evaluate recovery quality for both rolled and snapped fingerprints. In addition, the root mean square deviation (RMSD) of the estimated orientation field is used to evaluate the quality of fingerprint orientation recovery.
PSNR is mathematically represented as follows:
\begin{align*} MSE&=\frac {1}{mn} \sum _{i=0}^{m-1} \sum _{j=0}^{n-1}[I(i, j)-K(i, j)]^{2}, \tag {10}\\ PSNR&=10 \cdot \log _{10}\left ({\frac {MAX_{I}^{2}}{MSE}}\right). \tag {11}\end{align*}
Here PSNR is the peak signal-to-noise ratio in decibels (dB), MSE is the mean squared error, i.e., the average of the squared differences between corresponding pixels of the two compared images, and $MAX_I$ is the maximum possible pixel value of the image.
SSIM aims to measure the perceived similarity between two images based on their structural information, combining three comparison terms: luminance, contrast, and structure. It is mathematically defined as:
\begin{equation*} \mathrm {SSIM}(x, y)=\frac {\left ({2 \mu _{x} \mu _{y}+c_{1}}\right)\left ({2 \sigma _{x y}+c_{2}}\right)}{\left ({\mu _{x}^{2}+\mu _{y}^{2}+c_{1}}\right)\left ({\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2}}\right)} \tag {12}\end{equation*}
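For reference, minimal implementations of these metrics are sketched below. The 8-bit $MAX_I = 255$ and the 180° wrap-around used for the orientation RMSD are our assumptions, and SSIM is taken from scikit-image rather than reimplemented.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def psnr(img, ref, max_i=255.0):
    """Equations (10)-(11); img, ref: equally sized grayscale arrays."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)

def orientation_rmsd(theta, theta_ref):
    """theta, theta_ref: orientation fields in degrees, defined modulo 180."""
    d = np.abs(theta - theta_ref) % 180.0
    d = np.minimum(d, 180.0 - d)          # wrap-around angular difference
    return np.sqrt(np.mean(d ** 2))

# structural similarity of Eq. (12): ssim(img, ref, data_range=255)
```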
3) Implementation Details:
When dealing with latent and rolled fingerprints, FingerRT leverages fingerprint data collected from crime scenes and the NIST SD14 dataset for training. However, for snapped fingerprints, we deliberately broadened our data resources by incorporating additional snapped fingerprint data from FVC2000, FVC2002, and FVC2006, along with some pressed fingerprint data collected from crime scenes, to create a fine-tuning dataset. To further enhance processing effectiveness, we fine-tuned the crucial recovery network component based on the original FingerRT. In this process, we manually selected different network weights based on actual needs to specifically optimize the recovery quality of both snapped and rolled fingerprints.
Throughout the training process, all models are developed utilizing the PyTorch framework. The weights of the backbone are initialized through randomization, forgoing the use of pre-trained models. An AdamW optimizer is employed, with an initial learning rate set at 0.0001, alongside the implementation of a poly learning rate policy. Training is conducted on a server equipped with two RTX Titan GPUs, spanning a total of 200 epochs for stage 1 and 100 epochs for stage 2, and a batch size of 200 per GPU in stage 1 and 64 in stage 2, respectively.
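A sketch of this optimizer setup follows; the decay power 0.9 is a common default for the poly policy and an assumption here, as are the placeholder model and iteration counts.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                            # stand-in for FingerRT
iters_per_epoch = 1000                             # illustrative assumption
max_iters = 200 * iters_per_epoch                  # stage 1 runs 200 epochs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: (1.0 - min(it, max_iters) / max_iters) ** 0.9)
# per iteration: loss.backward(); optimizer.step(); scheduler.step()
```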
B. Performance on Fingerprint Retrieval
In fingerprint retrieval tasks, each fingerprint is represented by an embedding vector produced by a retrieval network, and the similarity between two fingerprints is measured by the inner product of their embedding vectors. To fairly assess the efficacy of the FingerRT restoration model, we compare matching results before and after restoration across four datasets: FVC2004 DB1, NIST SD4-MASK, NIST SD27, and RFMD. For each query fingerprint, the ten closest fingerprints are selected from the database based on the embedding vectors and matched one by one to obtain retrieval accuracy. It is important to emphasize that an additional 100,000 rolled fingerprints are introduced in all experiments to increase the retrieval challenge. Furthermore, three different fingerprint enhancement techniques, FingerNet, FingerGAN [7], and MSU-AFIS [6], are implemented, and three distinct versions of FingerRT are trained with them as labels to ensure the fairness and completeness of the experiments. Two advanced fingerprint recognition algorithms, FingerPatches [52] and VeriFinger v12.3 [53], are used to systematically evaluate the improvement in matching performance after completion and enhancement by FingerRT, from the perspectives of global and local features, respectively. We report the top-1, top-5, and top-10 accuracy rates for these experiments.
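A minimal sketch of this retrieval protocol, with embeddings assumed to be precomputed by the retrieval network:

```python
import torch

def retrieve_top_k(query_emb, gallery_emb, k=10):
    """query_emb: (q, d); gallery_emb: (g, d). Returns (q, k) gallery indices."""
    sims = query_emb @ gallery_emb.T          # inner-product similarity
    return sims.topk(k, dim=1).indices        # ten closest candidates per query
```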
When using the NIST SD27 and FVC2004 datasets, our experimental workflow leverages FingerNet to segment each fingerprint precisely. The segmented background is then used as a mask and fed into our FingerRT model to achieve high-quality fingerprint completion. We pass the fingerprint images completed by FingerRT to two different matching algorithms to independently verify recognition accuracy in each case. The experimental results in Table I show that FingerRT completion significantly improves the accuracy of both fingerprint-matching algorithms, on real and simulated datasets alike. This is because FingerRT's excellent completion ability effectively restores critical information in fingerprint images, allowing the matching algorithms to identify fingerprint features more accurately and perform matching.
C. Performance on Fingerprint Recovery
In order to facilitate a comprehensive comparison of different methods for fingerprint restoration, we design four fingerprint mask scenarios to evaluate the restoration capabilities of the networks. In the case of a random mask, we follow the 75% masking ratio used during training.
Figure 7. FingerRT's masked fingerprint completion results on FVC2004 DB1. (a) Full fingerprint image; (b) masked fingerprint image; (c) FingerNet enhancement of the full fingerprint as ground truth; (d) FingerRT repair result of the masked fingerprint image; (e) full-fingerprint orientation field extracted by FingerNet as ground truth; (f) orientation field extracted by FingerRT after repair of the masked fingerprint.
Figure 9. Completion results of different vision-Transformer-based networks. FingerRT completes fingerprint textures more continuously and consistently.
As shown in Table II and Table III, FingerRT demonstrates excellent performance across multiple metrics, surpassing other methods. Especially in the task of completing large-scale fingerprint masks, FingerRT exhibits remarkable superiority, significantly outperforming convolutional completion methods in key metrics such as PSNR, SSIM, and orientation-field error.
Compared to the latest image completion networks, such as MaskGit and ICT, FingerRT also exhibits a greater advantage. When compared to MaskGit, which employs multi-step inference, and the multi-stage ICT network, FingerRT generates more continuous results that closely adhere to the original fingerprint texture distribution while completing the image in a single step. Figures 7, 8, and 10 show the results of FingerRT on FVC2004 DB1, NIST SD27, and NIST SD4, respectively. Figure 8 further illustrates the completion results of FingerRT. When applied to rolled fingerprint images, FingerRT demonstrates a remarkably high ability to restore fingerprint textures, producing textures that closely resemble the original results. This series of comparative experiments confirms the significant advantages of FingerRT in the task of completing masked fingerprint images.
D. Performance on Minutiae Recovery
We have devised a new set of experiments to validate the precision of minutiae newly recovered using FingerRT. These tests are carried out on the NIST SD4-MASK dataset. In the absence of manual labeling, we use the minutiae extracted by FingerNet from the original NIST SD4 dataset as our reference point. Additionally, we have trained a model to extract minutiae from images enhanced by FingerRT, utilizing the minutiae extracted by FingerNet from our training dataset as labels. We evaluate the accuracy of minutiae extraction using precision, recall, and F1-Score metrics.
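Since the paper's exact matching rule for minutiae is not given here, the sketch below scores predicted minutiae with a common convention: a prediction counts as correct if it lies within a distance threshold of a still-unmatched reference minutia (orientation agreement, often also required, is omitted). The threshold is an illustrative assumption.

```python
import numpy as np

def minutiae_prf(pred, ref, dist_thresh=15.0):
    """pred: (n, 2), ref: (m, 2) arrays of (x, y) minutiae locations."""
    matched = np.zeros(len(ref), dtype=bool)
    tp = 0
    for p in pred:
        if len(ref) == 0:
            break
        d = np.linalg.norm(ref - p, axis=1)
        d[matched] = np.inf                  # each reference matches at most once
        j = int(np.argmin(d))
        if d[j] <= dist_thresh:
            tp += 1
            matched[j] = True
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(ref), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1
```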
For fingerprints enhanced but not recovered by FingerRT, the average precision of extracted minutiae is 0.83, with a recall of 0.67 and an F1-score of 0.741. In contrast, recovered fingerprints show an average precision of 0.76, a recall of 0.82, and an F1-score of 0.789. Meanwhile, FingerNet enhancement of masked fingerprints yields an average precision of 0.81, a recall of 0.70, and an F1-score of 0.751.
This experiment emphasizes FingerRT’s proficiency in extracting novel, previously undetected minutiae, with the majority of these extracted minutiae aligning with those present in complete fingerprints. Fingerprint completion using FingerRT retrieves more missing minutiae, subsequently elevating VeriFinger’s performance. Nevertheless, when comparing enhancement techniques rather than completion capabilities, FingerRT and FingerNet exhibit comparable performance in VeriFinger, revealing no substantial differences. For an exhaustive tally of minutiae, please refer to Table V.
Statistical analysis reveals that, on average, an additional 6.9 minutiae can be extracted from each rolled fingerprint after FingerRT recovery. Among these, the average number of accurate minutiae is 3.6, comprising more than half. Figure 11 further showcases the minutiae successfully recovered. Following precise recovery by FingerRT, we emphasize the successfully completed minutiae points with conspicuous red boxes, clearly demonstrating the substantial ranking enhancement attained by VeriFinger on the recovered fingerprints.
Figure 10. FingerRT's masked fingerprint completion results on NIST SD4. (a) Full fingerprint image; (b) masked fingerprint image; (c) FingerNet enhancement of the full fingerprint as ground truth; (d) FingerRT repair result of the masked fingerprint image; (e) full-fingerprint orientation field extracted by FingerNet as ground truth; (f) orientation field extracted by FingerRT after repair of the masked fingerprint.
Figure 11. After FingerRT recovery, red boxes highlight the accurately completed minutiae, clearly demonstrating VeriFinger's improved ranking performance on the recovered fingerprints. (a) Full fingerprint image; (b) masked fingerprint image; (c) FingerNet enhancement of the full fingerprint as ground truth; (d) FingerRT repair result of the masked fingerprint image.
E. Ablation Studies
1) Different Combinations of Loss Functions:
In order to systematically explore the impact of the proposed combination of five loss functions on model performance, we designed a series of rigorous ablation experiments. These experiments gradually removed each loss function, analyzed its contribution to model performance, and thereby validated the necessity and interdependence of each loss function. Table IV shows the comparison of different choices of losses. The experimental results showed that the complete model, which combined all five loss functions, achieved significant improvements in generalization ability and robustness, providing a more effective solution for the task of fingerprint completion. This finding further confirms the important role of our proposed combination of loss functions in optimizing model performance.
2) The Ablation Study of Minutiae Attention:
To more comprehensively explore the impact of different attention mechanisms on fingerprint-matching accuracy, we conducted experiments using fingerprint matching accuracy as the primary metric to assess the effectiveness of attention mechanisms in fingerprint recovery. We introduce two commonly used attention mechanisms, Channel-attention [54] and Spatial-attention [55], to compare their effects with the minutiae attention mechanism. We perform ablation studies on all four datasets used in our paper.
The experimental results demonstrate that the minutiae attention mechanism significantly outperforms other attention mechanisms in improving the matching accuracy of completed fingerprints. Across multiple datasets, models employing the minutiae attention mechanism were able to better capture and restore crucial detail information in fingerprints, leading to substantial improvements in the accuracy and reliability of fingerprint matching.
3) The Ablation Study of ViT Backbone:
We include a detailed contrastive analysis featuring a ResNet50 backbone as a benchmark. This approach entails using consistent datasets, metrics, and experimental parameters to assess the performances of two different models, ensuring impartial and balanced evaluation. Our aim is to discern how traditional convolutional neural networks (CNNs) fare against newer frameworks like the Vision Transformer (ViT) when addressing substantial mask reconstruction challenges.
The outcomes of these comparisons are highly illustrative. They suggest that, despite their prevalence and established success in diverse image-processing endeavors, CNNs encounter significant difficulties when tasked with reconstructing extensively masked image regions, a deficiency particularly evident in contrast to the ViT model's capabilities. A key concern identified with CNNs in this setting is their inherent receptive-field constraint: in the convolution process, masked and unmasked data are unavoidably amalgamated. This intermingling results in information leakage, causing the network to inadvertently rely on adjacent unmasked pixels instead of genuinely reconstructing the masked areas. This limitation impairs CNNs' proficiency in mask reconstruction, rendering them unsuitable for applications requiring high-fidelity image reconstruction.
F. Limitation and Bad Case Study
The primary challenges in fingerprint completion technology are twofold: refining minute details and mending extensive damage. The precise replication of minutiae features, such as diminutive splits and the ends of ridges, is essential for accurate identification but poses considerable difficulties. The uniqueness of each individual’s fingerprint lies in its intricate micro-details, which necessitate algorithms of exceptional precision and sensitivity for effective reconstruction. These minute aspects are crucial, as they significantly contribute to the distinctiveness of each fingerprint and are essential for accurate matching in biometric systems.
Additionally, the task of repairing fingerprints that have undergone significant damage or loss is notably complex, as shown in Figure 12. This recovery process demands algorithms to have a deep understanding of the general patterns and flow characteristics of fingerprints, while also possessing the ability to creatively fill in missing areas with limited information. Successful reconstruction relies on the fingerprint foreground providing sufficient information and requires a model trained with adequate data to predict the intricate ridge patterns and details that need to be repaired.
Conclusion
This paper introduces a network architecture specifically designed for incomplete fingerprint recognition, the Finger Recovery Transformer (FingerRT). The main design goal of FingerRT is to remedy the lack of identity feature information in incomplete fingerprints. Its uniqueness lies in combining two key components: fingerprint feature enhancement and a fingerprint completion network. FingerRT effectively exploits key fingerprint features, such as the orientation field and minutiae, and leverages the powerful generative ability of the Transformer to accurately complete the missing parts of fingerprints. FingerRT performs well on three different types of fingerprint recognition tasks: rolled, snapped, and latent fingerprints. It offers a new approach to handling the incomplete fingerprints commonly encountered in practice, such as identity verification in the security field and criminal investigation and evidence analysis in forensic science.