Introduction
Group testing, initially conceived during World War II to efficiently identify rare defects in large populations, has evolved into a fundamental principle across a spectrum of fields, from medical diagnostics to contemporary computer science applications. This approach, which pools and assesses samples concurrently, maximizes resource efficiency and identifies specific attributes in subsets of populations. As today's systems, be they cloud infrastructures or the Internet of Things (IoT), deepen their interconnectedness, there is a growing demand for robust tools that can authenticate entity affiliations within designated groups. Such verifications not only aid in identification but also ensure authentication in our interconnected digital ecosystems. Yet, despite these advances, contemporary systems present intricate challenges, underscoring a notable gap: the integration of traditional group testing techniques with entity affiliation verification. Addressing this gap is paramount: it promises not only enhanced membership verification, especially using facial images, but also strengthens the twin pillars of privacy and security in digital realms.
Verification Principles and Challenges: Group membership verification, at its core, seeks to confirm the legitimacy of a specific item, device, or individual within a designated group. This procedure unfolds in two pivotal stages: firstly, the verification of the entity’s affiliation to the group, and secondly, the identification of the entity itself. A paramount challenge in this domain is striking a delicate balance: ensuring accurate distinction between members and non-members while simultaneously preserving individual identities to uphold privacy.
Verification Mechanism: At the core of the verification process lies the collection of templates. These encompass representations of items (via passive physical unclonable functions (PUFs)), devices (through active PUFs), and unique individual traits (captured by biometrics). These templates are securely stored within a server-side data structure. Upon a verification request, the client transmits a distinct signature to the server, which subsequently determines access authorization. It is essential that this mechanism preserves utmost anonymity to shield individual identities. The primary goal of this protocol is to enable the server to verify group membership with accuracy, without having detailed insight into the system’s internal mechanisms.
Operational Model: In our operational framework, a central server manages the group membership protocol, processing requests from various clients. When a client acquires a new template, it communicates with the server. It is essential to recognize the potential risks posed by such servers. While some might operate with adversarial intentions, our model assumes the server follows an “honest but curious” behavior. This implies that although the server executes its functions reliably and adheres to the prescribed protocol, there might be an inherent inclination to interpret cached templates or analyze the nature of incoming requests. Such servers could attempt to glean additional information from the data they handle, aiming either to reconstruct the original data or infer relationships between different queries. The system’s design meticulously ensures that the server cannot infer private templates, guaranteeing accurate validation of a user’s group affiliation and determining their group identity when necessary.
A. Related Work
The paradigm of anonymous authentication for group members has been a staple in cryptography for years [1]. However, the applications we consider, particularly for biometrics, diverge significantly from established models of authentication, identification, and secret binding. Traditional approaches ensure security either at the server or client end, but they invariably result in the revelation of the user’s identity.
Aggregating signals into a unified representation is widely adopted in computer vision. Techniques such as Bag of Words (BoW) [2] and VLAD [3] consolidate local descriptors from an image into a comprehensive descriptor. Contemporary adaptations like BoW encoding convolutional features of CNNs have been introduced [4]. Arandjelovic et al. further refined VLAD with a learnable pooling layer, termed NetVLAD [5], with subsequent iterations exploring soft assignment to multiple clusters [6]. Distinctly, Zhong et al. [7] developed a descriptor for the faces of celebrities present in the same photo. Although their system excels with two faces per image, performance drops with an increasing number of faces. Our approach, while inspired by these techniques, diverges primarily because our queries comprise a singular face, and our group representations usually encompass more than two faces captured under diverse conditions.
The amalgamation of templates into a collective representation is prominent in biometrics. For example, [8] combined multiple captures of a single person’s face to offset challenges from poses, expressions, and image quality. Contrastingly, our focus is on aggregating unique faces from different individuals in a group. Notably, while conventional methods prioritize retrieving visually analogous elements, they do not inherently provide security or privacy.
The methodologies introduced in [9] and [10] pivot on the notion of transforming randomly-selected templates into discrete embeddings. These are then coalesced to form a singular group representation. Their cost-effectiveness, coupled with the inherent difficulty of identity reconstruction, makes these strategies particularly compelling. Further intricacies, including the impact of the sparsity level of the high-dimensional features characterizing group members on security, compactness, and verification efficacy, are expounded upon in [11]. Going a step further, [12] departs from the conventional approach: instead of statically computing group representations based on predefined entities, this study concurrently learns group representations and their corresponding assignments. By adopting variance as a metric for dissimilarity, the approach endeavors to minimize inter-group differences while amplifying intra-group distinctions. Empirical assessments indicate that this dual learning mechanism fosters enhanced performance, all while maintaining robust security measures.
B. Contribution
We propose a novel group membership assignment scheme based on the joint modeling and learning of nonlinear transforms with priors, coupled with the joint estimation of nonlinear transform representations and group representative assignments. The model parameters are learned by minimizing an empirical expectation of the model log-likelihood, which, under the discrimination prior, corresponds to maximizing a discrimination power measure. The proposed model allows a rejection option over continuous, discontinuous, and overlapping regions in the transform domain. Our learning strategy is based on an iterative, alternating algorithm with three steps. The honest-but-curious server cannot reconstruct the signatures, i.e., the data structure is protected in terms of security requirements. Moreover, the privacy of the users is guaranteed by anonymous verification.
C. Outline
Sec. III elucidates our framework. A comprehensive performance analysis is presented in Sec. VII. Concluding remarks are in Sec. VIII.
D. Notation
Vectors and matrices are denoted by boldface lower-case (x) and upper-case (X) letters, respectively. We consider the same notation for a random vector x and its realization; the difference should be clear from the context. We use the notation $\mathbf{y}^{+} = \max(\mathbf{y}, \pmb{0})$ and $\mathbf{y}^{-} = \max(-\mathbf{y}, \pmb{0})$ for the positive and negative parts of a vector, and $\odot$ and $\oslash$ for element-wise multiplication and division, respectively.
Preliminaries on Sparse Data Representation
The foundational principle of sparse representation is to express data using fewer components than traditionally required, without compromising the integrity or essential characteristics of the data. This principle has found widespread application across various domains, including feature extraction, clustering, classification, and signal reconstruction, underscoring its versatility and impact. Three primary models of sparse data representation are the synthesis model, the analysis model, and the transform model. In the following section, we briefly review the essence of these models. Figure 1 visualizes the core concepts of these models.
A. Synthesis Model
The synthesis model assumes that a high-dimensional data sample can be represented as a linear combination of a small number of columns (atoms) of an overcomplete dictionary $\mathbf{D}$, i.e., $\mathbf{x} \approx \mathbf{D}\mathbf{y}$ with a sparse coefficient vector $\mathbf{y}$.
B. Analysis Model
The analysis model employs a dictionary $\mathbf{A}$ applied to the data sample itself and assumes that the analyzed vector $\mathbf{A}\mathbf{x}$ is sparse; the sample is thus characterized by the zero entries (the cosupport) of $\mathbf{A}\mathbf{x}$.
C. Transform Model
The transform model assumes that a high-dimensional data sample can be approximately sparsified by a linear transform $\mathbf{W}$, i.e., $\mathbf{W}\mathbf{x} \approx \mathbf{y}$ with $\mathbf{y}$ sparse and the sparsification error $\mathbf{W}\mathbf{x} - \mathbf{y}$ small. This is the model adopted throughout our framework.
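To make the transform model concrete, the following minimal NumPy sketch (our own illustration, not the paper's learning procedure) sparsifies a transformed sample with plain soft thresholding, the closed-form minimizer of $\frac{1}{2}\Vert \mathbf{W}\mathbf{x} - \mathbf{y} \Vert_{2}^{2} + \lambda \Vert \mathbf{y} \Vert_{1}$:

```python
import numpy as np

def transform_sparse_code(W, x, lam=0.1):
    """Minimal sketch of the transform model: the data vector x is mapped
    through a linear transform W, and the result is sparsified with a
    standard soft-thresholding step."""
    u = W @ x
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# Toy usage: a random transform applied to a random data vector.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
x = rng.standard_normal(128)
y = transform_sparse_code(W, x)
print("non-zeros in y:", np.count_nonzero(y))
```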
Proposed Framework Overview
A. Assignment and Learning Approach
In our research, we introduce a methodology for learning a privacy-preserving assignment. This stands in contrast to the works cited in [13], [14], [15], [16], [17], [18], and [19], where various generic privacy-preserving identification or search mechanisms have been put forward. Unlike these approaches, our protocol assigns group membership not by assessing similarity in the original data space, but through evaluating a min-max functional measure within a transformed space. This assignment procedure unfolds in two distinct steps: (i) generation of candidate nonlinear transform representations, and (ii) evaluation of the min-max measure across these representations leading to the assignment of group membership.
In our pursuit to develop an assignment mechanism that is sparse, discriminative, information-preserving, and respects privacy, we employ specific priors when modeling the candidate nonlinear transforms. In the learning process, we simultaneously focus on two core objectives: (i) estimating the parameters inherent to the candidate transforms, and (ii) assigning group membership. To achieve both the discriminative and privacy-preserving goals, we advocate for support intersection measures. According to these measures, every representation of a data instance that is assigned to a specific group should maximize its support intersection with other representations within that same group. On the contrary, a representation associated with a particular group should minimize its support intersection with representations from different groups. Here, “support intersection” between vector representations denotes the count of non-zero elements that both vectors contain at the same index positions.
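As a small illustration (our own, with helper names of our choosing), the support intersection between two vector representations can be counted as follows:

```python
import numpy as np

def support_intersection(y1, y2, tol=1e-12):
    """Count the index positions at which both vectors are non-zero,
    i.e., the size of the intersection of their supports."""
    return int(np.count_nonzero((np.abs(y1) > tol) & (np.abs(y2) > tol)))

a = np.array([0.0, 1.3, 0.0, -0.7, 2.1])
b = np.array([0.5, 0.0, 0.0, -0.2, 1.0])
print(support_intersection(a, b))  # non-zero at positions 3 and 4 in both -> 2
```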
B. Setup
Consider a dataset of $K$ data vectors, $\mathbf{X} = [\mathbf{x}_{1}, \cdots, \mathbf{x}_{K}]$, which are to be partitioned into $C$ groups.
Our primary aim is to construct an information-preserving group representation, $\pmb{\theta}_{c} = [\pmb{\nu}_{c}, \pmb{\tau}_{c}]$, for every group $c \in \{1, \cdots, C\}$; these representations are stored at the server.
Our secondary goal revolves around deriving an information-preserving transform representation, $\mathbf{y}_{i}$, for every data vector $\mathbf{x}_{i}$, together with its group membership assignment.
C. Framework Overview
Our framework comprises the subsequent steps, as illustrated in Figure 2:
1) Preparation at Owner Side
The owner jointly estimates the sparse representations and assigns the group representatives from the data they possess. This estimation is achieved using a trained linear mapping, followed by a generalized element-wise nonlinearity. These group representatives are then transmitted to the server, which serves as a storage facility.
2) Querying at Client Side
The client produces a sparse representation from their query data. This is performed utilizing the same trained linear mapping and the generalized element-wise nonlinearity. Subsequently, the client forwards this sparse representation to the server.
3) Searching at Server Side
The server uses the received data to conduct similarity and dissimilarity searches among the group representatives. This process aims to precisely identify the group that most closely matches the client’s query.
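The following toy sketch (our own simplification) illustrates only the data flow between the three roles; the mean aggregation and the support-overlap score used here are placeholders, not the learned assignment and min-max scoring developed in the remainder of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sparsify(W, x, lam=0.1):
    u = W @ x
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# --- Owner side: enroll data and ship group representatives to the server.
W = rng.standard_normal((32, 64))                                  # trained linear map (random here)
owner_data = {c: rng.standard_normal((5, 64)) for c in range(3)}   # 3 groups, 5 members each
group_reps = {c: np.mean([sparsify(W, x) for x in X], axis=0)      # placeholder aggregation
              for c, X in owner_data.items()}

# --- Client side: build the sparse query signature with the same map.
query = owner_data[1][0] + 0.05 * rng.standard_normal(64)
signature = sparsify(W, query)

# --- Server side: match the signature against the stored representatives.
def overlap_score(y, rep, tol=1e-12):
    return np.count_nonzero((np.abs(y) > tol) & (np.abs(rep) > tol))

best = max(group_reps, key=lambda c: overlap_score(signature, group_reps[c]))
print("assigned group:", best)
```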
Assignment Principle and Probabilistic Model
In this section, we elucidate the core principle guiding our group assignment methodology. Our focus is on the evaluation of candidate representations through a specific measure, and the probabilistic model integral to our approach of privacy-preserving group membership assignment.
A. Group Membership Assignment Principle
The procedure for our group membership assignment is bifurcated into two steps:
(i) Candidate Representation Generation, and
(ii) Group and Nonlinear Transform Representation Assignment.
1) Candidate Representation Generation
To derive a candidate representation, denoted as $\mathbf{q}_{i,c}$, for a data vector $\mathbf{x}_{i}$ and a group $c$, we apply the trained linear map $\mathbf{W}$ followed by a group-specific, generalized element-wise nonlinearity.
For the generation of an ensemble of candidate representations, symbolized as $\{\mathbf{q}_{i,1}, \cdots, \mathbf{q}_{i,C}\}$, this procedure is repeated once per group.
2) Group and NT Representation Assignment
Given the candidate representations, we proceed with the actual assignment. Unlike traditional similarity (or dissimilarity) measures, we employ a min-max functional measure. Each candidate representation undergoes evaluation through this measure. Based on the evaluation score, we determine the group membership and its associated NT representation.
B. Probabilistic Model
To introduce the probabilistic model of our framework, let us start with the decomposition of the joint probability distribution $p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right )$:\begin{align*} p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) & = p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) p \left ({{ \mathbf {x}_{i} }}\right ) \tag {1a}\\ & = p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) \tag {1b}\end{align*}
Applying Bayes' rule, the posterior over the representation and model parameters satisfies\begin{equation*} p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) \propto p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) \tag {2}\end{equation*}
Accordingly, the distribution over the whole dataset factorizes as\begin{align*} p \left ({{ \mathbf {Y}, \mathbf {W} \mid \mathbf {X}}}\right ) & = p \left ({{ \mathbf {Y} \mid \mathbf {W}, \mathbf {X}}}\right ) p \left ({{ \mathbf {W} \mid \mathbf {X}}}\right ) \tag {3a}\\ & = \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {3b}\\ & = \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {3c}\end{align*}
Using this probabilistic formulation, we characterize our learning model through the following components:
(i) A sparsifying error coupled with a model parameters $\pmb {\theta }$ adjustment error, represented by the prior $p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right )$.
(ii) A discrimination and sparsity prior $p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right )$.
(iii) A linear map $\mathbf {W}$ prior $p \left ({{ \mathbf {W} }}\right )$.
By adopting this structured probabilistic perspective, we aim for a holistic understanding of the dynamics between the observed inputs, their associated transformations, model parameters, and their linear mapping.
Addressing Probabilistic Modeling Priors
A. Sparsifying Error and Model Parameters Adjustment Error Prior
In our pursuit to capture the intricate relationship between observed data and its inherent structure, we formulate a probabilistic construct as presented below:\begin{align*} & p \! \left ({{ \mathbf {x}_{i}\! \mid \! \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) \! \\ & \quad \propto \exp \left [{{\! - \frac {1}{\beta _{\textsf {spa}}} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i},\mathbf {y}_{i} }}\right ) \!-\! \frac {1}{\beta _{\textsf {adj}}} d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) \!}}\right ], \tag {4}\end{align*}
Our primary objective for introducing this construct is twofold. First, we encapsulate the sparsifying error between $\mathbf{W}\mathbf{x}_{i}$ and $\mathbf{y}_{i}$; second, we account for the adjustment error between $\mathbf{W}\mathbf{x}_{i}$ and the group parameters $[\pmb{\nu}_{c}, \pmb{\tau}_{c}]$. The corresponding distances are\begin{align*} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \mathbf { y}_{i} }}\right ) & = \frac {1}{2} {\Vert \mathbf {W} \mathbf {x}_{i} - \mathbf { y}_{i} \Vert }_{2}^{2}, \tag {5}\\ d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) & = \frac {1}{2} {\Vert \mathbf {W} \mathbf {x}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} \Vert }_{2}^{2}. \tag {6}\end{align*}
Note that the construct in (4) factorizes into the conditional densities\begin{align*} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \mathbf {W} }}\right ) & \propto \exp \left [{{ - \frac {1}{\beta _{\textsf {spa}}} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \mathbf { y}_{i} }}\right ) }}\right ], \tag {7}\\ p \left ({{ \mathbf {x}_{i} \mid \pmb {\theta }, \mathbf {W} }}\right ) & \propto \exp \left [{{ - \frac {1}{\beta _{\textsf {adj}}} d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) }}\right ]. \tag {8}\end{align*}
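For quick reference, the two distances in (5) and (6) are straightforward to evaluate; the NumPy sketch below (our own transcription) mirrors them directly:

```python
import numpy as np

def d_spa(W, x, y):
    """Sparsifying error, Eq. (5): 0.5 * ||W x - y||_2^2."""
    return 0.5 * np.sum((W @ x - y) ** 2)

def d_adj(W, x, nu_c, tau_c):
    """Adjustment error, Eq. (6): 0.5 * ||W x - nu_c - tau_c||_2^2."""
    return 0.5 * np.sum((W @ x - nu_c - tau_c) ** 2)
```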
B. Sparsity and Discrimination Prior
We model our sparsity-inducing and discrimination prior as follows:\begin{align*} p \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \! \propto \! \exp \left [{{\! - \frac {1}{\beta _{\textsf {disc}}} d_{\textsf {disc}} \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \!-\! \frac {1}{\beta _{\textsf {p}}} d_{\textsf {p}} \! \left ({{ \pmb {\theta } }}\right ) \!- \!\frac {1}{\beta _{\textsf {1}}} d_{\textsf {1}} \! \left ({{ \mathbf {y}_{i} }}\right ) \!}}\right ], \tag {9}\end{align*}
The sparsity-inducing component $d_{\textsf {1}}\left ({{ \mathbf {y}_{i} }}\right ) = {\Vert \mathbf {y}_{i} \Vert }_{1}$ corresponds to a Laplace-like prior on the representation,\begin{equation*} p \left ({{ \mathbf {y}_{i} }}\right ) \! \propto \! \exp \left ({{ - {\Vert \mathbf {y}_{i} \Vert }_{1} / \beta _{1} }}\right ). \tag {10}\end{equation*}
The discrimination and parameter components, in turn, define the conditional prior\begin{equation*} p \left ({{ \pmb {\theta } \! \mid \! \mathbf {y}_{i}}}\right ) \! \propto \! \exp \left [{{ - \frac {1}{\beta _{\textsf {disc}}} d_{\textsf {disc}} \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) - \frac {1}{\beta _{\textsf {p}}} d_{\textsf {p}} \! \left ({{ \pmb {\theta } }}\right ) }}\right ] \tag {11}\end{equation*}
In order to define the measures $d_{\textsf{disc}}\left({{ \mathbf{y}_{i}, \pmb{\theta} }}\right)$ and $d_{\textsf{p}}\left({{ \pmb{\theta} }}\right)$, we rely on two key properties:
(i) The discrimination measure is characterized by relationships based on the support intersection between $\mathbf{y}_{i}$, $\pmb{\nu}_{c}$, and $\pmb{\tau}_{c}$.
(ii) The discrimination measure has a min-max structure, with its expression factored with respect to $\pmb{\nu}_{c}$ and $\pmb{\tau}_{c}$.
We then elucidate our foundational measures related to similarity, dissimilarity, and strength, focusing on the support intersection between the representations and the group parameters $\pmb{\nu}_{c}$ and $\pmb{\tau}_{c}$.
1) Quantifying Representation Similarity and Dissimilarity
The measure $\textsf{Sim}$ quantifies the similarity between two representations through the intersection of their like-signed supports:\begin{equation*} \textsf {Sim} ( \mathbf { y}_{1}, \mathbf { y}_{2})= \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{+} \Vert _{1} \! + \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{-}\Vert _{1}, \tag {12}\end{equation*}
The term $\mathbf{y}^{+} = \max(\mathbf{y}, \pmb{0})$ denotes the positive part of $\mathbf{y}$, $\mathbf{y}^{-} = \max(-\mathbf{y}, \pmb{0})$ its negative part, and $\odot$ the element-wise product.
Conversely, the measure $\textsf{Dis}$ quantifies the dissimilarity between two representations through the intersection of their oppositely-signed supports:\begin{equation*} \textsf {Dis} ( \mathbf { y}_{1}, \mathbf { y}_{2})= \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{-} \Vert _{1} \! + \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{+} \Vert _{1}. \tag {13}\end{equation*}
Furthermore, the measure $\textsf{Stg}$ captures the strength of the support intersection:\begin{equation*} \textsf {Stg} ( \mathbf { y}_{1}, \mathbf { y}_{2} ) \! = \! \Vert \mathbf { y}_{1} \! \odot \mathbf { y}_{2} \Vert _{2}^{2}. \tag {14}\end{equation*}
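A direct NumPy transcription of (12)-(14) could look as follows (our own sketch):

```python
import numpy as np

def pos(y):  # positive part y^+
    return np.maximum(y, 0.0)

def neg(y):  # negative part y^-
    return np.maximum(-y, 0.0)

def Sim(y1, y2):
    """Eq. (12): support intersection of like-signed components."""
    return np.sum(pos(y1) * pos(y2)) + np.sum(neg(y1) * neg(y2))

def Dis(y1, y2):
    """Eq. (13): support intersection of oppositely-signed components."""
    return np.sum(pos(y1) * neg(y2)) + np.sum(neg(y1) * pos(y2))

def Stg(y1, y2):
    """Eq. (14): squared l2 norm of the element-wise product."""
    return np.sum((y1 * y2) ** 2)
```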
2) Min-Max Discrimination Prior Measure
We introduce a min-max functional, denoted as $d_{\textsf{disc}}\left({{ \mathbf{y}_{i}, \pmb{\theta} }}\right)$, that combines the above measures:\begin{align*} d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) & = \min _{d}\max _{c} \big [ \textsf {Sim} \left ({{ \mathbf {y}_{i}, \pmb {\tau }_{d} }}\right ) + \textsf {Dis} \left ({{ \mathbf {y}_{i}, \pmb {\nu }_{c} }}\right ) \big ] \\ & \quad + \min _{d} \textsf {Stg} \left ({{ \mathbf {y}_{i}, \pmb {\tau }_{d} }}\right ). \tag {15}\end{align*} Under this functional:
(i) The similarity relative to $\pmb{\tau}_{d}$ is minimized, as measured by $\textsf{Sim}$.
(ii) The support intersection strength with respect to $\pmb{\tau}_{d}$ is minimized, as evaluated by $\textsf{Stg}$.
(iii) The similarity relative to $\pmb{\nu}_{c}$ is maximized, as measured by $\textsf{Sim}$.
To understand the dynamics of this prior measure, consider the formulation as a balance of forces within a field: the transform representation $\mathbf{y}_{i}$ is attracted toward the group parameters $\pmb{\nu}_{c}$ while being repelled from the parameters $\pmb{\tau}_{d}$.
At the heart of this formulation is the score assigned to each candidate representation: as formalized later in (36), the group attaining the smallest score is the one to which the representation is assigned.
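Reading (15) literally, a sketch of the discrimination measure (our own, reusing the Sim, Dis, and Stg helpers from the previous sketch) is:

```python
# Uses the Sim, Dis, Stg helpers defined in the earlier sketch.
def d_disc(y, nus, taus):
    """Sketch of the min-max discrimination measure in Eq. (15), read literally:
    min over d of the max over c of [Sim + Dis], plus min over d of Stg.
    nus, taus: lists of the group parameters [nu_1..nu_C] and [tau_1..tau_C]."""
    bracket = min(max(Sim(y, tau_d) + Dis(y, nu_c) for nu_c in nus)
                  for tau_d in taus)
    strength = min(Stg(y, tau_d) for tau_d in taus)
    return bracket + strength
```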
3) Model Parameters $\pmb {\theta }$ Prior Measure
We introduce the measure $d_{\textsf{p}}\left({{ \pmb{\theta} }}\right)$, which enforces discrimination between the parameters of different groups:\begin{align*} d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right ) = \!\!\!\!\! \sum _{c \in \{1, \cdots, C\} } \, \sum _{d \in \{1, \cdots, C\} \setminus c } \!\!\! d_{\textsf {disc}} \left ({{ \pmb {\nu }_{c}, \pmb {\theta }_{d} }}\right ) + d_{\textsf {disc}} \left ({{ \pmb {\tau }_{c}, \pmb {\theta }_{d} }}\right ). \tag {16}\end{align*}
The dual summation ensures that, for every group $c$, we account for the relationship of its parameters with those of every other group $d \neq c$. In doing so, the parameters of distinct groups are pushed toward mutually disjoint supports, which reinforces the discrimination power of the model.
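Continuing the sketch above, (16) can be transcribed as a double loop over group pairs (again our own illustration, reusing the d_disc sketch):

```python
# Uses d_disc from the previous sketch; group d's parameters theta_d = [nu_d, tau_d].
def d_p(nus, taus):
    """Sketch of the parameter prior measure in Eq. (16): every group's nu_c and
    tau_c is scored against the parameters of every other group d != c."""
    C = len(nus)
    total = 0.0
    for c in range(C):
        for d in range(C):
            if d == c:
                continue
            total += d_disc(nus[c], [nus[d]], [taus[d]])
            total += d_disc(taus[c], [nus[d]], [taus[d]])
    return total
```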
C. Linear Map Prior
The purpose of the ‘linear map prior’ is to penalize information loss while simultaneously discouraging the adoption of undesirable matrices. To achieve this dual objective, we regularize both the condition number and the expected coherence of the matrix W. Specifically, the prior is defined as:\begin{equation*} p \left ({{ \mathbf {W} }}\right ) \propto \exp \left ({{ - \Omega \left ({{ \mathbf {W} }}\right ) }}\right ), \tag {17}\end{equation*}
where\begin{align*} \Omega \left ({{ \mathbf {W} }}\right ) & = \frac {1}{\beta _{\textsf {W}_{1}}} {\Vert \mathbf {W} \Vert }_{F}^{2} + \frac {1}{\beta _{\textsf {W}_{2}}} {\Vert \mathbf {W} \mathbf {W}^{T} - \mathbf {I}\Vert }_{F}^{2} \\ & \quad - \frac {1}{\beta _{\textsf {W}_{3}}} \log \vert \det \mathbf {W}^{T} \mathbf {W} \vert. \tag {18}\end{align*}
(i) The term ${\Vert \mathbf {W} \Vert }_{F}^{2}$ serves as a regularizer, penalizing the magnitude of $\mathbf{W}$ to ensure stability.
(ii) ${\Vert \mathbf {W} \mathbf {W}^{T} - \mathbf {I}\Vert }_{F}^{2}$ aims to minimize the deviation of $\mathbf{W}$ from orthogonality.
(iii) Lastly, $\log \vert \det \mathbf {W}^{T} \mathbf {W} \vert$ assesses the volume scaling factor of $\mathbf{W}$, thereby promoting full-rank matrices.
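For reference, the regularizer $\Omega(\mathbf{W})$ in (18) can be evaluated as below (our own sketch; the weights b1-b3 stand in for the $\beta_{\textsf{W}}$ parameters, and the log-det term assumes $\mathbf{W}^{T}\mathbf{W}$ is non-singular):

```python
import numpy as np

def omega(W, b1=1.0, b2=1.0, b3=1.0):
    """Linear-map prior measure, Eq. (18)."""
    P = W.shape[0]
    frob = np.sum(W ** 2) / b1                                  # Frobenius penalty
    ortho = np.sum((W @ W.T - np.eye(P)) ** 2) / b2             # deviation from orthogonality
    logdet = np.linalg.slogdet(W.T @ W)[1] / b3                 # log|det(W^T W)|
    return frob + ortho - logdet
```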
Learning Model to Assign Group Membership
A. Problem Formulation
Consider a given training dataset, denoted by $\mathbf{X}$. The task of directly maximizing the probability function $p\left({{ \mathbf{Y}, \mathbf{W} \mid \mathbf{X}}}\right)$ given in (3) is equivalent to minimizing its negative logarithm,\begin{equation*} - \log \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {19}\end{equation*} which is intractable because of the marginalization over $\pmb{\theta}$.
We therefore assume that the contribution of the integral over a neighborhood $\pmb{\theta}_{\textsf{est}}$ of a point estimate $\pmb{\theta}$ can be bounded by a constant $\kappa$, i.e.,\begin{align*} & \hspace {-1pc}\int _{\pmb {\theta }_{\textsf {est}}} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}}, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}} }}\right ) \mathrm {d} \pmb {\theta }_{\textsf {est}} \\ & \leq \kappa \; p \! \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ), \tag {20}\end{align*}
Considering Eq. (3), (19), (20), we arrive at the following problem formulation:\begin{align*} \{ \widehat {\mathbf {Y}}, \widehat {\pmb {\theta }}, \widehat {\mathbf {W}} \} & =\arg \min _{\mathbf {Y}, \pmb {\theta }, \mathbf {W}} \sum _{i=1}^{K} \Big [ - \log \, p \! \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) \\ & \qquad - \log \, p \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \Big ] - \log \, p \! \left ({{ \mathbf {W} }}\right ). \! \tag {21}\end{align*}
Taking the negative logarithms of the densities defined in (4), (9), and (17), the problem in (21) reduces to\begin{align*} \{ \widehat {\mathbf {Y}}, \widehat {\pmb {\theta }}, \widehat {\mathbf {W}} \} & =\arg \min _{\mathbf {Y}, \pmb {\theta }, \mathbf {W}} \sum _{i=1}^{K} \Big [ d_{\textsf {spa}} \left ({{ \mathbf {u}_{i}, \mathbf { y}_{i} }}\right ) + \lambda _{\textsf {adj}} \, d_{\textsf {adj}}\left ({{\mathbf {u}_{i}, \pmb {\theta } }}\right ) \\ & \quad + \lambda _{\textsf {disc}} \, d_{\textsf {disc}}\left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) + \lambda _{\textsf {p}} \, d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right ) + \lambda _{1} {\Vert \mathbf {y}_{i} \Vert }_{1} \Big ] \\ & \quad + \lambda _{\textsf {W}} \, \Omega \left ({{ \mathbf {W} }}\right ), \! \! \tag {22}\end{align*} where $\mathbf{u}_{i} = \mathbf{W}\mathbf{x}_{i}$ and the $\lambda$ parameters weight the respective terms.
An essential point of clarification here is that our derived solution to Eq. (22) does not equate to the maximum a posteriori (MAP) solution. While the MAP provides a point estimate by maximizing the posterior distribution, it is computationally intensive due to the complexity of calculating higher-dimensional integrals in the parameter space.
Instead, our framework articulated in Eq. (22) encapsulates an integrated marginal minimization (IMM) strategy. This method entails a sequence of iterative steps where, in each iteration, we minimize the objective with respect to one of the variables $\mathbf{Y}$, $\pmb{\theta}$, and $\mathbf{W}$ while keeping the others fixed. This strategy offers the following advantages:
Computational Efficiency: Unlike the MAP approach, which often requires complex integrations over the parameter space, our method iteratively minimizes simpler terms, allowing for more efficient computational procedures.
Flexibility: Given the iterative nature of our method, it is more adaptable to various datasets and can better accommodate changes in data distribution, especially in scenarios with limited or evolving data.
Stability: The integrated approach reduces the risk of settling into local optima that do not represent the broader dataset well. By considering marginal effects iteratively, we can capture more global patterns in the data, enhancing the model’s generalization capabilities.
In essence, while the MAP solution offers a theoretical ideal, practical constraints necessitate strategies such as integrated marginal minimization, which strike a balance between theoretical rigor and computational tractability. Our methodology facilitates the identification of a joint local optimum in the space of $\{\mathbf{Y}, \pmb{\theta}, \mathbf{W}\}$.
B. Learning Algorithm
We propose an alternating block coordinate descent algorithm that progresses across three stages:
(i) Simultaneous estimation of the representation $\mathbf{y}_{i}$ and assignment of group membership $c$;
(ii) Update of the group parameters $\pmb {\theta } = \{ \pmb {\theta }_{1}, \cdots, \pmb {\theta }_{C} \} = \{ \left [{{ \pmb {\nu }_{1}, \pmb {\tau }_{1} }}\right ], \cdots, \left [{{ \pmb {\nu }_{C}, \pmb {\tau }_{C} }}\right ] \}$;
(iii) Update of the linear map $\mathbf{W}$.
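A high-level sketch of the resulting training loop is shown below; every function here is a placeholder of our own (trivial stubs standing in for the closed-form update steps derived in the following subsections), not the authors' implementation:

```python
import numpy as np

# Placeholder updates (our own stubs); the real steps are the closed-form
# updates of Eqs. (23)-(37), (38)-(47), and (48), respectively.
def update_representations(X, W, theta):   # step (i)
    U = W @ X
    return np.sign(U) * np.maximum(np.abs(U) - 0.1, 0.0), np.zeros(X.shape[1], dtype=int)

def update_group_params(Y, labels, theta): # step (ii)
    return theta

def update_linear_map(X, Y, W):            # step (iii)
    return W

def learn(X, C=4, P=32, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((P, X.shape[0]))
    theta = [(np.zeros(P), np.zeros(P)) for _ in range(C)]   # (nu_c, tau_c) pairs
    for _ in range(n_iters):
        Y, labels = update_representations(X, W, theta)
        theta = update_group_params(Y, labels, theta)
        W = update_linear_map(X, Y, W)
    return W, theta
```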
1) NT Representation Estimation and Assignment
Given the dataset $\mathbf{X}$, the latest estimate of the group membership parameters $\pmb{\theta}$, and the current linear map $\mathbf{W}$, the NT representations are estimated by solving\begin{align*} \left [{{ \widehat {\mathbf {y}}_{1}, \cdots, \widehat {\mathbf {y}}_{K} }}\right ] & = \arg \min _{\left [{{ {\mathbf {y}}_{1}, \cdots, {\mathbf {y}}_{K} }}\right ]} \sum _{i=1}^{K} \Big [ \frac {1}{2} {\Vert \mathbf {u}_{i} - \mathbf {y}_{i} \Vert }_{2}^{2} \\ & \qquad + \lambda _{1} {\Vert \mathbf {y}_{i} \Vert }_{1} + \lambda _{\textsf {disc}} \,d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \Big ].\!\!\!\! \tag {23}\end{align*}
Candidate NT Representation Estimation: During the first step, observe that, for each pair $(\mathbf{x}_{i}, c)$, a candidate representation $\mathbf{q}_{i,c}$ can be obtained by solving the per-group subproblem\begin{align*} \!\!\!\! (\mathsf {P_{S}})\!: \widehat {\mathbf {q}}_{i,c} \!& = \arg \min _{ \mathbf {q}_{i,c}} \frac {1}{2} \Vert \mathbf {u}_{i} \! - \! \mathbf {q}_{i,c} \Vert _{2}^{2} + \lambda _{1} \pmb {1}^{T} \vert \mathbf {q}_{i,c} \vert \\ & \quad + \lambda _{\textsf {disc}} \, d_{\textsf {disc}} \! \left ({{ \mathbf {q}_{i,c}, \left [{{ \pmb {\nu }_{\!c}, \pmb {\tau }_{\!c} }}\right ] }}\right )\!.\!\! \tag {24}\end{align*}
This subproblem admits the closed-form solution\begin{align*} \widehat {\mathbf {q}}_{i,c} = \psi \left ({{ \mathbf {u}_{i} }}\right ) := \mathrm {sign}(\mathbf {u}_{i}) \odot \max ( \vert \mathbf {u}_{i} \vert - \mathbf {g}_{i} - \lambda _{1} \pmb {1}, \pmb {0}) \oslash \mathbf {k}_{c}, \tag {25}\end{align*} where the thresholding vector $\mathbf{g}_{i}$ and the normalization vector $\mathbf{k}_{c} = \pmb{1} + 2\mathbf{s}_{c}$ are defined in the proof below.
Proof:
Given the available database $\mathbf{X}$ and the current estimate of the linear map $\mathbf{W}$, the representation estimation problem is formulated in (23). For a single data vector $\mathbf{x}$ and group $c$, the subproblem reads\begin{align*} & \min _{\mathbf {y}} \frac {1}{2} \Vert \mathbf {W} \mathbf {x} - \mathbf {y} \Vert _{2}^{2} + \lambda _{1} \pmb {1}^{T} \vert \mathbf {y} \vert + \lambda _{\textsf {disc}} \Big ( ( \mathbf {y}^{+} )^{T} \pmb {\tau }^{+}_{c} + (\mathbf {y}^{-})^{T} \pmb {\tau }^{-}_{c} \\ & \quad + ( \pmb {\tau } _{c} \odot \pmb {\tau }_{c} )^{T} ( \mathbf {y} \odot \mathbf {y} ) - \left [{{ ( \mathbf {y}^{+} )^{T} \pmb {\nu }^{+}_{c} + (\mathbf {y}^{-})^{T} \pmb {\nu }^{-}_{c} }}\right ] \Big ). \tag {26}\end{align*}
Denote:\begin{align*} \mathbf {u} & =\mathbf {W} \mathbf {x} \tag {27}\\ \mathbf {h}_{1}^{T} \vert \mathbf {y} \vert & =\left ({{ \mathbf {y}^{+} }}\right )^{T} \pmb {\tau }^{+} + \left ({{ \mathbf {y}^{-} }}\right )^{T} \pmb {\tau }^{-}, \tag {28}\\ \mathbf {h}_{2}^{T} \vert \mathbf {y} \vert & =\left ({{ \mathbf {y}^{+} }}\right )^{T} \pmb {\nu }^{+} + \left ({{ \mathbf {y}^{-} }}\right )^{T} \pmb {\nu }^{-}, \tag {29}\\ \mathbf {s}_{c}^{T} & =\lambda _{\textsf {disc}} {\left ({{ \pmb {\tau }_{c} \odot \pmb {\tau }_{c} }}\right )}^{T}, \tag {30}\\ \mathbf {g}_{i}^{T} & =\lambda _{\textsf {disc}} \left ({{ \mathbf {h}_{1}^{T} - \mathbf {h}_{2}^{T} }}\right ), \tag {31}\end{align*}
With this notation, (26) reduces to\begin{equation*} \min _{\mathbf { y}} \frac {1}{2} { \Vert \mathbf {y} - \mathbf {u} \Vert }_{2}^{2} + \mathbf {g}_{i}^{T} \vert \mathbf {y} \vert + \mathbf {s}_{c}^{T} \left ({{ \mathbf {y} \odot \mathbf {y} }}\right ) + \lambda _{1} \pmb {1}^{T} \vert \mathbf {y} \vert. \tag {32}\end{equation*}
Setting the subgradient of (32) with respect to $\mathbf{y}$ to zero yields\begin{align*} & \hspace {-0.2pc} \mathrm {sign} \left ({{ \mathbf {y} }}\right ) \odot \vert \mathbf {y} \vert \odot \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ) - \mathrm {sign} \left ({{ \mathbf {u} }}\right ) \odot \vert \mathbf {u} \vert + \lambda _{1} \mathrm {sign} \left ({{ \mathbf {y} }}\right ) \\ & \quad + \lambda _{\textsf {disc}} \big ( \mathrm {sign} \left ({{ \mathbf {y}^{+} }}\right ) \odot \pmb {\tau }_{c}^{+} + \mathrm {sign} \left ({{ \mathbf {y}^{-} }}\right ) \odot \pmb {\tau }_{c}^{-} \\ & \quad - \mathrm {sign} ( \mathbf {y}^{+} ) \odot \pmb {\nu }_{c}^{+} - \mathrm {sign} ( \mathbf {y}^{-} ) \odot \pmb {\nu }_{c}^{-} \big ) = \pmb {0}. \tag {33}\end{align*}
Taking magnitudes and solving for $\vert \mathbf{y} \vert$ gives\begin{align*} & \hspace {-0.2pc}\vert \mathbf {y} \vert \odot \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ) \\ & = \max \Big ( \vert \mathbf {u} \vert - \lambda _{\textsf {disc}} \big ( \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right ) \odot \pmb {\tau }_{c}^{+} - \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right ) \\ & \quad \odot \pmb {\tau }_{c}^{-} - \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right ) \odot \pmb {\nu }_{c}^{+} + \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right ) \odot \pmb {\nu }_{c}^{-} \big ) - \lambda _{1} \pmb {1}, \pmb {0} \Big ), \tag {34}\end{align*}
so that\begin{equation*} \mathbf {y} \! = \! \mathrm {sign} \left ({{ \mathbf {u} }}\right ) \odot \max \left ({{ \vert \mathbf {u} \vert - \mathbf {g}_{i} - \lambda _{1} \pmb {1}, \pmb {0} }}\right ) \oslash \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ), \tag {35}\end{equation*} which coincides with (25) for $\mathbf{k}_{c} = \pmb{1} + 2\mathbf{s}_{c}$.
Assignment: During the second step, given all the candidate representations $\{\mathbf{q}_{i,1}, \cdots, \mathbf{q}_{i,C}\}$, each candidate is scored by the measure\begin{equation*} \textsf {S}(\mathbf { q}_{i,c}, \left [{{ \pmb {\nu }_{c}, \pmb {\tau }_{c} }}\right ])\!=\!\textsf {Sim} \left ({{ \mathbf {q}_{i,c}, \pmb {\tau }_{c} }}\right ) \!+\! \textsf {Stg} \left ({{ \mathbf {q}_{i,c}, \pmb {\tau }_{c} }}\right ) \!+\! \textsf {Sim} \left ({{ \mathbf {q}_{i,c}, \pmb {\nu }_{c}}}\right ).\end{equation*}
The group membership and the NT representation are then assigned as\begin{align*} \widehat {c} & = \arg \min _{c} \, \textsf {S} (\mathbf { q}_{i,c}, \pmb {\nu }_{c}, \pmb {\tau }_{c}), \tag {36}\\ {\widehat {\mathbf { y}}}_{i} & = \mathbf { q}_{i, \widehat {c}} \;. \tag {37}\end{align*}
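Putting the two steps together, the following NumPy sketch (our own reading of (25), (30)-(31), and (36)-(37); the exact sign conventions of the thresholding vector may differ from the authors') generates one candidate per group and assigns the query to the group with the smallest score:

```python
import numpy as np

def candidate_rep(u, nu_c, tau_c, lam1=0.1, lam_disc=0.1):
    """Sketch of the per-group closed form (25)/(35): a group-dependent
    soft thresholding of u = Wx, followed by an element-wise rescaling."""
    pos_sel = (u > 0).astype(float)   # selector for the positive entries of u
    neg_sel = (u < 0).astype(float)   # selector for the negative entries of u
    # Thresholding vector g, Eq. (31): built from tau_c and nu_c split by the sign of u.
    g = lam_disc * (pos_sel * (np.maximum(tau_c, 0) - np.maximum(nu_c, 0))
                    + neg_sel * (np.maximum(-tau_c, 0) - np.maximum(-nu_c, 0)))
    k = 1.0 + 2.0 * lam_disc * tau_c ** 2   # normalization vector 1 + 2 s_c, Eq. (30)
    return np.sign(u) * np.maximum(np.abs(u) - g - lam1, 0.0) / k

def assign(u, nus, taus, lam1=0.1, lam_disc=0.1):
    """Generate one candidate per group and keep the one with the smallest
    score S, Eqs. (36)-(37); reuses Sim and Stg from the earlier sketch."""
    scores, cands = [], []
    for nu_c, tau_c in zip(nus, taus):
        q = candidate_rep(u, nu_c, tau_c, lam1, lam_disc)
        cands.append(q)
        scores.append(Sim(q, tau_c) + Stg(q, tau_c) + Sim(q, nu_c))
    c_hat = int(np.argmin(scores))
    return c_hat, cands[c_hat]
```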
2) Group Parameters $\pmb {\theta }$ Update
Given the current estimate of the linear map $\mathbf{W}$ and the representations $\mathbf{Y}$, the group parameters are updated by solving\begin{align*} \widehat {\pmb {\theta }}& =\arg \min _{\pmb {\theta }} \sum _{i=1}^{K} \big [ \lambda _{\textsf {disc}} \, d_{\textsf {disc}}\! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \\ & \quad + \lambda _{\textsf {adj}} \, d_{\textsf {adj}} \! \left ({{ \mathbf {u}_{i}, \pmb {\theta } }}\right ) + \lambda _{\textsf {p}} \, d_{\textsf {p}}\! \left ({{ \pmb {\theta } }}\right ) \big ]. \tag {38}\end{align*}
Single $\pmb{\nu}_{c}$ Update: The update of $\pmb{\nu}_{c}$ for group $c$ amounts to solving\begin{align*} (\mathsf {P_{T_{1}}}):\widehat {\pmb {\nu }}_{c}& = \arg \min _{\pmb {\nu }_{c}} \sum _{i \in {\mathcal {I}}_{c} } \frac {1}{2} \Vert \mathbf {u}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} \Vert _{2}^{2} \\ & \quad + \lambda _{\textsf {disc}} \!\!\!\!\!\!\! \sum _{d \in \{1,\cdots, C\}\setminus c } \!\!\!\! d_{\textsf {disc}} \left ({{ \pmb {\nu }_{c}, \pmb {\theta }_{d} }}\right ),\!\! \tag {39}\end{align*} where ${\mathcal {I}}_{c}$ denotes the index set of the data vectors assigned to group $c$.
Proof:
Given the representations currently assigned to a group $c1$, the subproblem for $\pmb{\nu}_{c1}$ reads\begin{align*} & \hspace {-0.2pc} \min _{\pmb {\nu }_{c1}} \sum _{m} \frac {1}{2} \Vert \mathbf {W}\mathbf {x}_{c1,m} - \pmb {\nu }_{c1} - \pmb {\tau }_{c1} \Vert _{2}^{2} \\ & \quad + \lambda _{\textsf {disc}} \sum _{m} \left ({{ \textsf {Sim} \left ({{ \mathbf {y}_{c1,m}, \pmb {\tau }_{c1} }}\right ) }}\right. \\ & \quad - \left. {{\textsf {Sim} \left ({{ \mathbf {y}_{c1,m}, \pmb {\nu }_{c1} }}\right ) + \textsf {Stg} \left ({{ \mathbf {y}_{c1,m}, \pmb {\tau }_{c1} }}\right ) }}\right ) \\ & \quad + \lambda _{\textsf {p}} \sum _{c \neq c1} \left ({{ \textsf {Sim} \left ({{ \pmb {\nu }_{c1}, \pmb {\tau }_{c} }}\right ) \!- \!\textsf {Sim} \left ({{ \pmb {\nu }_{c1}, \pmb {\nu }_{c} }}\right ) \! + \! \textsf {Stg} \left ({{\pmb {\nu }_{c1}, \pmb {\tau }_{c} }}\right ) }}\right ). \tag {40}\end{align*}
The subgradient of (40) with respect to $\mathbf{v} = \pmb{\nu}_{c1}$, which must vanish at the optimum, is\begin{align*} & \hspace {-0.2pc}\left ({{ M\mathbf {v} - \mathbf {u} }}\right ) \! - \! \lambda _{\textsf {disc}} \sum _{m} \big ( \mathrm {sign} \left ({{ \mathbf {v}^{+} }}\right ) \odot \mathbf {y}_{c1,m}^{+} \! -\! \mathrm {sign} \left ({{ \mathbf {v}^{-} }}\right ) \odot \mathbf {y}_{c1,m}^{-} \Big ) \\ & \quad + \, \lambda _{\textsf {p}} \sum _{c \neq c1} \Big ( \mathrm {sign} \left ({{ \mathbf {v}^{+} }}\right ) \odot \pmb {\tau }_{c}^{+} - \mathrm {sign} \left ({{ \mathbf {v}^{-} }}\right ) \odot \pmb {\tau }_{c}^{-} \\ & \quad -\, \mathrm {sign} \left ({{ \mathbf {v}^{+} }}\right ) \odot \pmb {\nu }_{c}^{+} + \mathrm {sign} \left ({{ \mathbf {v}^{-} }}\right ) \odot \pmb {\nu }_{c}^{-} \\ & \quad +\, 2 \; \mathbf {v} \odot \left ({{ \pmb {\tau }_{c} \odot \pmb {\tau }_{c} }}\right ) \Big ). \tag {41}\end{align*} where $M$ is the number of members assigned to group $c1$ and $\mathbf {u} = \sum _{m} \left ({{ \mathbf {W}\mathbf {x}_{c1,m} - \pmb {\tau }_{c1} }}\right )$.
Denote:\begin{align*} \mathbf {h}_{y}^{T} \vert \mathbf {v} \vert & ={\left ({{\mathbf {v}^{+} }}\right )}^{T} \mathbf {y}_{c1,m}^{+} - {\left ({{ \mathbf {u}^{-} }}\right )}^{T} \mathbf {y}_{c1,m}^{-}, \\ \mathbf {h}_{1}^{T} \vert \mathbf {v} \vert & ={\left ({{ \mathbf {v}^{+} }}\right ) }^{T} \pmb {\tau }_{c}^{+} - {\left ({{ \mathbf {u}^{-} }}\right )}^{T} \pmb {\tau }_{c}^{-}, \\ \mathbf {h}_{2}^{T} \vert \mathbf {v} \vert & ={\left ({{ \mathbf {v}^{+} }}\right ) }^{T} \pmb {\nu }_{c}^{+} - {\left ({{ \mathbf {u}^{-} }}\right )}^{T} \pmb {\nu }_{c}^{-}, \\ \mathbf {s}_{c}^{T} & = \lambda _{\textsf {p}} {\left ({{ \pmb {\tau }_{c} \odot \pmb {\tau }_{c} }}\right )}^{T}, \\ \mathbf {g}_{c}^{T} & = \lambda _{\textsf {p}} \left ({{ \mathbf {h}_{1}^{T} - \mathbf {h}_{2}^{T} }}\right ), \\ \mathbf {p}_{c}^{T} & =\lambda _{\textsf {disc}} \mathbf {h}_{y}^{T}, \tag {42}\end{align*}
where the vectors $\mathbf{h}_{y}$, $\mathbf{h}_{1}$, and $\mathbf{h}_{2}$ evaluate element-wise to\begin{align*} \mathbf {h}_{y} & = \sum _{m} \max \left ({{ \mathbf {y}_{c1,m}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {u}, \pmb {0} }}\right ) }}\right ) \\ & \quad - \sum _{m} \max \left ({{ -\mathbf {y}_{c1,m}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {u}, \pmb {0} }}\right ) }}\right ), \tag {43}\\ \mathbf {h}_{1} & = \sum _{c \neq c1} \max \left ({{ \pmb {\tau }_{c}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {u}, \pmb {0} }}\right ) }}\right ) \\ \ & \quad - \sum _{c \neq c1} \max \left ({{ - \pmb {\tau }_{c}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {u}, \pmb {0} }}\right ) }}\right ), \tag {44}\\ \mathbf {h}_{2} & = \sum _{c \neq c1} \max \left ({{ \pmb {\nu }_{c}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {u}, \pmb {0} }}\right ) }}\right ) \\ & \quad - \sum _{c \neq c1} \max \left ({{ - \pmb {\nu }_{c}, \pmb {0} }}\right ) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {u}, \pmb {0} }}\right ) }}\right ), \tag {45}\end{align*}
Taking the magnitude decomposition of the optimality condition and solving for $\mathbf{v}$ yields\begin{equation*} \mathbf {v} = \mathrm {sign} \left ({{ \mathbf {u} }}\right ) \odot \max \left ({{ \vert \mathbf {u} \vert + \mathbf {p}_{c} - \mathbf {g}_{c}, \pmb {0} }}\right ) \oslash \left ({{ M + 2 \mathbf {s}_{c} }}\right ), \tag {46}\end{equation*} which completes the update of $\pmb{\nu}_{c1}$.
Single $\pmb{\tau}_{c}$ Update: Analogously, the update of $\pmb{\tau}_{c}$ amounts to solving\begin{align*} \!\!\!\! (\mathsf {P_{T_{2}}}):\widehat {\pmb {\tau }}_{c}& = \arg \min _{\pmb {\tau }_{c}} \sum _{ i \in {\mathcal {I}}_{c} } \frac {1}{2} \Vert \mathbf {u}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} \Vert _{2}^{2} \\ & \quad + \lambda _{\textsf {disc}} \!\!\!\!\!\!\! \sum _{d \in \{1,\cdots, C\} \setminus c } \!\!\!\! d_{\textsf {disc}} \left ({{ \pmb {\tau }_{c}, \pmb {\theta }_{d} }}\right ). \tag {47}\end{align*}
Similar to the previous parameter update, the solution for this problem adheres structurally to (25). Nevertheless, the distinction between the two lies in their thresholding and normalization vectors.
3) Linear Map $\mathbf{W}$ Update
Consider a given data set $\mathbf{X}$, its corresponding representations $\mathbf{Y}$, and the group membership parameters $\pmb{\theta}$. The linear map is then updated by solving\begin{align*} \widehat {\mathbf {W}} & = \arg \min _{\mathbf {W}} \frac {1}{2} \Vert \mathbf {W} \mathbf {X} - \mathbf {R}\Vert _{F}^{2} + \frac {\lambda _{\textsf {W}_{1}}}{2} {\Vert \mathbf {W} \Vert }_{F}^{2} \\ & \quad + \,\frac {\lambda _{\textsf {W}_{2}}}{2} {\Vert \mathbf {W} \mathbf {W}^{T} \! \! - \mathbf {I}\Vert }_{F}^{2} - \frac {\lambda _{\textsf {W}_{3}}}{2} \log \vert \! \det \mathbf {W}^{T} \mathbf {W} \vert, \!\! \tag {48}\end{align*} where $\mathbf{R}$ collects the regression targets arising from the sparsifying and adjustment terms.
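One simple way to solve (48) numerically, assuming a square and invertible $\mathbf{W}$ (so that $\log \vert \det \mathbf{W}^{T}\mathbf{W} \vert = 2 \log \vert \det \mathbf{W} \vert$), is plain gradient descent on its analytic gradient; the sketch below is our own and not necessarily the authors' solver:

```python
import numpy as np

def update_W(W, X, R, lam1=1e-3, lam2=1e-3, lam3=1e-3, lr=1e-3, steps=100):
    """Gradient descent on the objective in Eq. (48); R is the target matrix
    of the data-fidelity term, and W is assumed square and invertible."""
    for _ in range(steps):
        grad = (W @ X - R) @ X.T                                  # 0.5*||WX - R||_F^2
        grad += lam1 * W                                          # (lam1/2)*||W||_F^2
        grad += 2.0 * lam2 * (W @ W.T - np.eye(W.shape[0])) @ W   # (lam2/2)*||WW^T - I||_F^2
        grad -= lam3 * np.linalg.inv(W).T                         # -(lam3/2)*log|det(W^T W)|
        W = W - lr * grad
    return W
```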
Numerical Evaluation
In this section, we assess the efficacy of our proposed scheme through its application in face recognition across two scenarios. We then compare its performance against established methods: EoA-ML, AoE-ML [10], and JLAR [12]. While EoA-ML and AoE-ML enroll K individuals across C random groups without joint optimization, both our proposed method and JLAR are distinct in that they jointly learn group assignments and representations.
A. Face Datasets
We extract face descriptors using a network pre-trained on the VGG-Face architecture. These descriptors are then processed through PCA to reduce their dimensionality. Subsequently, the reduced descriptors are used as the input data vectors of our framework.
1) CFP
The Celebrities in Frontal-Profile (CFP) database encompasses 500 individuals, each represented by 10 frontal and 4 profile images captured in unconstrained environments. From this collection, we exclusively utilize
2) LFW
The Labeled Faces in the Wild (LFW) dataset includes 13,233 facial images collected from the internet. In our study, we use their pre-aligned versions. The enrollment set is derived from
B. Scenario #1: Group Verification
Consider a scenario where a user claims membership of group c. We define two hypotheses: under the first, the claimant is genuinely enrolled in group c; under the second, the claimant is an impostor who does not belong to it.
Figure 4 provides a comparative analysis, illustrating the enhanced efficiency of our scheme in group verification relative to the baseline methods. Complementing this, Table 1 offers a detailed quantitative evaluation for four specific group sizes.
Performances on group verification for varying group size M.
C. Scenario #2: Group Identification
This scenario addresses open-set identification, wherein a querying user may either be enrolled or categorized as an impostor. The system's identification procedure bifurcates into two phases. In the first phase, the system determines whether the user is enrolled. While this mirrors the verification described previously, the pivotal distinction lies in the group's indeterminacy: the system calculates the min-max score of the query against each group and compares the smallest score with a threshold to decide whether the user is enrolled. In the second phase, an enrolled user is identified with the group attaining that smallest score.
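A minimal sketch of this open-set decision is given below (our own illustration; the direction of the threshold test, smaller score meaning a better match, is our assumption, consistent with the argmin in (36)):

```python
import numpy as np

def identify(scores, threshold):
    """Open-set decision: 'scores' holds the min-max score of the query
    against each group.  If even the best (smallest) score exceeds the
    threshold, the query is rejected as an impostor; otherwise the
    best-scoring group is returned."""
    c_best = int(np.argmin(scores))
    if scores[c_best] > threshold:
        return None            # impostor / not enrolled
    return c_best

print(identify(np.array([4.2, 1.1, 3.7]), threshold=2.0))  # -> 1
print(identify(np.array([4.2, 3.1, 3.7]), threshold=2.0))  # -> None
```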
The performance metrics for the group identification scenario are delineated in Fig. 5. These metrics highlight the improvements that our proposed scheme offers in this scenario. Table 2 provides a detailed quantitative comparison for four group sizes.
Performances on group identification for varying group size M on CFP (left) and LFW (right).
Figure 7 presents some enrolled and querying faces from the LFW dataset. All the failed identification examples exhibit a change in lighting, pose, or expression, or the presence of occlusion.
Examples of group identification on LFW: black frames denote enrolled samples, green frames successful queries, and red frames failed queries.
Conclusion
We introduced a new group membership methodology that achieves two primary objectives: (i) it jointly learns nonlinear transform representations, incorporating prior information, and (ii) it determines group representatives using a maximum-likelihood approach grounded in functional measures. We further proposed an efficient algorithm tailored for the optimal estimation of the model parameters. Evaluations of the proposed framework were conducted on image databases, underscoring its applicability and proficiency in face verification and identification tasks.
ACKNOWLEDGMENT
The implementation code is available at https://gitlab.idiap.ch/biometric/code.group_membership_verification.