
Group Membership Verification via Nonlinear Sparsifying Transform Learning


Abstract:

In today’s digitally interconnected landscape, confirming the genuine associations between entities—whether they are items, devices, or individuals—and specific groups is critical. This paper introduces a new group membership verification method while ensuring minimal information loss, coupled with privacy-preservation and discrimination priors. Instead of verifying based on a similarity score in the original data space, we use a min-max functional measure in a transformed space. This method comprises two stages: (i) generating candidate nonlinear transform representations, and (ii) evaluating the min-max measure over these representations for both group assignment and transform selection. We simultaneously determine group membership and pick the appropriate representation from the candidate set based on the evaluation score. To solve within this framework, we employ an iterative alternating algorithm that both learns the parameters of candidate transforms and assigns group membership. Our method’s efficacy is assessed on public datasets across various verification and identification scenarios and further tested on real-world image databases, CFP and LFW.
General Block diagram of the Group Membership Verification via Nonlinear Sparsifying Transform Learning
Published in: IEEE Access ( Volume: 12)
Page(s): 86739 - 86751
Date of Publication: 20 June 2024
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Group testing, initially conceived during World War II to efficiently identify rare defects in large populations, has evolved into a fundamental principle across a spectrum of fields, from medical diagnostics to contemporary computer science applications. This approach, which pools and assesses samples concurrently, maximizes resource efficiency and identifies specific attributes in subsets of populations. As today’s systems—be they cloud infrastructures or the Internet of Things (IoT)—deepen their interconnectedness, there is a burgeoning demand for robust tools that can authenticate entity affiliations within designated groups. Such verifications not only aid in identification but also ensure authentication in our interconnected digital ecosystems. Yet, despite the advances, contemporary systems present intricate challenges, underscoring a notable gap: the integration of traditional group testing techniques with entity affiliation verifications. Addressing this gap is paramount. It promises not just enhanced membership verification, especially using facial images, but also fortifies the twin pillars of privacy and security in digital realms.

Verification Principles and Challenges: Group membership verification, at its core, seeks to confirm the legitimacy of a specific item, device, or individual within a designated group. This procedure unfolds in two pivotal stages: firstly, the verification of the entity’s affiliation to the group, and secondly, the identification of the entity itself. A paramount challenge in this domain is striking a delicate balance: ensuring accurate distinction between members and non-members while simultaneously preserving individual identities to uphold privacy.

Verification Mechanism: At the core of the verification process lies the collection of templates. These encompass representations of items (via passive physical unclonable functions (PUFs)), devices (through active PUFs), and unique individual traits (captured by biometrics). These templates are securely stored within a server-side data structure. Upon a verification request, the client transmits a distinct signature to the server, which subsequently determines access authorization. It is essential that this mechanism preserves utmost anonymity to shield individual identities. The primary goal of this protocol is to enable the server to verify group membership with accuracy, without having detailed insight into the system’s internal mechanisms.

Operational Model: In our operational framework, a central server manages the group membership protocol, processing requests from various clients. When a client acquires a new template, it communicates with the server. It is essential to recognize the potential risks posed by such servers. While some might operate with adversarial intentions, our model assumes the server follows an “honest but curious” behavior. This implies that although the server executes its functions reliably and adheres to the prescribed protocol, there might be an inherent inclination to interpret cached templates or analyze the nature of incoming requests. Such servers could attempt to glean additional information from the data they handle, aiming either to reconstruct the original data or infer relationships between different queries. The system’s design meticulously ensures that the server cannot infer private templates, guaranteeing accurate validation of a user’s group affiliation and determining their group identity when necessary.

A. Related Work

The paradigm of anonymous authentication for group members has been a staple in cryptography for years [1]. However, the applications we consider, particularly for biometrics, diverge significantly from established models of authentication, identification, and secret binding. Traditional approaches ensure security either at the server or client end, but they invariably result in the revelation of the user’s identity.

Aggregating signals into a unified representation is widely adopted in computer vision. Techniques such as Bag of Words (BoW) [2] and VLAD [3] consolidate local descriptors from an image into a comprehensive descriptor. Contemporary adaptations like BoW encoding convolutional features of CNNs have been introduced [4]. Arandjelovic et al. further refined VLAD with a learnable pooling layer, termed NetVLAD [5], with subsequent iterations exploring soft assignment to multiple clusters [6]. Distinctly, Zhong et al. [7] developed a descriptor for the faces of celebrities present in the same photo. Although their system excels with two faces per image, performance drops with an increasing number of faces. Our approach, while inspired by these techniques, diverges primarily because our queries comprise a singular face, and our group representations usually encompass more than two faces captured under diverse conditions.

The amalgamation of templates into a collective representation is prominent in biometrics. For example, [8] combined multiple captures of a single person’s face to offset challenges from poses, expressions, and image quality. Contrastingly, our focus is on aggregating unique faces from different individuals in a group. Notably, while conventional methods prioritize retrieving visually analogous elements, they do not inherently provide security or privacy.

The methodologies introduced in [9] and [10] pivot on the notion of transforming randomly-selected templates into discrete embeddings. These are then coalesced to form a singular group representation. Their cost-effectiveness, coupled with the inherent challenges of identity reconstruction, positions these strategies as particularly compelling. Further intricacies, including the impact of the sparsity level of high-dimensional features characterizing group members on aspects like security, compactness, and verification efficacy, are expounded upon in [11]. In a progressive stride, [12] departs from the conventional approach. Instead of statically computing group representations based on predefined entities, this study concurrently learns group representations and their corresponding assignments. By adopting variance as a metric for dissimilarity, the approach endeavors to minimize intra-group differences while amplifying inter-group distinctions. Empirical assessments indicate that this dual learning mechanism fosters enhanced performance, all while maintaining robust security measures.

B. Contribution

We propose a novel group membership assignment based on jointly modeling and learning nonlinear transforms with priors, together with the joint assignment of nonlinear transform representations and group representatives. The model parameters are learned by minimizing an empirical expectation of the negative model log-likelihood, which, under the discrimination prior, corresponds to maximizing the discrimination power measure. The proposed model allows a rejection option over continuous, discontinuous, and overlapping regions in the transform domain. Our learning strategy is based on an iterative, alternating algorithm with three steps. The honest but curious server cannot reconstruct the signatures, i.e., the data structure is protected in terms of security requirements. Moreover, the privacy of the users is guaranteed by anonymous verification.

C. Outline

Sec. II reviews preliminaries on sparse data representation. Sec. III elucidates our framework. Sec. IV presents the assignment principle and probabilistic model, and Sec. V details the modeling priors. Sec. VI describes the learning model used to assign group membership. A comprehensive performance analysis is presented in Sec. VII. Concluding remarks are in Sec. VIII.

D. Notation

Vectors and matrices are denoted by boldface lower-case (x) and upper-case (X) letters, respectively. We consider the same notation for a random vector x and its realization. The difference should be clear from the context. We use the notation \left [{{ N }}\right] for the set \{ 1, \cdots, N \} . The superscript (\cdot)^{T} stands for the transpose.

SECTION II.

Preliminaries on Sparse Data Representation

The foundational principle of sparse representation is to express data using fewer components than traditionally required, without compromising the integrity or essential characteristics of the data. This principle has found widespread application across various domains, including feature extraction, clustering, classification, and signal reconstruction, underscoring its versatility and impact. Three primary models of sparse data representation are the synthesis model, the analysis model, and the transform model. In the following section, we briefly review the essence of these models. Figure 1 visualizes the core concepts of these models.

FIGURE 1. Visualization of three sparse data representation models.

A. Synthesis Model

The synthesis model assumes that a high-dimensional data sample \mathbf {x}_{i} \in \mathbb {R}^{N} can be approximated as a linear combination of a select few basis columns (atoms) from a dictionary \mathbf {D} = \left [{{ \mathbf {d}_{1}, \cdots, \mathbf {d}_{M}}}\right] \in \mathbb {R}^{N \times M} , yielding a sparse representation \mathbf {y}_{i} \in \mathbb {R}^{M} , where {\Vert \mathbf {y}_{i} \Vert }_{0} \ll M . The deterministic formulation of the sparse synthesis model is expressed as \mathbf {x}_{i} = \mathbf {D} \mathbf {y}_{i} + \mathbf {v}_{i} , where \mathbf {v}_{i} \in \mathbb {R}^{N} denotes the approximation error in the original data domain—essentially, the residual between the actual data and its sparse approximation. The synthesis model is also known as the regression model with sparsity regularized penalty.
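As a minimal numerical sketch (with hypothetical dimensions, a randomly drawn dictionary, and NumPy as the tooling), the snippet below instantiates the deterministic synthesis model \mathbf {x}_{i} = \mathbf {D} \mathbf {y}_{i} + \mathbf {v}_{i} for a single sample:

import numpy as np

rng = np.random.default_rng(0)
N, M, s = 64, 128, 5                      # signal dimension, number of atoms, sparsity level

D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
y = np.zeros(M)
y[rng.choice(M, size=s, replace=False)] = rng.standard_normal(s)   # sparse code, ||y||_0 = s

v = 0.01 * rng.standard_normal(N)         # approximation error in the data domain
x = D @ y + v                             # synthesis model: x = D y + v

print(np.count_nonzero(y), np.linalg.norm(x - D @ y))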

B. Analysis Model

The analysis model employs a dictionary \mathbf {A} \in \mathbb {R}^{M \times N} with M > N to analyze the data sample \mathbf {x}_{i} \in \mathbb {R}^{N} . Given this data sample and dictionary, the model assumes that the representation \mathbf {y}_{i} = \mathbf {A} \mathbf {x}_{i} is sparse, i.e., {\Vert \mathbf {y}_{i} \Vert }_{0} \ll M .

C. Transform Model

The transform model assumes that a high-dimensional data sample \mathbf {x}_{i} can be effectively sparsified through a linear transformation \mathbf {W} \in \mathbb {R}^{M \times N} , resulting in \mathbf {W} \mathbf {x}_{i} = \mathbf {y}_{i} + \mathbf {z}_{i} . In this deterministic formulation, \mathbf {y}_{i} \in \mathbb {R}^{M} denotes the sparse representation, which satisfies \Vert \mathbf {y}_{i} \Vert _{0} \ll M , and \mathbf {z}_{i} represents the approximation error within the transform domain. This model fundamentally employs linear transformation but significantly enhances its applicability through the integration of nonlinearities. Nonlinearities can be introduced via a generalized element-wise nonlinearity operator \psi _{\pmb {\theta }}(\mathbf {W} \mathbf {x}_{i}) , where \pmb {\theta } denotes the parameters modulating the nonlinearity. Examples of nonlinear transforms include the hard thresholding function, the soft thresholding function, and ReLU activation function.
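A minimal sketch of the transform model, assuming a random linear map and soft thresholding as one example of the element-wise nonlinearity \psi _{\pmb {\theta }}:

import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 128

def soft_threshold(u, thr):
    # One example of an element-wise sparsifying nonlinearity psi.
    return np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)

W = rng.standard_normal((M, N)) / np.sqrt(N)   # hypothetical linear transform
x = rng.standard_normal(N)                     # data sample

u = W @ x                       # linear part of the transform
y = soft_threshold(u, 0.5)      # sparse representation in the transform domain
z = u - y                       # approximation error: W x = y + z

print(np.count_nonzero(y), np.linalg.norm(z))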

SECTION III.

Proposed Framework Overview

A. Assignment and Learning Approach

In our research, we introduce a methodology for learning a privacy-preserving assignment. This stands in contrast to the works cited in [13], [14], [15], [16], [17], [18], and [19], where various generic privacy-preserving identification or search mechanisms have been put forward. Unlike these approaches, our protocol assigns group membership not by assessing similarity in the original data space, but through evaluating a min-max functional measure within a transformed space. This assignment procedure unfolds in two distinct steps: (i) generation of candidate nonlinear transform representations, and (ii) evaluation of the min-max measure across these representations leading to the assignment of group membership.

In our pursuit to develop an assignment mechanism that is sparse, discriminative, information-preserving, and respects privacy, we employ specific priors when modeling the candidate nonlinear transforms. In the learning process, we simultaneously focus on two core objectives: (i) estimating the parameters inherent to the candidate transforms, and (ii) assigning group membership. To achieve both the discriminative and privacy-preserving goals, we advocate for support intersection measures. According to these measures, every representation of a data instance that is assigned to a specific group should maximize its support intersection with other representations within that same group. On the contrary, a representation associated with a particular group should minimize its support intersection with representations from different groups. Here, “support intersection” between vector representations denotes the count of non-zero elements that both vectors contain at the same index positions.
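For concreteness, the following small sketch counts the support intersection between two sparse vectors (NumPy, with a hypothetical numerical tolerance for deciding what counts as non-zero):

import numpy as np

def support_intersection(y1, y2, tol=1e-12):
    # Number of index positions where both vectors hold non-zero elements.
    return int(np.sum((np.abs(y1) > tol) & (np.abs(y2) > tol)))

y_a = np.array([0.0, 1.2, 0.0, -0.7, 0.0, 0.3])
y_b = np.array([0.5, 0.9, 0.0, 0.0, 0.0, -0.1])
print(support_intersection(y_a, y_b))   # shared non-zeros at indices 1 and 5 -> 2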

B. Setup

Consider a dataset of K data vectors, \mathbf { x}_{i} \in \mathbb {R}^{N} , represented as \mathbf {X} \in {\mathbb {R}}^{N \times K} . For simplicity, we postulate that each entity, whether an item, device, or individual, belongs to a singular group. This means that each data vector, \mathbf { x}_{i} , is affiliated with one distinct group c \in \{ 1, \cdots, C \} . We further assume that entities within the same group exhibit similar features, positioning them within proximate sub-spaces.

Our primary aim is to construct an information-preserving group representation, \pmb {\theta }_{c} = \left [{{ \pmb {\nu }_{c}, \pmb {\tau }_{c} }}\right] \in \mathcal {T}^{L \times 2}, c \in \left [{{ C}}\right] , for a set of members. This representation should adhere to security requirements; specifically, it should shield the inherent structure of the data from servers that, while trustworthy, may be intrusively curious (honest, but curious servers). Here, the vectors \pmb {\nu }_{c} and \pmb {\tau }_{c} are parameters representing similarity and dissimilarity within group c \in [C] . Once these security requirements are met, we commit these group representatives to a public server. Our coding alphabet, denoted as \mathcal {T} , can adopt various forms such as binary, ternary, continuous, and so forth. In this research, we consider a non-quantized alphabet.

Our secondary goal revolves around deriving an information-preserving transform representation, \mathbf {y}_{i} . This representation, corresponding to the original space group member \mathbf {x}_{i} , should remain distinguishable from non-members. Simultaneously, it is imperative that it maintains the privacy of the underlying entity, thus not disclosing the member’s identity.

C. Framework Overview

Our framework comprises the subsequent steps, as illustrated in Figure 2:

FIGURE 2. General block diagram of the proposed framework.

1) Preparation at Owner Side

The owner jointly estimates the sparse representations and assigns the group representatives from the data they possess. This estimation is achieved using a trained linear mapping, followed by a generalized element-wise nonlinearity. These group representatives are then transmitted to the server, which serves as a storage facility.

2) Querying at Client Side

The client produces a sparse representation from their query data. This is performed utilizing the same trained linear mapping and the generalized element-wise nonlinearity. Subsequently, the client forwards this sparse representation to the server.

3) Searching at Server Side

The server uses the received data to conduct similarity and dissimilarity searches among the group representatives. This process aims to precisely identify the group that most closely matches the client’s query.

SECTION IV.

Assignment Principle and Probabilistic Model

In this section, we elucidate the core principle guiding our group assignment methodology. Our focus is on the evaluation of candidate representations through a specific measure, and the probabilistic model integral to our approach of privacy-preserving group membership assignment.

A. Group Membership Assignment Principle

The procedure for our group membership assignment is bifurcated into:

  1. Candidate Representation Generation, and

  2. Group and Nonlinear Transform Representation Assignment.

1) Candidate Representation Generation

To derive a candidate representation, denoted as \mathbf { q}_{i,c} , we employ a nonlinear transform (NT) symbolized as {\mathcal {T}}_{{\mathcal {P}}_{c}} equipped with parameters {\mathcal {P}}_{c} = \{ \mathbf {W} \in \mathbb {R}^{L \times N}, \pmb {\nu }_{c}\in \mathbb {R}^{L}, \pmb {\tau }_{c} \in \mathbb {R}^{L} \} . This NT is applied to the input data \mathbf {x}_{i} . We provide a schematic representation of this NT-based generation process below. Here, the function \psi _{\left [{{ \pmb {\nu }_{c}, \pmb {\tau }_{c} }}\right]} \left ({{ \cdot }}\right) denotes an element-wise nonlinearity parameterized by \left [{{ \pmb {\nu }_{c}, \pmb {\tau }_{c} }}\right] \! =: \! \pmb {\theta }_{c} , which we will unpack in subsequent sections.

For the generation of an ensemble of candidate representations, symbolized as \{ \mathbf { q}_{i,1},\cdots, \mathbf { q}_{i, C} \} , we leverage a set of C NTs represented as {\mathcal {T}}_{T} = \{ {\mathcal {T}}_{{\mathcal {P}}_{1}}, \cdots, {\mathcal {T}}_{{\mathcal {P}}_{C}} \} , paired with their respective parameter sets {\mathcal {P}}_{T} = \{ {\mathcal {P}}_{1}, \cdots, {\mathcal {P}}_{C} \} . Each NT, denoted as {\mathcal {T}}_{{\mathcal {P}}_{c}} , is uniquely characterized by its parameter set {\mathcal {P}}_{c} = \{ \mathbf {W} \in \mathbb {R}^{L \times N}, \pmb {\nu }_{c}\in \mathbb {R}^{L}, \pmb {\tau }_{c} \in \mathbb {R}^{L} \}, c \in \left [{{C}}\right] , though all share a common linear map W. Yet, every NT possesses unique pairs [\pmb {\nu }_{c}, \pmb {\tau }_{c}] tied to a specific group c\in \left [{{C}}\right] . For brevity, we also label the set of these parameter pairs as \pmb {\theta } = \{ \pmb {\theta }_{1}, \cdots, \pmb {\theta }_{C} \} = \{ \left [{{ \pmb {\nu }_{1}, \pmb {\tau }_{1} }}\right], \cdots, \left [{{ \pmb {\nu }_{C}, \pmb {\tau }_{C} }}\right] \} .
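The structure of this step can be sketched as follows: one shared linear map W and C group-specific parameter pairs [\pmb {\nu }_{c}, \pmb {\tau }_{c}] produce C candidate representations for one sample. The element-wise nonlinearity below is only a placeholder shrinkage chosen for illustration; the closed form actually used is derived in Section VI.

import numpy as np

rng = np.random.default_rng(2)
N, L, C = 64, 128, 4

W = rng.standard_normal((L, N)) / np.sqrt(N)    # shared linear map
nu = 0.1 * rng.standard_normal((C, L))          # per-group parameters nu_c
tau = 0.1 * rng.standard_normal((C, L))         # per-group parameters tau_c

def psi(u, nu_c, tau_c, lam=0.3):
    # Placeholder element-wise nonlinearity parameterized by [nu_c, tau_c];
    # the paper's exact form is the closed-form shrinkage of Sec. VI.
    thr = lam + np.maximum(tau_c, 0.0) - np.maximum(nu_c, 0.0)
    return np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)

x = rng.standard_normal(N)
u = W @ x
candidates = np.stack([psi(u, nu[c], tau[c]) for c in range(C)])   # q_{i,1}, ..., q_{i,C}
print(candidates.shape)   # (C, L)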

2) Group and NT Representation Assignment

Given the candidate representations, we proceed with the actual assignment. Unlike traditional similarity (or dissimilarity) measures, we employ a min-max functional measure. Each candidate representation undergoes evaluation through this measure. Based on the evaluation score, we determine the group membership and its associated NT representation.

B. Probabilistic Model

To introduce the probabilistic model of our framework, let’s start with the decomposition of the joint probability distribution p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right) . Using the chain rule, we have p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right) = p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right) p \left ({{ \mathbf {W} }}\right) . Assuming p \left ({{ \mathbf {x}_{i} \mid \mathbf {W}}}\right) = p \left ({{ \mathbf {x}_{i} }}\right) and applying the chain rule again, we have: \begin{align*} p \left ({{ \mathbf {x}_{i}, \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) & = p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) p \left ({{ \mathbf {x}_{i} }}\right ) \tag {1a}\\ & = p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) \tag {1b}\end{align*}

Therefore, using Bayes’ rule, we have:\begin{equation*} p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) \propto p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right ) \tag {2}\end{equation*}
Let’s assume p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {W} }}\right) = p \left ({{ \mathbf {y}_{i}, \pmb {\theta }}}\right) , i.e., we neglect the dependence on W. This independence assumption allows us more flexibility in the class of assumptions related to sparsity and discrimination for the parametric prior p \left ({{ \mathbf {y}_{i}, \pmb {\theta }}}\right) . Also, for simplification, let p \left ({{ \mathbf {W} \mid \mathbf {x}_{i} }}\right) = p \left ({{ \mathbf {W} }}\right) . Given K data samples \mathbf {X} = \left [{{ \mathbf {x}_{1}, \cdots, \mathbf {x}_{K} }}\right] and using (2), we can consider the following learning model: \begin{align*} p \left ({{ \mathbf {Y}, \mathbf {W} \mid \mathbf {X}}}\right ) & = p \left ({{ \mathbf {Y} \mid \mathbf {W}, \mathbf {X}}}\right ) p \left ({{ \mathbf {W} \mid \mathbf {X}}}\right ) \tag {3a}\\ & = \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {y}_{i}, \pmb {\theta } \mid \mathbf {x}_{i}, \mathbf {W} }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {3b}\\ & = \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {3c}\end{align*}

Using this probabilistic formulation, we characterize our learning model through the following components:

  • A sparsifying error coupled with a \pmb {\theta } adjustment error, represented by the prior p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right) .

  • Discrimination and sparsity prior p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right) .

  • Linear map W prior p \left ({{ \mathbf {W} }}\right) .

By adopting this structured probabilistic perspective, we aim for a holistic understanding of the dynamics between the observed inputs, their associated transformations, model parameters, and their linear mapping.

SECTION V.

Addressing Probabilistic Modeling Priors

A. Sparsifying Error and Model Parameters Adjustment Error Prior

In our pursuit to capture the intricate relationship between observed data and its inherent structure, we formulate a probabilistic construct as presented below:\begin{align*} & p \! \left ({{ \mathbf {x}_{i}\! \mid \! \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) \! \\ & \quad \propto \exp \left [{{\! - \frac {1}{\beta _{\textsf {spa}}} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i},\mathbf {y}_{i} }}\right ) \!-\! \frac {1}{\beta _{\textsf {adj}}} d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) \!}}\right ], \tag {4}\end{align*}

where d_{\textsf {spa}} \left ({{ \cdot, \cdot }}\right): \mathbb {R}^{L} \times \mathbb {R}^{L} \rightarrow \mathbb {R} represents the sparsifying error measure, and d_{\textsf {adj}} \left ({{ \cdot, \cdot }}\right): \mathbb {R}^{L} \times \mathbb {R}^{L} \rightarrow \mathbb {R} represents the model parameters adjustment error. The parameters \beta _{\textsf {spa}} and \beta _{\textsf {adj}} are scaling factors.

Our primary objective for introducing this construct is twofold. Initially, we encapsulate the sparsifying error vector \mathbf { W}\mathbf { x}_{i}-\mathbf { y}_{i} . Subsequently, and crucially, we seek to refine the alignment of our model’s estimations with observed data by addressing discrepancies. This is achieved by implementing an adjustment to the discrimination parameter error vector, denoted as \mathbf {W} \mathbf {x}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} . We define our measures as follows:\begin{align*} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \mathbf { y}_{i} }}\right ) & = \frac {1}{2} {\Vert \mathbf {W} \mathbf {x}_{i} - \mathbf { y}_{i} \Vert }_{2}^{2}, \tag {5}\\ d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) & = \frac {1}{2} {\Vert \mathbf {W} \mathbf {x}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} \Vert }_{2}^{2}. \tag {6}\end{align*}

View SourceRight-click on figure for MathML and additional features. where the index c corresponds to the assigned class of the data sample \mathbf {x}_{i} . These measures directly influence our probabilistic model’s likelihood expressions, described as:\begin{align*} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \mathbf {W} }}\right ) & \propto \exp \left [{{ - \frac {1}{\beta _{\textsf {spa}}} d_{\textsf {spa}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \mathbf { y}_{i} }}\right ) }}\right ], \tag {7}\\ p \left ({{ \mathbf {x}_{i} \mid \pmb {\theta }, \mathbf {W} }}\right ) & \propto \exp \left [{{ - \frac {1}{\beta _{\textsf {adj}}} d_{\textsf {adj}} \left ({{ \mathbf {W}\mathbf {x}_{i}, \pmb {\theta } }}\right ) }}\right ]. \tag {8}\end{align*}
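Both error measures are plain squared Euclidean distances; a small sketch (NumPy, hypothetical inputs) makes the two terms of Eq. (4) explicit:

import numpy as np

def d_spa(Wx, y):
    # Sparsifying error (5): 0.5 * ||W x - y||_2^2.
    return 0.5 * np.sum((Wx - y) ** 2)

def d_adj(Wx, nu_c, tau_c):
    # Parameter adjustment error (6): 0.5 * ||W x - nu_c - tau_c||_2^2.
    return 0.5 * np.sum((Wx - nu_c - tau_c) ** 2)

rng = np.random.default_rng(3)
Wx, y, nu_c, tau_c = [rng.standard_normal(16) for _ in range(4)]
print(d_spa(Wx, y), d_adj(Wx, nu_c, tau_c))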

B. Sparsity and Discrimination Prior

We model our sparsity-inducing and discrimination prior as follows:\begin{align*} p \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \! \propto \! \exp \left [{{\! - \frac {1}{\beta _{\textsf {disc}}} d_{\textsf {disc}} \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \!-\! \frac {1}{\beta _{\textsf {p}}} d_{\textsf {p}} \! \left ({{ \pmb {\theta } }}\right ) \!- \!\frac {1}{\beta _{\textsf {1}}} d_{\textsf {1}} \! \left ({{ \mathbf {y}_{i} }}\right ) \!}}\right ], \tag {9}\end{align*}

where d_{\textsf {disc}} represents the discrimination prior measure, d_{\textsf {p}} denotes the similarity/dissimilarity prior measure, and d_{\textsf {1}} indicates the sparsity measure. The parameters \beta _{\textsf {disc}} , \beta _{\textsf {p}} , and \beta _{\textsf {1}} are scaling factors. By considering the \ell _{1} -norm as the sparsity measure for vector \mathbf {y}_{i} , i.e., d_{\textsf {1}} \! \left ({{ \mathbf {y}_{i} }}\right) = {\Vert \mathbf {y}_{i} \Vert }_{1} , we introduce a prior that induces sparsity in \mathbf {y}_{i} . The sparsity-inducing prior on \mathbf {y}_{i} is given by:\begin{equation*} p \left ({{ \mathbf {y}_{i} }}\right ) \! \propto \! \exp \left ({{ - {\Vert \mathbf {y}_{i} \Vert }_{1} / \beta _{1} }}\right ). \tag {10}\end{equation*}
The discrimination prior is modeled similarly as:\begin{equation*} p \left ({{ \pmb {\theta } \! \mid \! \mathbf {y}_{i}}}\right ) \! \propto \! \exp \left [{{ - \frac {1}{\beta _{\textsf {disc}}} d_{\textsf {disc}} \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) - \frac {1}{\beta _{\textsf {p}}} d_{\textsf {p}} \! \left ({{ \pmb {\theta } }}\right ) }}\right ] \tag {11}\end{equation*}

In order to define the measures d_{\textsf {disc}} and d_{\textsf {p}} , we first describe our min-max discrimination measure based on the following assumptions:

  1. The discrimination measure is characterized by relationships based on the support intersection between \mathbf {y}_{i} , \pmb {\nu }_{c} and \pmb {\tau }_{c} .

  2. The discrimination measure has a min-max structure, with its expression factored with respect to \pmb {\nu }_{c} and \pmb {\tau }_{c} .

We then elucidate our foundational measures related to similarity, dissimilarity, and strength, focusing on the support intersection between the representations and the \pmb {\theta }_{c} parameter pair. Following this, we detail our min-max discrimination functional measure.

1) Quantifying Representation Similarity and Dissimilarity

The measure \textsf {Sim} quantifies the similarity between two representations \mathbf { y}_{1} and \mathbf { y}_{2} . It is defined as:\begin{equation*} \textsf {Sim} ( \mathbf { y}_{1}, \mathbf { y}_{2})= \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{+} \Vert _{1} \! + \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{-}\Vert _{1}, \tag {12}\end{equation*}

where \mathbf { y}_{1}=\mathbf { y}_{1}^{+}-\mathbf { y}_{1}^{-} ,\mathbf { y}_{2}=\mathbf { y}_{2}^{+} - \mathbf { y}_{2}^{-} , \mathbf { y}_{1}^{+} = \max (\mathbf { y}_{1}, \mathbf { 0}) and \mathbf { y}_{1}^{-}= \max (-\mathbf { y}_{1}, \mathbf { 0}) .

The term \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{+} \Vert _{1} measures the similarity in the positive components. Essentially, for two vectors to have high similarity in their positive components, they should both have positive values at the same indices, and these values should be relatively large. This is captured by the element-wise multiplication (\odot ) and the L_{1} norm. Similarly, the term \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{-}\Vert _{1} evaluates the agreement in their negative components. The overall similarity score, therefore, combines the extent to which the positive components agree and the negative components agree. The larger the score, the more similar the two representations are in both their positive and negative components.

The measure \textsf {Dis} , related to dissimilarity (oppositeness) between two representations \mathbf { y}_{1} and \mathbf { y}_{2} is defined as:\begin{equation*} \textsf {Dis} ( \mathbf { y}_{1}, \mathbf { y}_{2})= \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{-} \Vert _{1} \! + \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{+} \Vert _{1}. \tag {13}\end{equation*}

The term \Vert \mathbf { y}_{1}^{+} \! \odot \mathbf { y}_{2}^{-} \Vert _{1} computes the contrast between the positive components of \mathbf { y}_{1} and the negative components of \mathbf { y}_{2} . A high value in this term indicates that there are indices where \mathbf { y}_{1} has a positive value while \mathbf { y}_{2} has a negative value, or vice versa. The element-wise multiplication emphasizes this opposition. The term \Vert \mathbf { y}_{1}^{-} \! \odot \mathbf { y}_{2}^{+} \Vert _{1} mirrors this idea, identifying negative values in \mathbf { y}_{1} where there are positive values in \mathbf { y}_{2} .

Furthermore, the measure \textsf {Stg} , which correlates to the strength on the support intersection, is articulated as:\begin{equation*} \textsf {Stg} ( \mathbf { y}_{1}, \mathbf { y}_{2} ) \! = \! \Vert \mathbf { y}_{1} \! \odot \mathbf { y}_{2} \Vert _{2}^{2}. \tag {14}\end{equation*}

Figure 3 depicts a visual representation of the similarity and dissimilarity contributions pertaining to two typical sparse representations, \mathbf {y}_{1} and \mathbf {y}_{2} .

FIGURE 3. Visualization of similarity and dissimilarity measures.
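The three measures translate directly into a few lines of NumPy; the following sketch (with hypothetical vectors) implements Eqs. (12)-(14):

import numpy as np

def pos(y): return np.maximum(y, 0.0)    # y^+
def neg(y): return np.maximum(-y, 0.0)   # y^-

def Sim(y1, y2):
    # Eq. (12): agreement of positive parts plus agreement of negative parts.
    return np.sum(pos(y1) * pos(y2)) + np.sum(neg(y1) * neg(y2))

def Dis(y1, y2):
    # Eq. (13): sign opposition on the shared support.
    return np.sum(pos(y1) * neg(y2)) + np.sum(neg(y1) * pos(y2))

def Stg(y1, y2):
    # Eq. (14): squared l2-norm of the element-wise product.
    return np.sum((y1 * y2) ** 2)

y1 = np.array([1.0, -2.0, 0.0, 0.5])
y2 = np.array([0.5, 1.0, 0.0, -0.5])
print(Sim(y1, y2), Dis(y1, y2), Stg(y1, y2))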

2) Min-Max Discrimination Prior Measure

We introduce a min-max functional, denoted as d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right) :\begin{align*} d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) & = \min _{d}\max _{c} \big [ \textsf {Sim} \left ({{ \mathbf {y}_{i}, \pmb {\tau }_{d} }}\right ) + \textsf {Dis} \left ({{ \mathbf {y}_{i}, \pmb {\nu }_{c} }}\right ) ] \\ & \quad + \min _{d} \textsf {Stg} \left ({{ \mathbf {y}_{i}, \pmb {\tau }_{d} }}\right ). \tag {15}\end{align*}

This formulated metric ensures that the transform representation, \mathbf {y}_{i} , aligns with the following criteria:

  1. The similarity relative to \pmb {\tau }_{d} is minimized, as measured by \textsf {Sim} .

  2. The intersection strength regarding support with respect to \pmb {\tau }_{d} is minimized, as evaluated by \textsf {Stg} .

  3. The dissimilarity relative to \pmb {\nu }_{c} is maximized, as measured by \textsf {Dis} .

To understand the dynamics of this prior measure, consider the formulation as akin to a dance of forces within a physical field. In this space, the point—our transform representation, denoted as \mathbf {y}_{i} —is swayed by two principal forces: one symbolized by \textsf {Sim} \! \left ({{ \mathbf {y}_{i},\! \pmb {\tau }_{d} \! }}\right) and the other by \textsf {Dis} \left ({{ \mathbf {y}_{i}, \pmb {\nu }_{c} }}\right) . The force governed by \min _{d} \textsf {Sim} \left ({{ \mathbf {y}_{i}, \pmb {\tau }_{d} }}\right) highlights the minimum similarity metric as defined by \textsf {Sim} . This force operates with an intent to maximize the “distance” between \mathbf {y}_{i} and \pmb {\tau }_{d} , effectively pushing \mathbf {y}_{i} away from regions that resemble \pmb {\tau }_{d} . In contrast, the force dictated by \max _{c} \textsf {Dis} \left ({{ \mathbf {y}_{i}, \pmb {\nu }_{c} }}\right) is somewhat paradoxical. While the term ‘dissimilarity’ might suggest repulsion, within our defined metric space, it functions differently. This force underscores \mathbf {y}_{i} ’s pursuit to align as closely as possible with the representation \pmb {\nu }_{c} by amplifying its ‘oppositeness’ or dissimilarity. Consequently, it attracts \mathbf {y}_{i} towards \pmb {\nu }_{c} , seeking regions where this measure of ‘oppositeness’ is maximized.

At the heart of this formulation is the score d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right) . This score is not just a mere numerical value but a manifestation of equilibrium. It embodies the delicate balance, almost a dance, between the aforementioned forces. As they act upon \mathbf {y}_{i} , they ensure its positioning is not arbitrary but follows the intricate choreography laid out by the relations of \textsf {Sim} and \textsf {Dis} .
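Because the bracketed term in Eq. (15) decouples in d and c, the min-max reduces to a minimum over the Sim terms plus a maximum over the Dis terms. A sketch under that reading (the Sim, Dis, and Stg helpers are restated compactly so the snippet is self-contained):

import numpy as np

def Sim(a, b):
    return np.sum(np.maximum(a, 0) * np.maximum(b, 0)) + np.sum(np.maximum(-a, 0) * np.maximum(-b, 0))

def Dis(a, b):
    return np.sum(np.maximum(a, 0) * np.maximum(-b, 0)) + np.sum(np.maximum(-a, 0) * np.maximum(b, 0))

def Stg(a, b):
    return np.sum((a * b) ** 2)

def d_disc(y, nus, taus):
    # Eq. (15): min_d Sim(y, tau_d) + max_c Dis(y, nu_c) + min_d Stg(y, tau_d).
    # nus, taus: arrays of shape (C, L) holding the per-group parameters.
    sim_tau = np.array([Sim(y, t) for t in taus])
    dis_nu = np.array([Dis(y, n) for n in nus])
    stg_tau = np.array([Stg(y, t) for t in taus])
    return sim_tau.min() + dis_nu.max() + stg_tau.min()

rng = np.random.default_rng(4)
print(d_disc(rng.standard_normal(8), rng.standard_normal((3, 8)), rng.standard_normal((3, 8))))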

3) Model Parameters \pmb {\theta } Prior Measure

We introduce the measure d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right) to quantify the combined influence of different parameter sets within the indexed range \{1, \cdots, C \} . To achieve this, we harness pairwise interactions between distinct parameter sets, computing their aggregate effects as follows:\begin{align*} d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right ) = \!\!\!\!\! \sum _{c \in \{1, \cdots, C\} } \, \sum _{d \in \{1, \cdots, C\} \setminus c } \!\!\! d_{\textsf {disc}} \left ({{ \pmb {\nu }_{c}, \pmb {\theta }_{d} }}\right ) + d_{\textsf {disc}} \left ({{ \pmb {\tau }_{c}, \pmb {\theta }_{d} }}\right ). \tag {16}\end{align*}

Here, the notation \{1,\cdots, C \} \!\! \setminus \!\! c refers to the set \{1, \cdots, {c-1}, {c+1}, \cdots, C \} , effectively excluding the element c from the set \{1, \cdots, C \} . Note that we have p \left ({{ \pmb {\theta } }}\right) \propto \exp \left ({{ - d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right) / \beta _{\textsf {p}}}}\right) . Indeed, d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right) quantifies both the similarity and dissimilarity between the parameters \pmb {\theta } by employing the discrimination measure d_{\textsf {disc}} .

The dual summation ensures that for every parameter c, we account for its relationship with every other distinct parameter, d. In doing so, d_{\textsf {p}} encapsulates a holistic view of parameter inter-dependencies and interactions in the model.
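The double sum of Eq. (16) is straightforward to organize in code. In the sketch below the exact per-pair discrimination term is left open and passed in as a function; one plausible choice, not confirmed by the text, is Eq. (15) specialized to a single group, i.e., Sim(v, tau_d) + Dis(v, nu_d) + Stg(v, tau_d).

import numpy as np

def d_p(nus, taus, pair_disc):
    # Eq. (16): for every group c, accumulate the per-pair discrimination of
    # nu_c and tau_c against the parameters of every other group d != c.
    C = len(nus)
    total = 0.0
    for c in range(C):
        for d in range(C):
            if d == c:
                continue
            total += pair_disc(nus[c], nus[d], taus[d]) + pair_disc(taus[c], nus[d], taus[d])
    return total

# Example with a toy per-pair term (an assumption made for illustration only).
toy_pair = lambda v, nu_d, tau_d: float(np.sum(v * tau_d) ** 2)
rng = np.random.default_rng(5)
print(d_p(rng.standard_normal((3, 8)), rng.standard_normal((3, 8)), toy_pair))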

C. Linear Map Prior

The purpose of the ‘linear map prior’ is to penalize information loss while simultaneously discouraging the adoption of undesirable matrices. To achieve this dual objective, we regularize both the condition number and the expected coherence of the matrix W. Specifically, the prior is defined as:\begin{equation*} p \left ({{ \mathbf {W} }}\right ) \propto \exp \left ({{ - \Omega \left ({{ \mathbf {W} }}\right ) }}\right ), \tag {17}\end{equation*}

where \begin{align*} \Omega \left ({{ \mathbf {W} }}\right ) & = \frac {1}{\beta _{\textsf {W}_{1}}} {\Vert \mathbf {W} \Vert }_{F}^{2} + \frac {1}{\beta _{\textsf {W}_{2}}} {\Vert \mathbf {W} \mathbf {W}^{T} - \mathbf {I}\Vert }_{F}^{2} \\ & \quad - \frac {1}{\beta _{\textsf {W}_{3}}} \log \vert \det \mathbf {W}^{T} \mathbf {W} \vert. \tag {18}\end{align*}
In this formulation:

  • The term {\Vert \mathbf {W} \Vert }_{F}^{2} serves as a regularizer, penalizing the magnitude of matrix W to ensure stability.

  • {\Vert \mathbf {W} \mathbf {W}^{T} - \mathbf {I}\Vert }_{F}^{2} aims to minimize the deviation of W from orthogonality.

  • Lastly, \log \vert \det \mathbf {W}^{T} \mathbf {W} \vert assesses the volume scaling factor of W, thereby promoting full rank matrices.

For a more detailed exploration and justification of the constraint \Omega \left ({{ \mathbf {W} }}\right) on the linear map W, readers are directed to the works of [14] and [15], along with associated references contained therein.
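A short sketch of the regularizer in Eq. (18), with placeholder beta weights and slogdet used for a numerically stable log-determinant:

import numpy as np

def omega(W, beta1=1.0, beta2=1.0, beta3=1.0):
    # Eq. (18): Frobenius penalty + deviation from orthogonality - log|det(W^T W)|.
    L, N = W.shape
    frob = np.sum(W ** 2)
    ortho = np.sum((W @ W.T - np.eye(L)) ** 2)
    _, logdet = np.linalg.slogdet(W.T @ W)     # rewards full column rank
    return frob / beta1 + ortho / beta2 - logdet / beta3

W = np.random.default_rng(6).standard_normal((8, 5))
print(omega(W))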

SECTION VI.

Learning Model to Assign Group Membership

A. Problem Formulation

Consider a given training dataset, denoted by X. The task of directly maximizing the probability function p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right)p \left ({{ \pmb {\theta } | \mathbf {y}_{i} }}\right) p \left ({{ \mathbf {y}_{i} }}\right) p \left ({{ \mathbf {W} }}\right) over the parameter space \{ \mathbf {Y}, \pmb {\theta }, \mathbf {W} \} is computationally burdensome, primarily due to the high dimensionality and nonlinearities involved. To circumvent this challenge, we pivot our approach towards minimizing the negative logarithm of our probabilistic model (3c), over the variables \{ \mathbf {Y}, \pmb {\theta }, \mathbf {W} \} . That is, we aim to minimize the negative log likelihood \begin{equation*} - \log \prod _{i=1}^{K} \int _{\pmb {\theta }} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) p \left ({{ \mathbf {W} }}\right ) \mathrm {d} \pmb {\theta } \tag {19}\end{equation*}

over the variables \{ \mathbf {Y}, \pmb {\theta }, \mathbf {W} \} . However, minimizing the exact negative logarithm is difficult since it requires integrating to compute the marginal and the partition function of the prior p \left ({{ \mathbf {y}, \pmb {\theta } }}\right) . Instead, we consider minimizing the negative logarithm of its maximum point-wise estimate:\begin{align*} & \hspace {-1pc}\int _{\pmb {\theta }_{\textsf {est}}} p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}}, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}} }}\right ) \mathrm {d} \pmb {\theta }_{\textsf {est}} \\ & \leq \kappa \; p \! \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) p \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ), \tag {20}\end{align*}
where \kappa is a constant. Here, we assume that \pmb {\theta } are the parameters at which p \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}}, \mathbf {W} }}\right) p \left ({{ \mathbf {y}_{i}, \pmb {\theta }_{\textsf {est}} }}\right) attains its maximum value.

Considering Eq. (3), (19), (20), we arrive at the following problem formulation:\begin{align*} \{ \widehat {\mathbf {Y}}, \widehat {\pmb {\theta }}, \widehat {\mathbf {W}} \} & =\arg \min _{\mathbf {Y}, \pmb {\theta }, \mathbf {W}} \sum _{i=1}^{K} \Big [ - \log \, p \! \left ({{ \mathbf {x}_{i} \mid \mathbf {y}_{i}, \pmb {\theta }, \mathbf {W} }}\right ) \\ & \qquad - \log \, p \! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \Big ] - \log \, p \! \left ({{ \mathbf {W} }}\right ). \! \tag {21}\end{align*}

By substituting the measure obtained from section V, we can define our optimization problem as follows:\begin{align*} \{ \widehat {\mathbf {Y}}, \widehat {\pmb {\theta }}, \widehat {\mathbf {W}} \} & =\arg \min _{\mathbf {Y}, \pmb {\theta }, \mathbf {W}} \sum _{i=1}^{K} \Big [ d_{\textsf {spa}} \left ({{ \mathbf {u}_{i}, \mathbf { y}_{i} }}\right ) + \lambda _{\textsf {adj}} \, d_{\textsf {adj}}\left ({{\mathbf {u}_{i}, \pmb {\theta } }}\right ) \\ & \quad + \lambda _{\textsf {disc}} \, d_{\textsf {disc}}\left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) + \lambda _{\textsf {p}} \, d_{\textsf {p}} \left ({{ \pmb {\theta } }}\right ) + \lambda _{1} {\Vert \mathbf {y}_{i} \Vert }_{1} \Big ] \\ & \quad + \lambda _{\textsf {W}} \, \Omega \left ({{ \mathbf {W} }}\right ), \! \! \tag {22}\end{align*}
where \mathbf {u}_{i} = \mathbf {W} \mathbf {x}_{i} represents the transformed input data. The collection \{ \lambda _{\textsf {adj}}, \lambda _{\textsf {disc}}, \lambda _{\textsf {p}}, \lambda _{1}, \lambda _{\textsf {W}} \} denotes the set of Lagrangian multipliers, each serving as an inverse coefficient to their associated scaling parameters. We set \lambda _{\textsf {spa}} =1 .

An essential point of clarification here is that our derived solution to Eq. (22) does not equate to the maximum a posteriori (MAP) solution. While the MAP provides a point estimate by maximizing the posterior distribution, it is computationally intensive due to the complexity of calculating higher-dimensional integrals in the parameter space.

Instead, our framework articulated in Eq. (22) encapsulates an integrated marginal minimization (IMM) strategy. This method entails a sequence of iterative steps, where, in each iteration, we minimize the corresponding terms of our model with respect to the variables Y, \pmb {\theta } , and W. This integrated marginal minimization offers several advantages:

  • Computational Efficiency: Unlike the MAP approach, which often requires complex integrations over the parameter space, our method iteratively maximizes simpler terms, allowing for more efficient computational procedures.

  • Flexibility: Given the iterative nature of our method, it is more adaptable to various datasets and can better accommodate changes in data distribution, especially in scenarios with limited or evolving data.

  • Stability: The integrated approach reduces the risk of settling into local optima that do not represent the broader dataset well. By considering marginal effects iteratively, we can capture more global patterns in the data, enhancing the model’s generalization capabilities.

In essence, while the MAP solution offers a theoretical ideal, practical constraints necessitate the use of strategies like integrated marginal minimization, which strike a balance between theoretical rigor and computational tractability. Our methodology facilitates the identification of a joint local maximum in the space \{ \mathbf {Y}, \pmb {\theta }, \mathbf {W} \} for the likelihood and prior probabilities.

B. Learning Algorithm

We propose an alternating block coordinate descent algorithm that progresses across three stages:

  1. Simultaneous estimation of the representation \mathbf {y}_{i} and assignment of group membership c,

  2. Update of the group parameters, represented as \pmb {\theta } = \{ \pmb {\theta }_{1}, \cdots, \pmb {\theta }_{C} \} = \{ \left [{{ \pmb {\nu }_{1}, \pmb {\tau }_{1} }}\right], \cdots, \left [{{ \pmb {\nu }_{C}, \pmb {\tau }_{C} }}\right] \} ,

  3. Update of the linear map W.
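The three stages compose into a simple alternating loop; the driver below is only a structural sketch with hypothetical signatures, where the concrete update rules (derived in the remainder of this section) are abstracted as callables:

def learn(X, n_iters, init, estimate_and_assign, update_theta, update_W):
    # Alternating block coordinate descent skeleton (illustrative only).
    W, theta = init(X)                                  # linear map and group parameters
    for _ in range(n_iters):
        Y, groups = estimate_and_assign(X, W, theta)    # step 1: y_i and group index c
        theta = update_theta(X, Y, groups, W, theta)    # step 2: refresh [nu_c, tau_c]
        W = update_W(X, Y, groups, theta, W)            # step 3: refresh the linear map
    return Y, groups, theta, W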

1) NT Representation Estimation and Assignment

Given the dataset X, the latest estimate of the group membership parameters \pmb {\theta } , and the current approximation of the linear map W, the expression in (22) simplifies to the ensuing representation estimation problem:\begin{align*} \left [{{ \widehat {\mathbf {y}}_{1}, \cdots, \widehat {\mathbf {y}}_{K} }}\right ] & = \arg \min _{\left [{{ {\mathbf {y}}_{1}, \cdots, {\mathbf {y}}_{K} }}\right ]} \sum _{i=1}^{K} \Big [ \frac {1}{2} {\Vert \mathbf {u}_{i} - \mathbf {y}_{i} \Vert }_{2}^{2} \\ & \qquad + \lambda _{1} {\Vert \mathbf {y}_{i} \Vert }_{1} + \lambda _{\textsf {disc}} \,d_{\textsf {disc}} \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \Big ].\!\!\!\! \tag {23}\end{align*}

To resolve this problem, we implement a two-fold approach. Initially, we fix the group index and determine an estimated candidate for the nonlinear transform representation. In the second stage, we evaluate the candidate representations against the predefined group parameters to assign each data point to its most fitting group.

Candidate NT Representation Estimation: During the first phase, observe that for each pair [\pmb {\tau }_{c}, \pmb {\nu }_{c}] , (23) reduces to the following constrained projection problem:\begin{align*} \!\!\!\! (\mathsf {P_{S}})\!: \widehat {\mathbf {q}}_{i,c} \!& = \arg \min _{ \mathbf {q}_{i,c}} \frac {1}{2} \Vert \mathbf {u}_{i} \! - \! \mathbf {q}_{i,c} \Vert _{2}^{2} + \lambda _{1} \pmb {1}^{T} \vert \mathbf {q}_{i,c} \vert \\ & \quad + \lambda _{\textsf {disc}} \, d_{\textsf {disc}} \! \left ({{ \mathbf {q}_{i,c}, \left [{{ \pmb {\nu }_{\!c}, \pmb {\tau }_{\!c} }}\right ] }}\right )\!.\!\! \tag {24}\end{align*}

We provide a detailed exposition that (\mathsf {P_{S}} ) per y reduces to \min _{\mathbf { y}}\! \frac {1}{2} { \Vert \mathbf {u} - \mathbf {y} \Vert }_{2}^{2} + \mathbf {g}^{T} \vert \mathbf {y} \vert + \mathbf {s}^{T} \left ({{ \mathbf {y} \odot \mathbf {y} }}\right) + \lambda _{1} \pmb {1}^{T} \vert \mathbf {y} \vert and per \left [{{ \pmb {\tau }_{c}, \pmb {\nu }_{c} }}\right] has a closed-form solution as:\begin{align*} \widehat {\mathbf {q}}_{i,c} = \psi \left ({{ \mathbf {u}_{i} }}\right ) := \mathrm {sign}(\mathbf {u}_{i}) \odot \max ( \vert \mathbf {u}_{i} \vert - \mathbf {g}_{i} - \lambda _{1} \pmb {1}, \pmb {0}) \oslash \mathbf {k}_{c}, \tag {25}\end{align*}
where \mathbf {k}_{c} = \left ({{ 1 + 2 \,\mathbf {s}_{c}}}\right) and \mathbf {y} = \mathbf {y}_{i} , \mathbf {s}_{c}^{T} = \lambda _{\textsf {disc}} {\left ({{ \pmb {\tau }_{c} \odot \pmb {\tau }_{c} }}\right)}^{T} , \mathbf {g}^{T}_{i} \! = \! \lambda _{\textsf {disc}} \left ({{ \mathbf {h}_{1}^{T} - \mathbf {h}_{2}^{T} }}\right) , \mathbf {h}^{T}_{1} \vert \mathbf {y} \vert = \left ({{ \mathbf {y}^{+} }}\right)^{T} \pmb {\tau }_{c}^{+} + \left ({{ \mathbf {y}^{-} }}\right)^{T} \pmb {\tau }_{c}^{-} , \mathbf {h}^{T}_{2} \vert \mathbf {y} \vert \! = \! \left ({{ \mathbf {y}^{+} }}\right)^{T} \pmb {\nu }^{-}_{c} + \left ({{ \mathbf {y}^{-} }}\right)^{T} \pmb {\nu }^{+}_{c} , \mathbf {h}_{1} \! = \! \max \left ({{ \pmb {\tau }_{c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {y}, \pmb {0}}}\right) }}\right) \! + \! \max \left ({{- \pmb {\tau }_{c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {y}, \! \pmb {0}}}\right) }}\right) , \mathbf {h}_{2} \! = \! \max \left ({{ \pmb {\nu }_{c},\! \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {y}, \! \pmb {0}}}\right) }}\right) + \max \left ({{ -\pmb {\nu }_{c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {y}, \pmb {0}}}\right) }}\right) .

Proof:

Given the available database X and the current estimate of the linear map W, the representation estimation problem is formulated in (23). Let \mathbf {y} = \mathbf {y}_{i} and \mathbf {x} = \mathbf {x}_{i} , the above problem per single \mathbf {y}_{i} reduces to:\begin{align*} & \min _{\mathbf {y}} \frac {1}{2} \Vert \mathbf {W} \mathbf {x} - \mathbf {y} \Vert _{2}^{2} + \lambda _{1} \pmb {1}^{T} \vert \mathbf {y} \vert + \lambda _{\textsf {disc}} \Big ( ( \mathbf {y}^{+} )^{T} \pmb {\tau }^{+}_{c} + (\mathbf {y}^{-})^{T} \pmb {\tau }^{-}_{c} \\ & \quad + ( \pmb {\tau } _{c} \odot \pmb {\tau }_{c} )^{T} ( \mathbf {y} \odot \mathbf {y} ) - \left [{{ ( \mathbf {y}^{+} )^{T} \pmb {\nu }^{+}_{c} + (\mathbf {y}^{-})^{T} \pmb {\nu }^{-}_{c} }}\right ] \Big ). \tag {26}\end{align*}

By denoting:\begin{align*} \mathbf {u} & =\mathbf {W} \mathbf {x} \tag {27}\\ \mathbf {h}_{1}^{T} \vert \mathbf {y} \vert & =\left ({{ \mathbf {y}^{+} }}\right )^{T} \pmb {\tau }^{+} + \left ({{ \mathbf {y}^{-} }}\right )^{T} \pmb {\tau }^{-}, \tag {28}\\ \mathbf {h}_{2}^{T} \vert \mathbf {y} \vert & =\left ({{ \mathbf {y}^{+} }}\right )^{T} \pmb {\nu }^{+} + \left ({{ \mathbf {y}^{-} }}\right )^{T} \pmb {\nu }^{-}, \tag {29}\\ \mathbf {s}_{c}^{T} & =\lambda _{\textsf {disc}} {\left ({{ \pmb {\tau }_{c} \odot \pmb {\tau }_{c} }}\right )}^{T}, \tag {30}\\ \mathbf {g}_{i}^{T} & =\lambda _{\textsf {disc}} \left ({{ \mathbf {h}_{1}^{T} - \mathbf {h}_{2}^{T} }}\right ), \tag {31}\end{align*}
where \mathbf {h}_{1} \! \! = \! \max \! \left ({{ \pmb {\tau }_{c}, \pmb {0} }}\right) \odot \mathrm {sign} \! \left ({{ \max \left ({{ \mathbf {y}, \pmb {0}}}\right) }}\right) + \max \left ({{ - \pmb {\tau }_{\! c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {y}, \pmb {0}}}\right) }}\right) , \mathbf {h}_{2} = \max \left ({{ \pmb {\nu }_{\! c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ \mathbf {y}, \pmb {0}}}\right) }}\right) + \max \left ({{ - \pmb {\nu }_{\! c}, \pmb {0} }}\right) \odot \mathrm {sign} \left ({{ \max \left ({{ - \mathbf {y}, \pmb {0}}}\right) }}\right) , the problem is simplified as:\begin{equation*} \min _{\mathbf { y}} \frac {1}{2} { \Vert \mathbf {y} - \mathbf {u} \Vert }_{2}^{2} + \mathbf {g}_{i}^{T} \vert \mathbf {y} \vert + \mathbf {s}_{c}^{T} \left ({{ \mathbf {y} \odot \mathbf {y} }}\right ) + \lambda _{1} \pmb {1}^{T} \vert \mathbf {y} \vert. \tag {32}\end{equation*}
Taking the first order derivative w.r.t y and using the sign magnitude decomposition of \mathbf {y} = \mathrm {sign} \left ({{ \mathbf {y} }}\right) \odot \vert \mathbf {y} \vert and \mathbf {u} = \mathrm {sign} \left ({{ \mathbf {u} }}\right) \odot \vert \mathbf {u} \vert gives:\begin{align*} & \hspace {-0.2pc} \mathrm {sign} \left ({{ \mathbf {y} }}\right ) \odot \vert \mathbf {y} \vert \odot \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ) - \mathrm {sign} \left ({{ \mathbf {u} }}\right ) \odot \vert \mathbf {u} \vert + \lambda _{1} \mathrm {sign} \left ({{ \mathbf {y} }}\right ) \\ & \quad + \lambda _{\textsf {disc}} \big ( \mathrm {sign} \left ({{ \mathbf {y}^{+} }}\right ) \odot \pmb {\tau }_{c}^{+} + \mathrm {sign} \left ({{ \mathbf {y}^{-} }}\right ) \odot \pmb {\tau }_{c}^{-} \\ & \quad - \mathrm {sign} ( \mathbf {y}^{+} ) \odot \pmb {\nu }_{c}^{+} - \mathrm {sign} ( \mathbf {y}^{-} ) \odot \pmb {\nu }_{c}^{-} \big ) = \pmb {0}. \tag {33}\end{align*}
Let \mathrm {sign} (\mathbf {y})\! \! =\! \! \mathrm {sign} (\mathbf {u}) , then by Hadamard multiplying from the left side by \mathrm {sign}\! \left ({{ \mathbf {u} }}\right) and noting that \mathrm {sign} \! \left ({{ \mathbf {u} }}\right) \odot \mathrm {sign} \left ({{\! \mathbf {u}^{+}\! }}\right) \! =\! \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right) , \mathrm {sign} \left ({{ \mathbf {u} }}\right) \odot \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right) = \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right) and taking into account the positive values for magnitude we have:\begin{align*} & \hspace {-0.2pc}\vert \mathbf {y} \vert \odot \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ) \\ & = \max \Big ( \vert \mathbf {u} \vert - \lambda _{\textsf {disc}} \big ( \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right ) \odot \pmb {\tau }_{c}^{+} - \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right ) \\ & \quad \odot \pmb {\tau }_{c}^{-} - \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right ) \odot \pmb {\nu }_{c}^{+} + \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right ) \odot \pmb {\nu }_{c}^{-} \big ) - \lambda _{1} \pmb {1}, \pmb {0} \Big ), \tag {34}\end{align*}
Note that \mathbf {h}_{1} = \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right) \odot \pmb {\tau }^{+} - \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right) \odot \pmb {\tau }^{-} and \mathbf {h}_{2} = \mathrm {sign} \left ({{ \mathbf {u}^{+} }}\right) \odot \pmb {\nu }^{+} - \mathrm {sign} \left ({{ \mathbf {u}^{-} }}\right) \odot \pmb {\nu }^{-} . Therefore, the closed-form solution of problem (26) is given as:\begin{equation*} \mathbf {y} \! = \! \mathrm {sign} \left ({{ \mathbf {u} }}\right ) \odot \max \left ({{ \vert \mathbf {u} \vert - \mathbf {g}_{i} - \lambda _{1} \pmb {1}, \pmb {0} }}\right ) \oslash \left ({{ \pmb {1} + 2 \mathbf {s}_{c}}}\right ), \tag {35}\end{equation*}
which completes the proof.
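Following the derivation above, the candidate estimation reduces to a generalized shrinkage. The NumPy sketch below instantiates the closed form of Eqs. (25)/(35), using the h1 and h2 expressions noted after Eq. (34); the inputs and weights are illustrative placeholders.

import numpy as np

def candidate_nt(u, nu_c, tau_c, lam1, lam_disc):
    # Eq. (25)/(35): q = sign(u) * max(|u| - g - lam1, 0) / (1 + 2 * s_c).
    pos = lambda v: np.maximum(v, 0.0)
    ind_pos = np.sign(pos(u))                        # indicator of positive entries of u
    ind_neg = np.sign(pos(-u))                       # indicator of negative entries of u

    h1 = ind_pos * pos(tau_c) - ind_neg * pos(-tau_c)
    h2 = ind_pos * pos(nu_c) - ind_neg * pos(-nu_c)
    g = lam_disc * (h1 - h2)                         # group-dependent threshold shift
    s_c = lam_disc * tau_c * tau_c                   # group-dependent rescaling

    return np.sign(u) * np.maximum(np.abs(u) - g - lam1, 0.0) / (1.0 + 2.0 * s_c)

rng = np.random.default_rng(7)
print(candidate_nt(rng.standard_normal(8), rng.standard_normal(8), rng.standard_normal(8), 0.1, 0.5))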

Assignment: During the second step, given all the candidate representations \mathbf {q}_{i,c}, c \in [1, \cdots, C] , we evaluate the scores using the composite function:\begin{equation*} \textsf {S}(\mathbf { q}_{i,c}, \left [{{ \pmb {\nu }_{c}, \pmb {\tau }_{c} }}\right ])\!=\!\textsf {Sim} \left ({{ \mathbf {q}_{i,c}, \pmb {\tau }_{c} }}\right ) \!+\! \textsf {Stg} \left ({{ \mathbf {q}_{i,c}, \pmb {\tau }_{c} }}\right ) \!+\! \textsf {Sim} \left ({{ \mathbf {q}_{i,c}, \pmb {\nu }_{c}}}\right ).\end{equation*}

Based on these scores, we assign the data point \mathbf { x}_{i} to a group characterized by the index \widehat {c} \in [1,\cdots, C] . Concurrently, we choose the representation \mathbf {q}_{i, \widehat {c}} that results in the minimal evaluation score. Specifically, this assignment is governed by:\begin{align*} \widehat {c} & = \arg \min _{c} \, \textsf {S} (\mathbf { q}_{i,c}, \pmb {\nu }_{c}, \pmb {\tau }_{c}), \tag {36}\\ {\widehat {\mathbf { y}}}_{i} & = \mathbf { q}_{i, \widehat {c}} \;. \tag {37}\end{align*}
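The assignment step therefore scores every candidate and keeps the minimizer. A small sketch (using q_{i,c} consistently in all three score terms, and restating Sim and Stg so the snippet is self-contained):

import numpy as np

def Sim(a, b):
    return np.sum(np.maximum(a, 0) * np.maximum(b, 0)) + np.sum(np.maximum(-a, 0) * np.maximum(-b, 0))

def Stg(a, b):
    return np.sum((a * b) ** 2)

def assign_group(candidates, nus, taus):
    # Eqs. (36)-(37): score each candidate q_{i,c} and keep the group with the
    # minimal score together with its representation.
    scores = np.array([Sim(q, t) + Stg(q, t) + Sim(q, n)
                       for q, n, t in zip(candidates, nus, taus)])
    c_hat = int(np.argmin(scores))
    return c_hat, candidates[c_hat], scores

rng = np.random.default_rng(8)
c_hat, y_hat, scores = assign_group(rng.standard_normal((3, 8)),
                                    rng.standard_normal((3, 8)),
                                    rng.standard_normal((3, 8)))
print(c_hat, scores)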

2) Group Parameters \pmb {\theta } Update

Given the current estimate of the linear map W and representations \mathbf {y}_{i} , we can reformulate the problem (22) as follows:\begin{align*} \widehat {\pmb {\theta }}& =\arg \min _{\pmb {\theta }} \sum _{i=1}^{K} \big [ \lambda _{\textsf {disc}} \, d_{\textsf {disc}}\! \left ({{ \mathbf {y}_{i}, \pmb {\theta } }}\right ) \\ & \quad + \lambda _{\textsf {adj}} \, d_{\textsf {adj}} \! \left ({{ \mathbf {u}_{i}, \pmb {\theta } }}\right ) + \lambda _{\textsf {p}} \, d_{\textsf {p}}\! \left ({{ \pmb {\theta } }}\right ) \big ]. \tag {38}\end{align*}

With the reformulated problem in place, our solution strategy employs a two-phase procedure. The first phase updates the individual \pmb{\nu}_{c} parameters; the second phase updates the \pmb{\tau}_{c} parameters. This division follows from the inherent structure of the problem and ensures that each component is accurately captured and updated. The specifics of each phase are presented in the following subsections; a schematic outline of the sweep over groups is given below.
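In the outline below, update_nu and update_tau stand in for the closed-form solutions derived in the next two subsections, and the data layout is our own illustration.

```python
def update_group_parameters(U, assignments, nus, taus, update_nu, update_tau):
    """Schematic two-phase update of theta = {nu_c, tau_c} for Eq. (38).
    U[i] is the transformed sample u_i = W x_i; assignments[i] is its group index."""
    C = len(nus)
    for c in range(C):
        members = [i for i, a in enumerate(assignments) if a == c]   # index set I_c
        others = [(nus[d], taus[d]) for d in range(C) if d != c]     # fixed theta_d, d != c
        nus[c] = update_nu(U, members, taus[c], others)    # phase 1: nu_c update
        taus[c] = update_tau(U, members, nus[c], others)   # phase 2: tau_c update
    return nus, taus
```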

Single \pmb {\nu }_{c} Parameter Update: Given \{ \pmb {\theta }_{1},\cdots, \pmb {\theta }_{c-1}, \pmb {\theta }_{c+1},\cdots, \pmb {\theta }_{C} \} , problem (38) per \pmb {\nu }_{c} reduces to \begin{align*} (\mathsf {P_{T_{1}}}):\widehat {\pmb {\nu }}_{c}& = \arg \min _{\pmb {\nu }_{c}} \sum _{i \in {\mathcal {I}}_{c} } \frac {1}{2} \Vert \mathbf {u}_{i} - \pmb {\nu }_{c} - \pmb {\tau }_{c} \Vert _{2}^{2} \\ & \quad + \lambda _{\textsf {disc}} \!\!\!\!\!\!\! \sum _{d \in \{1,\cdots, C\}\setminus c } \!\!\!\! d_{\textsf {disc}} \left ({{ \pmb {\nu }_{c}, \pmb {\theta }_{d} }}\right ),\!\! \tag {39}\end{align*}

where {\mathcal{I}}_{c} is the set of indexes i of the data points \mathbf{x}_{i} assigned to the group indexed by c. The solution for (\mathsf{P_{T_{1}}}) aligns structurally with the solution of (25); the two differ only in their thresholding and normalization vectors.

Proof:

Given the parameters of all other groups, \{ \pmb{\theta}_{d} \}_{d \neq c1}, problem (38) per \pmb{\nu}_{c1} reduces to:\begin{align*} & \min_{\pmb{\nu}_{c1}} \sum_{m} \frac{1}{2} \Vert \mathbf{W}\mathbf{x}_{c1,m} - \pmb{\nu}_{c1} - \pmb{\tau}_{c1} \Vert_{2}^{2} \\ & \quad + \lambda_{\textsf{disc}} \sum_{m} \left({ \textsf{Sim}\left({ \mathbf{y}_{c1,m}, \pmb{\tau}_{c1} }\right) - \textsf{Sim}\left({ \mathbf{y}_{c1,m}, \pmb{\nu}_{c1} }\right) + \textsf{Stg}\left({ \mathbf{y}_{c1,m}, \pmb{\tau}_{c1} }\right) }\right) \\ & \quad + \lambda_{\textsf{p}} \sum_{c \neq c1} \left({ \textsf{Sim}\left({ \pmb{\nu}_{c1}, \pmb{\tau}_{c} }\right) - \textsf{Sim}\left({ \pmb{\nu}_{c1}, \pmb{\nu}_{c} }\right) + \textsf{Stg}\left({ \pmb{\nu}_{c1}, \pmb{\tau}_{c} }\right) }\right). \tag{40}\end{align*}

Let \mathbf{v} = \pmb{\nu}_{c1} and \mathbf{u} = \sum_{m} \mathbf{W} \mathbf{x}_{c1,m} - \pmb{\tau}_{c1}. The first-order derivative with respect to \mathbf{v} is:\begin{align*} & \left({ M\mathbf{v} - \mathbf{u} }\right) - \lambda_{\textsf{disc}} \sum_{m} \Big( \mathrm{sign}\left({ \mathbf{v}^{+} }\right) \odot \mathbf{y}_{c1,m}^{+} - \mathrm{sign}\left({ \mathbf{v}^{-} }\right) \odot \mathbf{y}_{c1,m}^{-} \Big) \\ & \quad + \lambda_{\textsf{p}} \sum_{c \neq c1} \Big( \mathrm{sign}\left({ \mathbf{v}^{+} }\right) \odot \pmb{\tau}_{c}^{+} - \mathrm{sign}\left({ \mathbf{v}^{-} }\right) \odot \pmb{\tau}_{c}^{-} - \mathrm{sign}\left({ \mathbf{v}^{+} }\right) \odot \pmb{\nu}_{c}^{+} + \mathrm{sign}\left({ \mathbf{v}^{-} }\right) \odot \pmb{\nu}_{c}^{-} + 2 \, \mathbf{v} \odot \left({ \pmb{\tau}_{c} \odot \pmb{\tau}_{c} }\right) \Big). \tag{41}\end{align*}

Denote:\begin{align*} \mathbf{h}_{y}^{T} \vert \mathbf{v} \vert & = {\left({ \mathbf{v}^{+} }\right)}^{T} \mathbf{y}_{c1,m}^{+} - {\left({ \mathbf{v}^{-} }\right)}^{T} \mathbf{y}_{c1,m}^{-}, \\ \mathbf{h}_{1}^{T} \vert \mathbf{v} \vert & = {\left({ \mathbf{v}^{+} }\right)}^{T} \pmb{\tau}_{c}^{+} - {\left({ \mathbf{v}^{-} }\right)}^{T} \pmb{\tau}_{c}^{-}, \\ \mathbf{h}_{2}^{T} \vert \mathbf{v} \vert & = {\left({ \mathbf{v}^{+} }\right)}^{T} \pmb{\nu}_{c}^{+} - {\left({ \mathbf{v}^{-} }\right)}^{T} \pmb{\nu}_{c}^{-}, \\ \mathbf{s}_{c}^{T} & = \lambda_{\textsf{p}} {\left({ \pmb{\tau}_{c} \odot \pmb{\tau}_{c} }\right)}^{T}, \\ \mathbf{g}_{c}^{T} & = \lambda_{\textsf{p}} \left({ \mathbf{h}_{1}^{T} - \mathbf{h}_{2}^{T} }\right), \\ \mathbf{p}_{c}^{T} & = \lambda_{\textsf{disc}} \mathbf{h}_{y}^{T}, \tag{42}\end{align*}

where \begin{align*} \mathbf{h}_{y} & = \sum_{m} \max\left({ \mathbf{y}_{c1,m}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ \mathbf{u}, \pmb{0} }\right) }\right) \\ & \quad - \sum_{m} \max\left({ -\mathbf{y}_{c1,m}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ -\mathbf{u}, \pmb{0} }\right) }\right), \tag{43}\\ \mathbf{h}_{1} & = \sum_{c \neq c1} \max\left({ \pmb{\tau}_{c}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ \mathbf{u}, \pmb{0} }\right) }\right) \\ & \quad - \sum_{c \neq c1} \max\left({ -\pmb{\tau}_{c}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ -\mathbf{u}, \pmb{0} }\right) }\right), \tag{44}\\ \mathbf{h}_{2} & = \sum_{c \neq c1} \max\left({ \pmb{\nu}_{c}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ \mathbf{u}, \pmb{0} }\right) }\right) \\ & \quad - \sum_{c \neq c1} \max\left({ -\pmb{\nu}_{c}, \pmb{0} }\right) \odot \mathrm{sign}\left({ \max\left({ -\mathbf{u}, \pmb{0} }\right) }\right). \tag{45}\end{align*}

Take the magnitude decompositions \mathbf{v} = \mathrm{sign}\left({ \mathbf{v} }\right) \odot \vert \mathbf{v} \vert and \mathbf{u} = \mathrm{sign}\left({ \mathbf{u} }\right) \odot \vert \mathbf{u} \vert, and let \mathrm{sign}\left({ \mathbf{v} }\right) = \mathrm{sign}\left({ \mathbf{u} }\right). Taking the Hadamard product with \mathrm{sign}\left({ \mathbf{u} }\right) and noting that the magnitude is nonnegative, the closed-form solution simplifies to:\begin{equation*} \mathbf{v} = \mathrm{sign}\left({ \mathbf{u} }\right) \odot \max\left({ \vert \mathbf{u} \vert + \mathbf{p}_{c} - \mathbf{g}_{c}, \pmb{0} }\right) \oslash \left({ M + 2 \mathbf{s}_{c} }\right), \tag{46}\end{equation*}

which completes the proof.
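For illustration, the update in (46) can be sketched as follows, assuming \mathbf{u} = \sum_{m} \mathbf{W}\mathbf{x}_{c1,m} - \pmb{\tau}_{c1} as above and taking \mathbf{p}_{c}, \mathbf{g}_{c}, and \mathbf{s}_{c} from (42) as precomputed inputs; the helper name and array layout are our own.

```python
import numpy as np

def closed_form_nu(U_c, tau_c, p_c, g_c, s_c):
    """Sketch of Eq. (46). U_c is the (M, N) array of transformed members W x_{c,m}
    currently assigned to the group; the returned vector is the updated nu_c."""
    M = U_c.shape[0]
    u = U_c.sum(axis=0) - tau_c                          # u = sum_m W x_{c,m} - tau_c
    magnitude = np.maximum(np.abs(u) + p_c - g_c, 0.0)   # thresholded magnitude
    return np.sign(u) * magnitude / (M + 2.0 * s_c)      # element-wise normalization
```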

Single \pmb{\tau}_{c} Parameter Update: Given \{ \pmb{\theta}_{1},\cdots, \pmb{\theta}_{c-1}, \pmb{\theta}_{c+1}, \cdots, \pmb{\theta}_{C} \}, problem (38) per \pmb{\tau}_{c} reduces to:\begin{align*} (\mathsf{P_{T_{2}}}):\widehat{\pmb{\tau}}_{c} & = \arg\min_{\pmb{\tau}_{c}} \sum_{i \in {\mathcal{I}}_{c}} \frac{1}{2} \Vert \mathbf{u}_{i} - \pmb{\nu}_{c} - \pmb{\tau}_{c} \Vert_{2}^{2} + \lambda_{\textsf{disc}} \sum_{d \in \{1,\cdots, C\} \setminus c} d_{\textsf{disc}}\left({ \pmb{\tau}_{c}, \pmb{\theta}_{d} }\right). \tag{47}\end{align*}


Similar to the previous parameter update, the solution for this problem adheres structurally to (25). Nevertheless, the distinction between the two lies in their thresholding and normalization vectors.

3) Linear Map \mathbf{W} Update

Consider a given data set X, its corresponding representations Y, and the group membership parameters \pmb {\theta } . With these, the problem in (22) can be restructured specifically for the linear map W update as:\begin{align*} \widehat {\mathbf {W}} & = \arg \min _{\mathbf {W}} \frac {1}{2} \Vert \mathbf {W} \mathbf {X} - \mathbf {R}\Vert _{F}^{2} + \frac {\lambda _{\textsf {W}_{1}}}{2} {\Vert \mathbf {W} \Vert }_{F}^{2} \\ & \quad + \,\frac {\lambda _{\textsf {W}_{2}}}{2} {\Vert \mathbf {W} \mathbf {W}^{T} \! \! - \mathbf {I}\Vert }_{F}^{2} - \frac {\lambda _{\textsf {W}_{3}}}{2} \log \vert \! \det \mathbf {W}^{T} \mathbf {W} \vert, \!\! \tag {48}\end{align*}

where we denote \mathbf{R} = \left[{ \mathbf{r}_{1}, \cdots, \mathbf{r}_{L} }\right] with \mathbf{r}_{i} = \mathbf{y}_{i} - \lambda_{\textsf{adj}}\left({ \pmb{\nu}_{c} + \pmb{\tau}_{c} }\right). Furthermore, \lambda_{\textsf{W}_{1}}, \lambda_{\textsf{W}_{2}}, and \lambda_{\textsf{W}_{3}} are inversely related to the scaling parameters \beta_{\textsf{W}_{1}}, \beta_{\textsf{W}_{2}}, and \beta_{\textsf{W}_{3}}, respectively. The solution to this problem aligns with the method proposed in [15].
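For intuition only, the sketch below minimizes (48) by plain gradient descent under the assumption of a square, invertible \mathbf{W}; the paper itself follows the solver of [15], so this is not the update used in our experiments.

```python
import numpy as np

def update_W_gradient(W, X, R, lam1, lam2, lam3, step=1e-3, n_iters=100):
    """Gradient-descent sketch for the W subproblem in Eq. (48)."""
    I = np.eye(W.shape[0])
    for _ in range(n_iters):
        grad = (W @ X - R) @ X.T                   # data-fidelity term
        grad += lam1 * W                           # Frobenius-norm penalty
        grad += 2.0 * lam2 * (W @ W.T - I) @ W     # near-orthonormality penalty
        grad -= lam3 * np.linalg.inv(W).T          # log-determinant barrier (square W)
        W = W - step * grad
    return W
```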

SECTION VII.

Numerical Evaluation

In this section, we assess the efficacy of our proposed scheme through its application in face recognition across two scenarios. We then compare its performance against established methods: EoA-ML, AoE-ML [10], and JLAR [12]. While EoA-ML and AoE-ML enroll K individuals across C random groups without joint optimization, both our proposed method and JLAR are distinct in that they jointly learn group assignments and representations.

A. Face Datasets

We extract face descriptors using a pre-trained VGG-Face network. These descriptors are reduced in dimensionality via PCA and then L_{2}-normalized, resulting in standardized feature vectors of size N = 1,024.
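A minimal sketch of this preprocessing pipeline, assuming the VGG-Face descriptors are already extracted into a (K, d) array D and using an SVD-based PCA of our own choosing, is:

```python
import numpy as np

def preprocess_descriptors(D, n_components=1024):
    """Reduce pre-extracted face descriptors with PCA, then L2-normalize each row."""
    D_centered = D - D.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(D_centered, full_matrices=False)   # principal directions
    k = min(n_components, Vt.shape[0])                          # capped by min(K, d)
    X = D_centered @ Vt[:k].T                                   # project to k dimensions
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12       # L2-normalization
    return X
```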

1) CFP

The Celebrities in Frontal-Profile (CFP) database comprises 500 individuals, each represented by 10 frontal and 4 profile images captured in unconstrained environments. From this collection, we exclusively use the frontal images of K=400 individuals. For the purpose of defining impostors, a subset of K_{q} = 100 individuals is randomly chosen.

2) LFW

The Labeled Faces in the Wild (LFW) dataset includes 13,233 facial images collected from the internet. In our study, we use their pre-aligned versions. The enrollment set is derived from K=1680 individuals, each represented by at least two images within the LFW database. For each individual, one random template is enrolled as \mathbf {x}_{i} . A separate group of K_{q} = 263 individuals is randomly selected from the dataset to represent impostors.

B. Scenario #1: Group Verification

Consider a scenario where a user claims membership to group c. We define two hypotheses: {\mathcal{H}}_{1} affirms the user's claim, while {\mathcal{H}}_{0} refutes it, categorizing the user as an impostor. The user's signature, denoted by \mathbf{q}, is transformed into \mathbf{q}_{c} using the function {\mathcal{T}}_{{\mathcal{P}}_{c}}(\mathbf{q}), where {\mathcal{P}}_{c} are the nonlinear transform parameters. Once embedded, both the signature \mathbf{q}_{c} and its associated group c are transmitted to the system. The system then compares \mathbf{q}_{c} with the group's representative parameters \pmb{\theta}_{c} to decide on acceptance (t = 1) or rejection (t = 0). This constitutes a binary hypothesis test with two potential error outcomes: the false positive rate, denoted P_{\textsf{fp}}:=\mathbb{P}(t=1|{\mathcal{H}}_{0}), and the false negative rate, denoted P_{\textsf{fn}}:=\mathbb{P}(t=0|{\mathcal{H}}_{1}). The figure of merit is P_{\textsf{fn}} at P_{\textsf{fp}} = 0.05.
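As an illustration of how this operating point is computed, the following sketch (a hypothetical helper, assuming that lower scores mean acceptance, consistent with the min-based decision rule) picks the threshold that yields the target P_{\textsf{fp}} on impostor scores and reports P_{\textsf{fn}} on genuine scores.

```python
import numpy as np

def pfn_at_pfp(genuine_scores, impostor_scores, target_pfp=0.05):
    """Choose the acceptance threshold so that P_fp = target_pfp on impostor scores,
    then report the false-negative rate P_fn on genuine (member) scores."""
    tau = np.quantile(impostor_scores, target_pfp)   # P(impostor score < tau) ~ target_pfp
    return float(np.mean(genuine_scores >= tau))     # fraction of genuine queries rejected
```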

Figure 4 provides a comparative analysis, illustrating the enhanced efficiency of our scheme in group verification relative to the baseline methods. Complementing this, TABLE 1 offers a detailed quantitative evaluation for four group sizes M = 5, 10, 20, 40, with all values rounded to three decimal places. In particular, our method demonstrates marked benefits on the CFP dataset. Furthermore, the improvements achieved by both JLAR and our scheme become considerably more noticeable for larger group sizes, suggesting that grouping similar vectors leads to minimal information loss.

TABLE 1. Performances on group verification for varying group size M: P_{\textsf{fn}} at P_{\textsf{fp}}=0.05 for CFP (top) and LFW (bottom).
FIGURE 4. Performances on group verification for varying group size M: P_{\textsf{fn}} at P_{\textsf{fp}}=0.05 for CFP (left) and LFW (right).

C. Scenario #2: Group Identification

This scenario addresses open-set identification, wherein a querying user may either be enrolled or be an impostor. The system's identification procedure comprises two phases. In the first phase, the system determines whether the user is enrolled. While this mirrors the verification described previously, the pivotal distinction is that the group is unknown. The system calculates the min-max score for each group, denoted \delta_{c} = \textsf{S}(\mathbf{q}_{c}, \pmb{\theta}_{c}) for all c \in [C]. An acceptance decision (t=1) is rendered if the lowest of these C scores falls below a specified threshold \tau. The figure of merit is P_{\textsf{fn}} at P_{\textsf{fp}} = 0.05. Upon obtaining t=1 in the first phase, the system advances to the second phase, in which it infers the likely group membership \widehat{c} = \arg\min_{c\in [C]} \delta_{c}. Two figures of merit characterize this phase: the error probability P_{\epsilon}:= \mathbb{P}(\widehat{c}\neq c) and the Detection and Identification Rate (DIR), defined as \textsf{DIR} := (1-P_{\epsilon})(1-P_{\textsf{fn}}).
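The decision logic of the two phases and the DIR metric can be summarized by the following sketch, assuming the group scores \delta_{c} have already been computed for a query.

```python
import numpy as np

def open_set_identify(deltas, tau):
    """Phase 1: accept (t = 1) if the smallest of the C scores is below tau;
    Phase 2: if accepted, return the inferred group c_hat = argmin_c delta_c."""
    if deltas.min() >= tau:
        return 0, None                    # rejected as an impostor (t = 0)
    return 1, int(np.argmin(deltas))      # accepted, with inferred group index

def detection_identification_rate(p_eps, p_fn):
    """DIR = (1 - P_eps) * (1 - P_fn), as defined above."""
    return (1.0 - p_eps) * (1.0 - p_fn)
```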

The performance metrics for the group identification scenario are presented in Figure 5, highlighting the improvements offered by our proposed scheme in this scenario. Table 2 provides a detailed quantitative comparison for four group sizes M = 5, 10, 20, 40, with all values rounded to three decimal places, corresponding to the results depicted in Figure 5. Additionally, Figure 6 shows the impact of group size on the Detection and Identification Rate (DIR) for the CFP dataset. Evidently, packing more templates into one group causes performance deterioration. Table 3 provides a quantitative comparison for three false positive rates P_{\textsf{fp}} = 0.05, 0.2, 0.4, rounded to two decimal places, corresponding to the results depicted in Figure 6.

TABLE 2. Performances on group identification for varying group size M on CFP (top) and LFW (bottom): P_{\textsf{fn}} at P_{\textsf{fp}}=0.05 for the first step of group identification.
TABLE 3. The Detection and Identification Rate (DIR) vs. P_{\textsf{fp}} on CFP for group size M = 16 (top) and M = 10 (bottom).
FIGURE 5. Performances on group identification for varying group size M on CFP (left) and LFW (right): P_{\textsf{fn}} at P_{\textsf{fp}}=0.05 for the first step (solid line) and P_{\epsilon} for the second step of group identification (dashed).

FIGURE 6. The Detection and Identification Rate (DIR) vs. P_{\textsf{fp}} on CFP.

Figure 7 presents some enrolled and query faces from the LFW dataset. All of the failed identification examples exhibit a change in lighting, pose, or expression, or an occlusion.

FIGURE 7. Examples of group identification on LFW: black frames (enrolled samples), green frames (successful queries), and red frames (failed queries).

SECTION VIII.

Conclusion

We introduced a new group membership verification methodology that achieves two primary objectives: (i) it jointly learns nonlinear transform representations incorporating prior information, and (ii) it determines group representatives using a min-max approach grounded in functional measures. We further proposed an efficient algorithm tailored to the estimation of the model parameters. Evaluations of the proposed framework were conducted on public image databases, underscoring its applicability and proficiency in face verification and identification tasks.

ACKNOWLEDGMENT

The implementation code is available at https://gitlab.idiap.ch/biometric/code.group_membership_verification.
