I. Introduction
Dimensionality reduction (DR) maps high-dimensional (HD) data sets to low-dimensional (LD) spaces, mostly for exploratory visualization or to mitigate the curse of dimensionality [1]. This curse encompasses the inherent difficulties of coping with HD data and has motivated the development of adequate approaches to extract meaningful LD features [2], [3]. In a visualization context, the relevance of an LD embedding is typically assessed through HD neighborhood preservation. Mappings from HD to LD coordinates [1] formalize this neighborhood preservation principle through paradigms such as the reproduction of distances [4] or neighborhoods [5], [6]. Linear projections of the HD vectors include early principal component analysis (PCA) [7] and classical metric multidimensional scaling (MDS) [4], driven by variance and dot-product preservation, respectively. Nonlinear metric MDS extensions [8] define (weighted) distance preservation schemes relying on either Euclidean or approximated geodesic measures [9]. Affinity matrices may also be computed to guide the tuning of the LD embedding [10], [11]. However, these approaches are hardly superior to the older methods in visualization tasks [1], [12], [13], potentially because they can be expressed as classical MDS applied in an unknown feature space [14].
Distance-preserving schemes are particularly affected by the norm concentration phenomenon [15], which causes pairwise distances to become increasingly similar as the dimension grows [16]. Meanwhile, neighbor embedding (NE) techniques such as stochastic neighbor embedding (SNE) [6] and its variants [13], [17] alleviate this phenomenon by matching neighbor probability distributions defined in both spaces to compute the LD points [16].
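The norm concentration phenomenon mentioned above can be observed numerically. The following minimal sketch (the function name and sampling setup are ours, purely for illustration and not drawn from the cited works) measures the relative spread of pairwise Euclidean distances among i.i.d. Gaussian points as the dimension grows:

```python
import numpy as np

def relative_contrast(dim, n=500, seed=0):
    # Std/mean ratio of pairwise Euclidean distances between
    # n i.i.d. standard Gaussian points in the given dimension.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    sq = (X ** 2).sum(axis=1)
    # squared distances via the Gram-matrix identity, clipped at 0
    # to guard against tiny negative values from round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2[np.triu_indices(n, k=1)])
    return dist.std() / dist.mean()

for dim in (2, 10, 1000):
    print(dim, relative_contrast(dim))
```

The ratio shrinks steadily with the dimension, illustrating why raw distance preservation becomes less informative for HD data.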
These outstanding performances have motivated the development of alternative SNE-based models, with heavy-tailed distributions as in t-SNE [13], [18], [19], divergence mixtures as cost functions [17], [20], [21], missing-data management [22], enhanced optimization [23]–[26], and so on.
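The NE principle above can be sketched compactly. The code below is an illustrative toy implementation only, not the formulation of any cited work: it uses a fixed Gaussian bandwidth in the HD space instead of the usual per-point perplexity calibration, heavy-tailed Student-t affinities in the LD space as in t-SNE, and plain gradient descent on the Kullback-Leibler divergence; all names are ours.

```python
import numpy as np

def hd_affinities(X, sigma=1.0):
    # Gaussian HD affinities P (fixed bandwidth, for simplicity).
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def ld_affinities(Y):
    # Heavy-tailed Student-t LD affinities Q, t-SNE style.
    d2 = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl(P, Q):
    # Kullback-Leibler divergence between the two affinity matrices.
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())

def embed(X, n_iter=300, lr=1.0, seed=0):
    # Gradient descent on KL(P || Q) to place the LD points Y.
    rng = np.random.default_rng(seed)
    P = hd_affinities(X)
    Y = 1e-2 * rng.standard_normal((len(X), 2))
    for _ in range(n_iter):
        Q, W = ld_affinities(Y)
        # gradient w.r.t. y_i: 4 * sum_j (p_ij - q_ij) w_ij (y_i - y_j)
        G = 4.0 * ((P - Q) * W)[:, :, None] * (Y[:, None] - Y[None, :])
        Y -= lr * G.sum(axis=1)
    return Y, P
```

Matching P and Q rather than raw distances is what makes NE methods robust to norm concentration: only the relative ordering of neighbors matters, not the absolute distance values.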