I. Introduction
Dimensionality reduction (DR) maps high-dimensional (HD) data sets to low-dimensional (LD) spaces, mostly for exploratory visualization or to mitigate the curse of dimensionality [1]. This curse encompasses the inherent difficulties of coping with HD data and has motivated the development of adequate approaches to extract meaningful LD features [2], [3]. In a visualization context, the relevance of an LD embedding is typically assessed through HD neighborhood preservation. Mappings from HD to LD coordinates [1] formalize this neighborhood preservation principle through paradigms such as the reproduction of distances [4] or neighborhoods [5], [6]. Linear projections of the HD vectors include early principal component analysis (PCA) [7] and classical metric multidimensional scaling (MDS) [4], driven by variance and dot-product preservation, respectively. Nonlinear metric MDS extensions [8] define (weighted) distance preservation schemes relying on either Euclidean or approximated geodesic measures [9]. Affinity matrices may also be computed to guide the tuning of the LD embedding [10], [11]. However, these approaches are hardly superior to the older methods in visualization tasks [1], [12], [13], potentially because they can be expressed as classical MDS applied in an unknown feature space [14].
Distance-preserving schemes are particularly affected by the norm concentration phenomenon [15], which causes pairwise distances to become increasingly similar as the dimension grows [16]. Meanwhile, neighbor embedding (NE) techniques such as stochastic neighbor embedding (SNE) [6] and its variants [13], [17] alleviate this phenomenon by matching neighbor probability distributions defined in both spaces to compute the LD points [16].
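The norm concentration phenomenon mentioned above can be observed numerically. The following minimal sketch (the function name and sampling setup are ours, purely for illustration and not drawn from the cited works) measures the relative spread of pairwise Euclidean distances among i.i.d. Gaussian points as the dimension grows:

```python
import numpy as np

def relative_contrast(dim, n=500, seed=0):
    # Std/mean ratio of pairwise Euclidean distances between
    # n i.i.d. standard Gaussian points in the given dimension.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    sq = (X ** 2).sum(axis=1)
    # squared distances via the Gram-matrix identity, clipped at 0
    # to guard against tiny negative values from round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2[np.triu_indices(n, k=1)])
    return dist.std() / dist.mean()

for dim in (2, 10, 1000):
    print(dim, relative_contrast(dim))
```

The ratio shrinks steadily with the dimension, illustrating why raw distance preservation becomes less informative for HD data.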
These outstanding performances have motivated the development of alternative SNE-based models, with heavy-tailed distributions as in t-SNE [13], [18], [19], divergence mixtures as cost functions [17], [20], [21], missing-data management [22], enhanced optimization [23]–[26], and so on.
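The NE principle above can be sketched compactly. The code below is an illustrative toy implementation only, not the formulation of any cited work: it uses a fixed Gaussian bandwidth in the HD space instead of the usual per-point perplexity calibration, heavy-tailed Student-t affinities in the LD space as in t-SNE, and plain gradient descent on the Kullback-Leibler divergence; all names are ours.

```python
import numpy as np

def hd_affinities(X, sigma=1.0):
    # Gaussian HD affinities P (fixed bandwidth, for simplicity).
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def ld_affinities(Y):
    # Heavy-tailed Student-t LD affinities Q, t-SNE style.
    d2 = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl(P, Q):
    # Kullback-Leibler divergence between the two affinity matrices.
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())

def embed(X, n_iter=300, lr=1.0, seed=0):
    # Gradient descent on KL(P || Q) to place the LD points Y.
    rng = np.random.default_rng(seed)
    P = hd_affinities(X)
    Y = 1e-2 * rng.standard_normal((len(X), 2))
    for _ in range(n_iter):
        Q, W = ld_affinities(Y)
        # gradient w.r.t. y_i: 4 * sum_j (p_ij - q_ij) w_ij (y_i - y_j)
        G = 4.0 * ((P - Q) * W)[:, :, None] * (Y[:, None] - Y[None, :])
        Y -= lr * G.sum(axis=1)
    return Y, P
```

Matching P and Q rather than raw distances is what makes NE methods robust to norm concentration: only the relative ordering of neighbors matters, not the absolute distance values.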