1 Introduction
Over the last century, Component Analysis (CA) methods [1] such as Principal Component Analysis (PCA) [2], [3], Linear Discriminant Analysis (LDA) [4], [5], Canonical Correlation Analysis (CCA) [6], Locality Preserving Projections (LPP) [7], and Spectral Clustering (SC) [9] have been extensively used as a feature extraction step in modeling, clustering, classification, and visualization problems. The aim of CA techniques is to decompose a signal into relevant components that are optimal for a given task (e.g., classification, visualization). These components, explicitly or implicitly (e.g., kernel methods), define the representation of the signal. CA techniques are appealing for two main reasons. First, CA models typically have a small number of parameters and can therefore be estimated from relatively few samples. This makes them especially useful for modeling high-dimensional data, where, due to the curse of dimensionality, learning a model would otherwise require a large number of samples. Second, many CA techniques can be formulated as eigen-problems, which allows linear and nonlinear models to be learned efficiently and without local minima. The use of eigen-solvers to address statistical problems dates back to the 1930s, and since then many numerically stable and efficient packages have been developed to solve eigen-problems. For these reasons, many computer vision, computer graphics, signal processing, and statistical problems have been posed as problems of learning a low-dimensional CA model.
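As a brief illustration of the eigen-problem formulation mentioned above, consider the standard textbook statement of PCA. The notation here is assumed for illustration only and is not taken from the text: $\mathbf{D} \in \mathbb{R}^{d \times n}$ holds $n$ mean-centered samples as columns, $\boldsymbol{\Sigma}$ is the sample covariance, and $\mathbf{B} \in \mathbb{R}^{d \times k}$ is an orthonormal basis of $k$ components.

```latex
% Illustrative notation (assumed, not from the text):
%   D     : d x n matrix of mean-centered samples
%   Sigma : d x d sample covariance
%   B     : d x k orthonormal basis, k <= d
\begin{align}
  \boldsymbol{\Sigma} &= \frac{1}{n}\,\mathbf{D}\mathbf{D}^{T}, \\
  \mathbf{B}^{\star} &= \arg\max_{\mathbf{B}}\;
      \operatorname{tr}\!\left(\mathbf{B}^{T}\boldsymbol{\Sigma}\,\mathbf{B}\right)
      \quad \text{s.t.} \quad \mathbf{B}^{T}\mathbf{B} = \mathbf{I}_{k}, \\
  \boldsymbol{\Sigma}\,\mathbf{B}^{\star} &= \mathbf{B}^{\star}\boldsymbol{\Lambda},
      \qquad \boldsymbol{\Lambda} = \operatorname{diag}(\lambda_{1},\dots,\lambda_{k}),
      \quad \lambda_{1} \ge \dots \ge \lambda_{k}.
\end{align}
```

Under these assumptions, one maximizer $\mathbf{B}^{\star}$ is formed by the $k$ leading eigenvectors of $\boldsymbol{\Sigma}$, so a single symmetric eigendecomposition yields a globally optimal solution, which is the sense in which such models can be learned without local minima.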