I. Introduction
Dimensionality reduction is a crucial step in machine learning problems, especially when the number of samples is small compared with the data dimensionality. Extracting important features from data reduces dimensionality while preserving the salient information needed to learn predictive models that generalize well. Under the assumption that such important factors are redundantly represented in multiple ways/views in the observed data, canonical correlation analysis (CCA) can effectively be used to extract the dependence (mutual information) between two given views [1]–[5]. Here, the term view refers to each related set of features describing the same underlying phenomenon.
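To make the idea concrete, the following is a minimal sketch of classical CCA (not the method of this paper) computed via the SVD of the whitened cross-covariance between two views; the two-view data, the ridge term `reg`, and all variable names are illustrative assumptions.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Classical CCA: returns canonical correlations and the
    projection matrices for each view. `reg` is a small ridge
    term added for numerical stability (an assumption, not part
    of the textbook formulation)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric PSD matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations between the two views.
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(K)
    Wx = inv_sqrt(Sxx) @ U      # projections for view 1
    Wy = inv_sqrt(Syy) @ Vt.T   # projections for view 2
    return s, Wx, Wy

# Two noisy views sharing one latent factor z
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = np.hstack([z + 0.1 * rng.standard_normal((500, 1)),
               rng.standard_normal((500, 2))])
Y = np.hstack([z + 0.1 * rng.standard_normal((500, 1)),
               rng.standard_normal((500, 2))])
corrs, Wx, Wy = cca(X, Y)
# The leading canonical correlation recovers the shared factor
# and is close to 1; the remaining ones are near 0.
```

The leading canonical direction in each view aligns with the shared latent signal, which is the sense in which CCA extracts the dependence between views.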