1 Introduction
Multi-view clustering has recently become one of the most active research topics in unsupervised learning. It is a machine learning paradigm that gathers similar subjects into the same group and dissimilar ones into different groups by exploiting the available multi-view features, so that both the complementary information and the consistency among different views can be captured. These multi-view features are usually generated by various handcrafted feature extractors. For example, there are many heterogeneous handcrafted visual features, including SIFT [1], LBP [2], and HOG [3]. In addition, owing to the success of deep learning, various kinds of deep neural networks, such as the stacked autoencoder (SAE) [4], variational autoencoder (VAE) [5], and convolutional autoencoder (CAE) [6], have been proposed for unsupervised feature learning. The availability of these multi-view features has spurred interest in multi-view clustering and, in particular, in deep multi-view clustering, which is the main focus of this paper.