I. Introduction
High-dimensional data are ubiquitous in research fields such as computer vision, bioinformatics, and information retrieval. Processing such data places heavy demands on computer hardware, particularly memory and computing power. At the same time, outliers and noise can significantly degrade the performance of the relevant algorithms [1], [2]. Numerous dimensionality reduction techniques have therefore been applied to address these issues by discovering the optimal feature subset to represent the original high-dimensional feature vectors. Data processed in this way reduce running time and make the model being optimized more compact and more general [2].