I. Introduction
Modern machinery is more integrated and complex than ever, and condition monitoring (CM) systems are now routinely applied to track its operating condition. Condition monitoring increasingly demands more sensors, higher sampling frequencies, and longer monitoring durations, which has dramatically accelerated data accumulation and brought machinery condition monitoring into the big data era [1]. The accumulated massive data are generally complicated in structure and low in information density; that is, available data and dirty data are mixed together and interact in complex ways [2]. Dirty data are records that arise from unknown machinery conditions, recording mistakes, and similar causes, and that do not conform to expected behavior. Recognizing such records and eliminating them from the available data improves data quality and facilitates subsequent data analysis. It is therefore worthwhile to investigate dirty data recognition methods.