A distributed rough set theory based algorithm for an efficient big data pre-processing under the Spark framework | IEEE Conference Publication | IEEE Xplore