I. Introduction
Big data is commonly defined in terms of how we gather, store, manipulate, analyze, and derive insight from fast-growing, heterogeneous data. Most newly generated data is unstructured, owing to the growth of mobile devices and the virtually unlimited volume of user-generated content on social media, which combines text, pictures, audio, and video without a fixed structure. According to industry analysts, unstructured data is growing faster than any other type of data. It will increase by as much as 800 percent over the next five years, according to a survey conducted by [1]. This creates an urgent need to characterize and categorize such data automatically. These classifications are strongly coupled with the semantic meaning of what the data represents. In many cases, the data arrives in a format and quality state that cannot be processed immediately as is; even when processing is possible, the results cannot guarantee valuable analysis and insights.