1. Introduction
In recent work on object recognition and scene classification, the bag-of-words (BoW) model is one of the most popular models for feature design. Inspired by the seminal work of [26], different approaches have been proposed to improve both its generative capacity to describe images accurately and its discriminative power for classification. Despite remarkable progress, challenges remain concerning the extraction of local descriptors, codebook design, the coding and pooling of local descriptors, the incorporation of a spatial layout into the final feature, and the final classification.