I. Introduction
Over the last decade, many advances have been made in both deep learning and remote sensing. These advances have opened a variety of research opportunities, including scene classification, object detection, surveillance, and structural inspection. A significant body of research in remote sensing focuses on image descriptors. These descriptors rely on low-level feature extractors such as the Scale-Invariant Feature Transform (SIFT) or local binary patterns (LBP). In recent years, many studies have addressed the same problem using high-level feature extractors such as Convolutional Neural Networks (CNNs) [1], [2]. Training a neural network poses a number of problems, one of which is that the time and computational cost of training increase with the depth of the network. This is one of the factors limiting the use of deep neural networks in various applications. To address this problem, a residual unit based on identity mapping was proposed [3].
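As an illustration of the residual unit mentioned above, the following minimal sketch computes an output of the form x + F(x), where the input x is carried unchanged through an identity shortcut and added to a learned transformation F. The two-layer form of F, the helper names (relu, linear, residual_unit), and the toy weights are assumptions made for this example only and are not taken from [3].

```python
# Hypothetical sketch of a residual unit: output = x + F(x).
# The two-layer form of F and all names below are illustrative assumptions.

def relu(v):
    """Apply the rectified linear unit element-wise to a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_unit(x, w1, w2):
    """Return x + F(x), where F(x) = w2 * relu(w1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # Identity shortcut: the input is added back to the transformed signal.
    return [a + b for a, b in zip(x, fx)]

x = [1.0, -2.0, 3.0]
zero = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) vanishes, so the unit reduces to the identity mapping.
identity_out = residual_unit(x, zero, zero)
```

When F contributes nothing, the unit simply passes its input through, which is the property the identity shortcut is meant to provide as more layers are stacked.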