1. INTRODUCTION
Deep learning methods require large amounts of training data, and they usually perform better as the size of the training set increases. Yet, for some applications, it is not possible to obtain more data when the existing dataset is too small. Even when raw data can be collected easily, labeling or annotating it is difficult, expensive, and time consuming. Successors of [1] achieved higher accuracy with fewer parameters on the same benchmark through architectural modifications using the same building blocks. This shows that the choice of parameters and the design of the architecture are important factors affecting performance. Many researchers have proposed different CNN architectures [2], [3], [4], [5], [6], [7], [8] to achieve higher accuracy.