1. Introduction
Recent work by Zhou et al. [34] has shown that the convolutional units in various layers of convolutional neural networks (CNNs) actually behave as object detectors, even though no supervision on object location is provided. Although the convolutional layers have this remarkable ability to localize objects, it is lost when fully-connected layers are used for classification. Recently, some popular fully-convolutional neural networks, such as the Network in Network (NIN) [13] and GoogLeNet [25], have been proposed that avoid fully-connected layers in order to minimize the number of parameters while maintaining high performance.