1. Introduction
“Deep learning. How well do you think it would work for your computer vision problem?” Most likely this question has been posed in your group's coffee room. In response, someone has quoted recent success stories [29], [15], [10], and someone else has professed skepticism. You may have left the coffee room slightly dejected, thinking “Pity I have neither the time, the GPU programming skills, nor the large amount of labelled data to train my own network to quickly find out the answer”. But when the convolutional neural network OverFeat [38] was recently made publicly available¹, it allowed for some experimentation. In particular, we now wondered not whether one could train a deep network specifically for a given task, but whether the features extracted by a deep network (one carefully trained on the diverse ImageNet database to perform the specific task of image classification) could be exploited for a wide variety of vision tasks.

¹ There are other publicly available deep learning implementations, such as Alex Krizhevsky's ConvNet and Berkeley's Caffe. Benchmarking these implementations is beyond the scope of this paper.

[Figure: Top) The CNN representation replaces pipelines of state-of-the-art (s.o.a.) methods and achieves better results, e.g. DPD [50]. Bottom) The augmented CNN representation with a linear SVM consistently outperforms s.o.a. on multiple tasks. "Specialized CNN" refers to other works which specifically designed the CNN for their task.]

We now relate our discussions and general findings because, as a computer vision researcher, you've probably had the same questions:
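The question raised above, whether features from a network trained only for ImageNet classification can serve other tasks, amounts to a simple recipe: keep the network fixed, treat one of its layers as a feature extractor, and train only a linear classifier (here a linear SVM) on those features. The sketch below illustrates that recipe under loud assumptions. `extract_features` and `FROZEN_WEIGHTS` are hypothetical stand-ins for a pre-trained CNN layer such as OverFeat's (one fixed linear map plus ReLU on 2-D toy inputs, not a real network). The features are L2-normalized, and the SVM is trained with a Pegasos-style stochastic subgradient solver without a bias term, chosen only because it fits in a few lines of standard-library Python. It is not meant to reproduce our experimental setup.

```python
import math
import random

# Hypothetical stand-in for a frozen, pre-trained network layer. In the real
# setting this would be a CNN layer's activations; here it is one fixed
# linear map followed by ReLU, and these weights are never updated.
FROZEN_WEIGHTS = [
    [1.0, 0.0],
    [-1.0, 0.0],
    [0.0, 1.0],
    [0.0, -1.0],
]


def extract_features(x):
    """Apply the frozen layer, ReLU(W x), then L2-normalize the result."""
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]
    norm = math.sqrt(sum(v * v for v in h)) or 1.0
    return [v / norm for v in h]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(feats, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (no bias term).

    Labels must be -1 or +1. Only these weights are learned; the feature
    extractor above stays fixed.
    """
    rng = random.Random(seed)
    w = [0.0] * len(feats[0])
    order = list(range(len(feats)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Check the margin with the current weights before updating.
            violated = labels[i] * dot(w, feats[i]) < 1.0
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + eta * labels[i] * fj for wj, fj in zip(w, feats[i])]
    return w


def predict(w, x):
    return 1 if dot(w, extract_features(x)) >= 0 else -1


# Toy "images": 2-D points whose class is the sign of the first coordinate.
points, labels = [], []
for i in range(10):
    a, b = 1.0 + i / 10, (i % 5 - 2) * 0.2
    points += [(a, b), (-a, b)]
    labels += [1, -1]

w = train_linear_svm([extract_features(p) for p in points], labels)
train_acc = sum(predict(w, p) == y for p, y in zip(points, labels)) / len(points)
print(train_acc)
```

Only `w` changes during training. Swapping the toy extractor for real CNN activations would leave the classifier code unchanged, which is what makes the off-the-shelf approach cheap to try.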