I. Introduction
Deep neural networks have brought breakthrough successes in many machine learning applications, and much effort has been invested in improving the understanding of the empirically observed advantages of deep over shallow networks. Universality results guarantee that any function in $L^p$, $1 \le p \le \infty$ (the space of Lebesgue measurable functions $f$ with $\int |f|^p < \infty$ for $1 \le p < \infty$ and finite essential supremum in case $p = \infty$), can be approximated arbitrarily well by shallow/deep neural networks, provided one allows a sufficiently large number of neurons [1], [2], in particular for ReLU networks [3], in which the activation is the widely used rectifier $\varrho(x) = \max\{0, x\}$ [4], [5]. For shallow networks there is little connection between network properties and structure, as all neurons are arranged in a single hidden layer. For deep networks, in contrast, arguments relating specific network architectures to specific properties have gained attention from many perspectives (e.g., philosophical [6], empirical [7], or approximation-theoretic [8]). Bengio [9] argues that, say, an input image is transformed along the network layers into gradually higher-level features, representing increasingly abstract functions of the raw input. In a similar direction, theoretical results show that additional network layers refine the partition of the input space of the network [10], [11].

These results motivate us to study the property of coarse-to-fine function approximation and its realization via the network layer architecture. Is it possible to construct a universal deep network representation that can approximate any function along the layers with gradually improving accuracy? Answering this question by providing a construction not only improves the understanding of the role played by depth, but also provides hints on how to encode certain properties into the structure of a network. In this article, we take a step towards understanding the structure of deep networks via the property of coarse-to-fine function approximation. To be precise, we consider deep feedforward ReLU networks consisting of $L$ layers of affine linear operators $x \mapsto W_\ell x + b_\ell$ (widths $N_1, \dots, N_L$) with the pointwise acting piecewise linear rectifier activation $\varrho(x) = \max\{0, x\}$.
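As a minimal illustration of this definition (and not of the specific construction discussed below), the following NumPy sketch evaluates a feedforward ReLU network by alternating affine maps $x \mapsto W_\ell x + b_\ell$ with the pointwise rectifier, ending in a purely affine output layer; the widths and random weights are illustrative placeholders.

```python
import numpy as np

def relu(x):
    # Pointwise rectifier rho(x) = max(0, x).
    return np.maximum(0.0, x)

def relu_network(x, weights, biases):
    """Evaluate a feedforward ReLU network: alternate affine maps
    W_l x + b_l with the pointwise rectifier; the last layer is affine only."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    return weights[-1] @ a + biases[-1]

# Illustrative widths 3 -> 5 -> 4 -> 2 with random weights (placeholders only).
rng = np.random.default_rng(0)
widths = [3, 5, 4, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(widths[:-1], widths[1:])]
biases = [rng.standard_normal(m) for m in widths[1:]]
print(relu_network(rng.standard_normal(3), weights, biases))
```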
We present a short and direct construction of a particularly simply structured class of convolutional ReLU networks that provide universal approximation in a coarse-to-fine manner with an increasing number of layers.
The network is sparsely connected (the number of outgoing edges from any node in the hidden layers is two) and localized in the sense that local changes in the function to be approximated only require local changes in the final layer of weights.
The Fourier and wavelet approaches to function approximation and analysis rest on the use of basis functions. In the classical work on the universality of shallow networks, basis functions and function values can be separated in the sense that basis functions are encoded via the activations while function values are specified via the weights, providing a uniform way to approximate functions [12]. Our construction follows this spirit, with basis functions encoded in the network structure and function values specified in the final-layer weights. The basis functions in our construction are characteristic functions of unit intervals and their shifts. We generalize our central observation on characteristic functions to all higher-order univariate B-splines [13], showing that each can be derived via rectification from a linear combination of its shifts on the next coarser dyadic dilation level.
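To give a flavor of this kind of identity, the following NumPy sketch numerically checks one concrete instance for characteristic functions (order-one B-splines): a unit-interval characteristic function is obtained by applying the rectifier to a linear combination of two shifts of the characteristic function on the next coarser dyadic level. The particular shifts and coefficients are an illustrative choice and need not coincide with the ones used in the formal construction later in the paper.

```python
import numpy as np

def relu(x):
    # Pointwise rectifier rho(x) = max(0, x).
    return np.maximum(0.0, x)

def chi(a, b):
    # Characteristic function of the half-open interval [a, b).
    return lambda x: ((x >= a) & (x < b)).astype(float)

# Coarse-level basis function: characteristic function of [0, 2).
coarse = chi(0.0, 2.0)

# Illustrative instance of refinement by rectification:
#   chi_[1,2)(x) = relu( chi_[0,2)(x) - chi_[0,2)(x + 1) ),
# i.e., a unit-interval characteristic function arises from a linear
# combination of shifts of the coarser characteristic function.
x = np.linspace(-2.0, 3.0, 5001)
fine_from_coarse = relu(coarse(x) - coarse(x + 1.0))
fine_direct = chi(1.0, 2.0)(x)

print(np.allclose(fine_from_coarse, fine_direct))  # True
```

Iterating such rectification steps across layers yields characteristic functions on successively finer dyadic intervals, and a final affine layer weighting them by local function values gives the flavor of the localized, coarse-to-fine approximation described above.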