I. Introduction
Artificial neural networks (ANNs) are used as parameterized non-linear models that provide the inductive bias for a large number of machine learning tasks, including notable applications of Reinforcement Learning (RL) to control problems [1]. Whereas ANNs rely on clocked floating- or fixed-point operations on real numbers, Spiking Neural Networks (SNNs) operate in an event-driven fashion on spiking synaptic signals (see Fig. 1). Because they consume less energy when implemented on specialized hardware, SNNs are emerging as an important alternative to ANNs, backed by major technology companies including IBM and Intel [2], [3]. In particular, SNNs are considered important candidates for co-processors in battery-limited mobile devices (see, e.g., [4]). Applications of SNNs, and of the associated neuromorphic hardware, to supervised, unsupervised, and RL problems have been reported in a number of works, first in the computational neuroscience literature and more recently in the context of machine learning [5]–[7].
Fig. 1. (a) SNN first-to-spike policy: the selected action (r in the illustration), chosen among up, down, left, and right, is marked with a bold line, and the decision time is marked with a dashed vertical line; (b) an example realization of an action sequence in a windy grid-world problem.