I. Introduction
The active sequential learning problem arises naturally in many complex inference, sensing, and control settings, e.g., tree search [1], sequential design of experiments [2] (of which the game of twenty questions [3] is an example), and multi-armed bandit problems [4]. These problems typically involve an estimation or adaptive control task based on sequential sensing of the environment: instead of passively collecting all observations of the environment at once, the agent actively and sequentially queries the environment to build up its knowledge of the problem instance, in order to produce a final optimal solution. In other words, at each time step of the query procedure, the agent can adaptively design its sensing strategy based on the information obtained in all previous steps. Designing such active sensing strategies is a highly nontrivial task. The goal of this paper is to develop a unified deep learning framework for designing active sensing procedures in a data-driven way.
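To make the sequential query loop concrete, the following minimal sketch plays a noiseless version of twenty questions: the agent must identify a hidden integer using yes/no questions of the form "Is x >= t?", and each threshold t is computed from the answers received so far. The function name `twenty_questions` and its parameters are hypothetical and chosen only for illustration; this is a hand-designed bisection rule, not the learned framework proposed in this paper. In this noiseless toy setting a fixed, non-adaptive bit-by-bit questioning scheme would perform equally well, so the sketch illustrates only the structure of the adaptive query loop, not the advantage of adaptivity.

```python
def twenty_questions(secret: int, num_questions: int = 20) -> int:
    """Hypothetical illustration: identify a hidden integer in
    [0, 2**num_questions) by asking adaptive yes/no questions."""
    if not 0 <= secret < 2 ** num_questions:
        raise ValueError("secret must lie in [0, 2**num_questions)")
    # The agent's knowledge is summarized by the candidate interval [low, high).
    low, high = 0, 2 ** num_questions
    for _ in range(num_questions):
        # Design the next query from all previous answers (encoded in low, high).
        threshold = (low + high) // 2
        # Query the environment: "Is x >= threshold?"
        answer = secret >= threshold
        # Update knowledge using the new observation.
        if answer:
            low = threshold
        else:
            high = threshold
    # The interval size halves at every step, so one candidate remains.
    return low
```

For example, `twenty_questions(123456)` returns `123456` after exactly 20 queries.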