I. Introduction
Active hypothesis testing is a fundamental decision-making problem underlying experimental design in statistics and system identification [1]–[6], active detection and estimation in signal processing and control [7]–[9], and active exploration and learning in robotics and machine learning [6], [10]–[12]. Despite its importance across these disciplines and numerous applications, active hypothesis testing remains largely unsolved due to the difficulty of computing its associated value functions (and solving its dynamic programming equations). Most investigations of active hypothesis testing have thus resorted to studying the performance of heuristic strategies in the asymptotic regime of infinite sample size (or vanishing probability of error) [1], [7], [13]–[19]. Unfortunately, such asymptotic analysis can obscure the importance of feedback in active hypothesis testing: [7] notably showed that asymptotic performance measures such as error exponents can be optimized without any feedback, even though feedback is known to be important for optimizing nonasymptotic performance measures (see, e.g., [18]). In this paper, we develop novel nonasymptotic results for active hypothesis testing by building on recent structural results for partially observed Markov decision processes (POMDPs).