I. Introduction
Game theory and learning have emerged as two of the foundational elements of decision-making in multiagent systems. Autonomous agents can use game-theoretic learning to adjust their strategies in the short term through their interactions with one another, while natural selection can shape their decisions over the longer term. In cyber-physical systems (CPS) [1], however, decision-making can become exceptionally difficult for an agent, owing either to the inherent computational complexity of the problem or to unreasonable assumptions about the cognitive capabilities of the other agents. Instances of such scenarios include CPS with humans in the loop [2], human–robot interactions [3], and security [4], [5]. As a result, decision-makers often seek policies that appear satisfactory on the basis of empirical data, rather than policies that are optimal in theory but fail when applied in practice [6].