I. Introduction
With the ever-growing number of wireless devices, and the increasingly hostile and congested radio frequency (RF) environment that results, devices will need the ability to adapt dynamically to RF conditions, including, but not limited to, poor channel conditions and the presence of interference. One way a device can mitigate such obstacles is through reinforcement learning (RL). Its capacity to learn in dynamically changing environments over long periods of time has shown promise in many fields. Furthermore, RL can help a system operate optimally, even in the presence of unknown obstacles. For RF-related systems, this capability is desirable wherever communication must be adapted, and hence optimized, in real time.