I. Introduction
Developing and evaluating reinforcement learning techniques on real-world problems is far from trivial. One difficulty is simulating the dynamics of an environment and the behavioral interactions of an agent [1]. Nevertheless, reinforcement learning has been applied successfully, with promising results, in areas such as traffic, networks, intelligence, robotics, and games, among others [2]. Traffic is an especially critical problem: even in small communities, it is widely recognized as one of the difficulties people encounter daily. As a result, the artificial intelligence community has paid special attention to it. Because of its distributed and autonomous nature, traffic has proven to be an intriguing testbed for reinforcement learning systems [3].