I. Introduction
Multi-agent systems (MASs) have garnered significant attention due to their applicability in diverse domains such as formation control [1], autonomous vehicles [2], and smart grids [3]. In multi-agent reinforcement learning (MARL), multiple agents interact with their environment to collect rewards and learn optimal policies that achieve long-term objectives [4]. The incorporation of reinforcement learning (RL) techniques has enabled MASs to autonomously learn and adapt to complex environments, making them highly effective in dynamic and uncertain scenarios [5], [6].

However, MASs are vulnerable to Byzantine attacks, in which some agents behave arbitrarily due to faults or malicious intent. Such attacks can severely disrupt the learning process and degrade system performance. To mitigate their impact, various methods have been proposed that focus on enhancing consensus protocols and improving fault tolerance. Techniques such as majority voting [7], reputation-based systems [8], optimization algorithms [9], [10], and resilient consensus algorithms [11]–[13] have been explored to ensure system reliability in the presence of faulty or malicious agents. While these methods provide some level of robustness, they often incur significant communication overhead or computational complexity, which limits their scalability and effectiveness in large-scale MASs.

To address these limitations, researchers have explored alternative approaches that reduce communication or computational demands. Current studies on event-triggered mechanisms for multi-agent systems primarily aim to design triggering rules that determine when such events occur while avoiding the Zeno phenomenon, i.e., infinitely frequent triggering within a finite time interval [14], [15].
Therefore, it is imperative to develop an algorithm that can efficiently handle high-dimensional state spaces, maintain robust performance under Byzantine attacks, and operate with low communication overhead.