I. Introduction
The Competitive Influence Maximization (CIM) problem considers multiple parties that offer the same or similar products [1] and compete for buyers or users in a social network. Each user chooses only one of the products, so the parties must compete to attract as many users as possible in order to gain profit. Most existing studies extend the traditional Influence Maximization (IM) models to formulate the CIM problem [2]–[4]. Recently, a reinforcement learning (RL) based model was proposed to solve the CIM problem by selecting a fixed number of nodes, specifically a single node, in each round for influence propagation [5]. Nevertheless, existing solutions in the literature require training the model on a per-setting, per-network basis. That is, whenever the target network changes (different networks, different diffusion models, …) or the settings of the agents change (budget, deadline, …), the model must be retrained in order to reach acceptable