I. Introduction
Optimizing precoding is crucial for boosting the spectral efficiency (SE) and energy efficiency (EE) of multiple-input multiple-output (MIMO) systems. Learning precoding policies with deep neural networks (DNNs) allows the policies to be implemented in real time [1], to be robust to channel estimation errors, and to operate without explicit channel estimation [2]. While many DNNs, such as fully connected neural networks (FNNs) [1] and convolutional neural networks [3], [4], have been designed for learning precoding policies, they incur high training complexity and are not guaranteed to generalize to every impact parameter, such as the number of users. Since wireless systems operate in open and dynamic environments, the learning efficiency of DNNs, in terms of both generalization ability and training complexity, is of paramount importance.