I. Introduction
Contents generation has been a growing area of artificial intelligence techniques. Contents can be of various kinds: images, videos, text, and music. The former three have seen notably progress in past several years due to the evolution of generative models, such as generative adversarial networks (GANs) [1], [2], variational autoencoders (VAEs) [3], [4], and Flows [5], [6]. Several attempts at using these generative models have been made to generate symbolic music [7]–[9]. However, the music generation is still a challenging problem for the following reasons.