
Learning Adversarial Transformer for Symbolic Music Generation

Publisher: IEEE


Abstract:

Symbolic music generation remains an unsettled problem facing several challenges. A complete music score is a very long note sequence consisting of multiple tracks, with recurring elements and their variants at various levels. The transformer model, benefiting from its self-attention mechanism, has shown advantages in modeling long sequences, and there have been several attempts to apply transformer-based models to music generation. However, previous works train the model with the same strategy as the text generation task, despite the obvious differences between the patterns of text and music, and these models cannot consistently produce music samples of high quality. In this article, we propose a novel adversarial transformer to generate music pieces with high musicality. Generative adversarial learning and self-attention networks are combined creatively: the generation of long sequences is guided by adversarial objectives, which provide a strong regularization that enforces the transformer to focus on learning the global and local structures. Instead of adopting the time-consuming Monte Carlo (MC) search commonly used in existing sequence generative models, we propose an effective and convenient method to compute the reward for each generated step (REGS) of the long sequence. The discriminator is trained to optimize elaborately designed global and local loss objectives simultaneously, which enables it to give reliable REGS to the generator. The adversarial objective, combined with the teacher forcing objective, guides the training of the generator. The proposed model can generate single-track or multitrack music pieces. Experiments show that our model generates long music pieces with improved quality compared with the original music transformers.
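The combined objective described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function `combined_generator_loss`, the `adv_weight` mixing parameter, and the simple policy-gradient form of the adversarial term are all assumptions, under the premise that the generator receives one REGS reward per generated step from the discriminator and blends the resulting adversarial term with a teacher-forcing cross-entropy loss.

```python
def combined_generator_loss(log_probs, rewards, ce_loss, adv_weight=0.5):
    """Hypothetical sketch of the generator's training objective.

    log_probs  -- log-probabilities of each generated token, length T
    rewards    -- per-step rewards (REGS) from the discriminator, length T
    ce_loss    -- teacher-forcing cross-entropy computed on ground-truth data
    adv_weight -- assumed mixing coefficient between the two objectives
    """
    # Policy-gradient-style adversarial term: increase the log-probability
    # of steps the discriminator rewarded, decrease penalized ones.
    adv_loss = -sum(lp * r for lp, r in zip(log_probs, rewards)) / len(log_probs)
    # Blend with the teacher-forcing objective.
    return adv_weight * adv_loss + (1.0 - adv_weight) * ce_loss
```

For example, with uniform rewards of 1.0, per-token log-probabilities of -1.0, and a cross-entropy loss of 1.0, the blended loss is 1.0 regardless of `adv_weight`. In practice the per-step rewards let the gradient signal reach every position of the long sequence directly, which is the motivation for REGS over whole-sequence MC rollouts.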
Page(s): 1754 - 1763
Date of Publication: 02 July 2020

PubMed ID: 32614773

I. Introduction

Content generation has been a growing area of artificial intelligence. Content can take various forms: images, videos, text, and music. The former three have seen notable progress in the past several years due to the evolution of generative models, such as generative adversarial networks (GANs) [1], [2], variational autoencoders (VAEs) [3], [4], and flows [5], [6]. Several attempts have been made to generate symbolic music with these generative models [7]–[9]. However, music generation is still a challenging problem for the following reasons.
